[jira] [Commented] (PDFBOX-2252) PDFTextStripper has problem with documents with mixed language directions

2015-07-21 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636008#comment-14636008
 ] 

John Hewson commented on PDFBOX-2252:
-

{quote}
We can't rely on the person who creates a pdf or on the integrity of the 
software that converts a text to pdf's.
{quote}

I wholeheartedly agree with that statement.

> PDFTextStripper has problem with documents with mixed language directions
> -
>
> Key: PDFBOX-2252
> URL: https://issues.apache.org/jira/browse/PDFBOX-2252
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.8.6, 2.0.0
>Reporter: Amir
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: PDFTextStripper.java.patch, atest.pdf, overlap.jpg, 
> test.pdf, wikipedia_dl_lyric_test.pdf
>
>
> When the input document of PDFTextStripper is a combination of right-to-left 
> and left-to-right languages, the output characters of one language are 
> reversed. 
> A sample bilingual pdf document is attached.
> PDFTextStripper has a variable "isRtlDominant" in the "writePage" method, which 
> is defined as follows: boolean isRtlDominant = rtlCount > ltrCount;
> This class counts the number of RTL characters and decides whether the whole 
> content should be reversed. That is not correct: it must operate on each 
> word, not on the whole document.
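The per-word idea from the report can be sketched roughly as follows. This is a hypothetical illustration, not PDFTextStripper's code: the class and method names are invented, a line is assumed to arrive as a string in visual order (left to right as drawn on the page), and words are assumed to be whitespace-separated. Under a page-level rule, reversing a mixed line also reverses the Latin words; deciding per word only reverses the RTL ones. A fuller solution would more likely apply the Unicode Bidirectional Algorithm, for example via `java.text.Bidi`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: decide text direction per word instead of once per page. */
public class PerWordDirection {

    /** True for characters with strong right-to-left directionality (e.g. Hebrew, Arabic). */
    public static boolean isRtl(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    /** A word counts as RTL if it contains at least one strong RTL character. */
    public static boolean isRtlWord(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (isRtl(word.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    /**
     * Converts a visually ordered line into reading order. Only RTL words have
     * their characters reversed; LTR words are kept as they are. The word order
     * is flipped only when most words are RTL (a simplification: several LTR
     * words in a row inside an RTL line would still come out in reverse order).
     */
    public static String visualToLogical(String visualLine) {
        String trimmed = visualLine.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        List<String> result = new ArrayList<>();
        int rtlWords = 0;
        for (String word : words) {
            if (isRtlWord(word)) {
                rtlWords++;
                result.add(new StringBuilder(word).reverse().toString());
            } else {
                result.add(word);
            }
        }
        if (rtlWords * 2 > words.length) {
            Collections.reverse(result);
        }
        return String.join(" ", result);
    }

    public static void main(String[] args) {
        // A Latin word followed by the Hebrew word "shalom" as drawn on the page
        String visual = "PDFBox \u05DD\u05D5\u05DC\u05E9";
        System.out.println(visualToLogical(visual));
    }
}
```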



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-2897) Preflight not flagging bad xml generated by XMPBox for dc:title

2015-07-21 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635664#comment-14635664
 ] 

Maruan Sahyoun commented on PDFBOX-2897:


Just to be sure I also verified with Acrobat X and it also complains.

> Preflight not flagging bad xml generated by XMPBox for dc:title
> ---
>
> Key: PDFBOX-2897
> URL: https://issues.apache.org/jira/browse/PDFBOX-2897
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight, XmpBox
>Affects Versions: 2.0.0
>Reporter: Tim Allison
>Priority: Minor
> Attachments: PDFBOX-2897-PDFA-BadXMP.pdf
>
>
> [~tilman] asked that I open two separate issues for the finding in TIKA-1678 
> that XMPBox is not generating a valid dc:title entry in the XMP.  This issue 
> is meant to track preflight's failure to detect this problem.
> What PDFBox does:
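> {code}
>   <dc:title>
>     <rdf:Alt>
>       <dc:li xml:lang="x-default">this is the title</dc:li>
>     </rdf:Alt>
>   </dc:title>
> {code}
> It should be:
> {code}
>   <dc:title>
>     <rdf:Alt>
>       <rdf:li xml:lang="x-default">this is the title</rdf:li>
>     </rdf:Alt>
>   </dc:title>
> {code}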
> Error message from the PDF-Tools validator:
> {quote}
> 'dc:li' is not allowed in arrays. The elements must be rdf:li or rdf:_N, 
> where N is a positive number.
> There is only one RDF resource allowed in XMP.
> {quote}
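A check along the lines of the PDF-Tools message could look like the following sketch. It is hypothetical and not Preflight's implementation: it parses the XMP packet with the JDK's namespace-aware DOM parser, finds every rdf:Alt, rdf:Seq and rdf:Bag, and reports any child element that is not rdf:li or rdf:_N with N a positive number. It does not cover the separate "only one RDF resource" finding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of a Preflight-style check on XMP array items. */
public class XmpArrayItemCheck {

    public static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns one message per item of an rdf:Alt/Seq/Bag that is not rdf:li or rdf:_N (N > 0). */
    public static List<String> findBadArrayItems(String xmp) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XMP is not well-formed XML", e);
        }

        List<String> problems = new ArrayList<>();
        // "*" matches every local name in the RDF namespace
        NodeList rdfElements = doc.getElementsByTagNameNS(RDF_NS, "*");
        for (int i = 0; i < rdfElements.getLength(); i++) {
            Element container = (Element) rdfElements.item(i);
            String kind = container.getLocalName();
            if (!kind.equals("Alt") && !kind.equals("Seq") && !kind.equals("Bag")) {
                continue;
            }
            NodeList children = container.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // ignore whitespace text and comments
                }
                String local = child.getLocalName();
                boolean inRdf = RDF_NS.equals(child.getNamespaceURI());
                boolean validName = local.equals("li") || local.matches("_[1-9][0-9]*");
                if (!inRdf || !validName) {
                    problems.add("'" + child.getNodeName() + "' is not allowed in rdf:" + kind);
                }
            }
        }
        return problems;
    }
}
```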






[jira] [Comment Edited] (PDFBOX-2897) Preflight not flagging bad xml generated by XMPBox for dc:title

2015-07-21 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635664#comment-14635664
 ] 

Maruan Sahyoun edited comment on PDFBOX-2897 at 7/21/15 7:23 PM:
-

Just to be sure I also verified the file with Acrobat X and it also complains.


was (Author: msahyoun):
Just to be sure I also verified with Acrobat X and it also complains.




[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635659#comment-14635659
 ] 

Maruan Sahyoun commented on PDFBOX-2896:


Acrobat X thinks it's fine - will test with other versions later.

> XMPBox not creating valid "title" entry in DublinCoreSchema in trunk
> 
>
> Key: PDFBOX-2896
> URL: https://issues.apache.org/jira/browse/PDFBOX-2896
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 2.0.0
>Reporter: Tim Allison
>Priority: Minor
>
> On TIKA-1678, I was trying to generate a test PDF that had a dc:title in the 
> XMP with XMPBox from PDFBox's trunk.  I modified the code from CreatePDFA by 
> adding this:
> {code}
> DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
> dc.setTitle("this is the title");
> {code}
> The generated PDF doesn't appear to have a compliant dc:title entry in the 
> XMP.  
> [~tilman] noted the divergence from the standard 
> [here|https://issues.apache.org/jira/browse/TIKA-1678?focusedCommentId=14634045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14634045].
> What PDFBox does:
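> {code}
>   <dc:title>
>     <rdf:Alt>
>       <dc:li xml:lang="x-default">this is the title</dc:li>
>     </rdf:Alt>
>   </dc:title>
> {code}
> It should be:
> {code}
>   <dc:title>
>     <rdf:Alt>
>       <rdf:li xml:lang="x-default">this is the title</rdf:li>
>     </rdf:Alt>
>   </dc:title>
> {code}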
> Error message from the PDF-Tools validator:
> {quote}
> 'dc:li' is not allowed in arrays. The elements must be rdf:li or rdf:_N, 
> where N is a positive number.
> There is only one RDF resource allowed in XMP.
> {quote}






[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635592#comment-14635592
 ] 

Tilman Hausherr commented on PDFBOX-2896:
-

Can you ask Dr. B.? And what about your own PDF/A check software from Acrobat / 
callas? There's also validatepdfa.com, but they always answer that they 
couldn't check my document. Maybe I'm blocked, or maybe their service is broken.




[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635464#comment-14635464
 ] 

Maruan Sahyoun commented on PDFBOX-2896:


Hmm, the free XMP validator from PDFlib at
http://www.pdflib.com/de/knowledge-base/xmp-metadaten/kostenloser-xmp-validator/
thinks it's OK.




[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635455#comment-14635455
 ] 

Tilman Hausherr commented on PDFBOX-2896:
-

The Bavaria tests run fine. However PDF-Tools tells this:
{quote}
There is only one RDF resource allowed in XMP.
The document does not conform to the requested standard.
{quote}





Re: [RESULT][VOTE] Release Apache PDFBox 1.8.10

2015-07-21 Thread Tilman Hausherr

On 21.07.2015 at 18:56, Andreas Lehmkuehler wrote:

Hi,

On 18.07.2015 at 18:16, Andreas Lehmkuehler wrote:

Please vote on releasing this package as Apache PDFBox 1.8.10.


  +1 Rey Malahay (*)
  +1 Tilman Hausherr
  +1 Maruan Sahyoun
  +1 Andreas Lehmkühler


Whew! That was close.

Tilman




[jira] [Comment Edited] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635398#comment-14635398
 ] 

Maruan Sahyoun edited comment on PDFBOX-2896 at 7/21/15 5:02 PM:
-

A general note. I've put the major change into the XmpSerializer. Before 
serialization, the attributes carry the namespace and namespace prefix of the 
schema they are in. That is also true for the other 'base' types such as Alt, 
Seq and Bag, where the RDF namespace was already being set during serialization, 
regardless of the element's original namespace. 

[~tilman] could you run the PDF/A test to see whether the changes cause any 
regressions? They shouldn't, as I tried to stay away from the internals (see the 
comment above): if, e.g., an XMP item has a wrong prefix or namespace, that is 
still the case prior to serialization.


was (Author: msahyoun):
A general note. I've put the major change into the XmpSerializer. Before 
serialization, the attributes carry the namespace and namespace prefix of the 
schema they are in. That is also true for the other 'base' types such as Alt, 
Seq and Bag, where the RDF namespace was already being set during serialization, 
regardless of the element's original namespace. 




[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635398#comment-14635398
 ] 

Maruan Sahyoun commented on PDFBOX-2896:


A general note. I've put the major change into the XmpSerializer. Before 
serialization, the attributes carry the namespace and namespace prefix of the 
schema they are in. That is also true for the other 'base' types such as Alt, 
Seq and Bag, where the RDF namespace was already being set during serialization, 
regardless of the element's original namespace. 
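The serializer-side approach described in this note can be illustrated with a small sketch. Everything in it is hypothetical and not XMPBox's XmpSerializer API: the `Item` class stands in for an array entry that may still carry its schema's prefix (for example "dc"), and `serialize` emits every entry as rdf:li regardless of that prefix, writing xml:lang only when a language is present.

```java
import java.util.List;

/**
 * Hypothetical sketch, not XMPBox's XmpSerializer: array items may still carry
 * the prefix of the schema they belong to, but the writer always emits rdf:li.
 */
public class ArraySerializerSketch {

    /** Hypothetical model of one array entry as a schema might store it. */
    public static final class Item {
        final String prefix; // e.g. "dc" for a Dublin Core property
        final String lang;   // null when the item has no language qualifier
        final String value;

        public Item(String prefix, String lang, String value) {
            this.prefix = prefix;
            this.lang = lang;
            this.value = value;
        }
    }

    /** Minimal escaping for text content and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Serializes one array property, e.g. property "dc:title" with arrayType "Alt". */
    public static String serialize(String property, String arrayType, List<Item> items) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(property).append('>');
        sb.append("<rdf:").append(arrayType).append('>');
        for (Item item : items) {
            // item.prefix is deliberately ignored: array contents always use the rdf prefix
            sb.append("<rdf:li");
            if (item.lang != null) {
                sb.append(" xml:lang=\"").append(escape(item.lang)).append('"');
            }
            sb.append('>').append(escape(item.value)).append("</rdf:li>");
        }
        sb.append("</rdf:").append(arrayType).append('>');
        sb.append("</").append(property).append('>');
        return sb.toString();
    }
}
```

Keeping the rewrite in the writer, rather than in the model, matches the note's intent of leaving the internals untouched.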




[RESULT][VOTE] Release Apache PDFBox 1.8.10

2015-07-21 Thread Andreas Lehmkuehler

Hi,

On 18.07.2015 at 18:16, Andreas Lehmkuehler wrote:

Please vote on releasing this package as Apache PDFBox 1.8.10.


  +1 Rey Malahay (*)
  +1 Tilman Hausherr
  +1 Maruan Sahyoun
  +1 Andreas Lehmkühler

Thanks for your help and support!! I'll push the release out.

BR
Andreas Lehmkühler

(*) non-binding vote




[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635394#comment-14635394
 ] 

Maruan Sahyoun commented on PDFBOX-2896:


With the latest commit, the result for the above code is now:

{code}
<dc:title>
  <rdf:Alt>
    <rdf:li xml:lang="x-default">this is the title</rdf:li>
  </rdf:Alt>
</dc:title>
{code}




[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635391#comment-14635391
 ] 

ASF subversion and git services commented on PDFBOX-2896:
-

Commit 1692171 from [~msahyoun] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1692171 ]

PDFBOX-2896: set ArrayType contents prefix to rdf




[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635348#comment-14635348
 ] 

ASF subversion and git services commented on PDFBOX-2896:
-

Commit 1692165 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1692165 ]

PDFBOX-2896: check XMP title in test




[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635345#comment-14635345
 ] 

ASF subversion and git services commented on PDFBOX-2896:
-

Commit 1692164 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1692164 ]

PDFBOX-2896: add title to example




[jira] [Updated] (PDFBOX-2898) Incorrect key for color space in PDGroup

2015-07-21 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2898:

Labels: Transparency  (was: )

> Incorrect key for color space in PDGroup
> 
>
> Key: PDFBOX-2898
> URL: https://issues.apache.org/jira/browse/PDFBOX-2898
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Windows 7, 64. Java 1.7.0_51
>Reporter: Evgeniy Muravitskiy
>Assignee: Tilman Hausherr
>  Labels: Transparency
> Fix For: 2.0.0
>
> Attachments: InteractiveObjects.pdf
>
>
> According to the PDF Reference 1.4 (1.7), section 7.5.5, a transparency group 
> may specify its color space under the 'CS' key, but the method 
> PDGroup.getColorSpace() looks it up under the 'ColorSpace' key.
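The lookup the reporter describes can be sketched as follows. To keep the example self-contained, a plain `Map` stands in for the group's COSDictionary and PDF names are stored as strings without the leading slash; both are assumptions for illustration, and the class name is hypothetical. The sketch reads the color space from /CS and does not consult a /ColorSpace entry.

```java
import java.util.Map;

/**
 * Hypothetical sketch: a Map stands in for the group attributes dictionary,
 * with PDF names stored as strings without the leading slash.
 */
public class GroupColorSpaceSketch {

    /** Returns the /CS entry of a transparency group, or null if there is none. */
    public static Object transparencyGroupColorSpace(Map<String, Object> group) {
        if (!"Transparency".equals(group.get("S"))) {
            return null; // /CS is defined for transparency groups (/S /Transparency)
        }
        // The key defined by the specification is /CS; /ColorSpace is not consulted.
        return group.get("CS");
    }
}
```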






[jira] [Resolved] (PDFBOX-2898) Incorrect key for color space in PDGroup

2015-07-21 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2898.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Setting to resolved. Thanks!




[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635341#comment-14635341
 ] 

ASF subversion and git services commented on PDFBOX-2896:
-

Commit 1692163 from [~msahyoun] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1692163 ]

PDFBOX-2896: ensure that language attribute is set




[jira] [Comment Edited] (PDFBOX-2898) Incorrect key for color space in PDGroup

2015-07-21 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635334#comment-14635334
 ] 

Tilman Hausherr edited comment on PDFBOX-2898 at 7/21/15 4:24 PM:
--

Sadly, the change makes no difference when rendering the file, and the same holds 
when changing to CMYK and viewing it with Adobe Reader. My tests with hundreds of 
files show no difference either, but the specification is clear, so I'll commit it.


was (Author: tilman):
Sadly, the file doesn't have a difference when rendering. Same when changing to 
CMYK and viewing it with Adobe Reader. My tests with hundreds of files show no 
difference, the specification is clear, so I'll commit it.

> Incorrect key for color space in PDGroup
> 
>
> Key: PDFBOX-2898
> URL: https://issues.apache.org/jira/browse/PDFBOX-2898
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Windows 7, 64. Java 1.7.0_51
>Reporter: Evgeniy Muravitskiy
>Assignee: Tilman Hausherr
> Attachments: InteractiveObjects.pdf
>
>
> According to the PDF Reference 1.4 (1.7), section 7.5.5, a transparency group 
> may contain its color space in the 'CS' key, but the method 
> PDGroup.getColorSpace() looks it up using the 'ColorSpace' key.



[jira] [Commented] (PDFBOX-2898) Incorrect key for color space in PDGroup

2015-07-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635335#comment-14635335
 ] 

ASF subversion and git services commented on PDFBOX-2898:
-

Commit 1692161 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1692161 ]

PDFBOX-2898: use correct key for colorspace, as suggested by Evgeniy Muravitskiy

> Incorrect key for color space in PDGroup
> 
>
> Key: PDFBOX-2898
> URL: https://issues.apache.org/jira/browse/PDFBOX-2898
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Windows 7, 64. Java 1.7.0_51
>Reporter: Evgeniy Muravitskiy
>Assignee: Tilman Hausherr
> Attachments: InteractiveObjects.pdf
>
>
> According to the PDF Reference 1.4 (1.7), section 7.5.5, a transparency group 
> may contain its color space in the 'CS' key, but the method 
> PDGroup.getColorSpace() looks it up using the 'ColorSpace' key.



[jira] [Assigned] (PDFBOX-2898) Incorrect key for color space in PDGroup

2015-07-21 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-2898:
---

Assignee: Tilman Hausherr

> Incorrect key for color space in PDGroup
> 
>
> Key: PDFBOX-2898
> URL: https://issues.apache.org/jira/browse/PDFBOX-2898
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Windows 7, 64. Java 1.7.0_51
>Reporter: Evgeniy Muravitskiy
>Assignee: Tilman Hausherr
> Attachments: InteractiveObjects.pdf
>
>
> According to the PDF Reference 1.4 (1.7), section 7.5.5, a transparency group 
> may contain its color space in the 'CS' key, but the method 
> PDGroup.getColorSpace() looks it up using the 'ColorSpace' key.



[jira] [Commented] (PDFBOX-2898) Incorrect key for color space in PDGroup

2015-07-21 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635334#comment-14635334
 ] 

Tilman Hausherr commented on PDFBOX-2898:
-

Sadly, the file doesn't have a difference when rendering. Same when changing to 
CMYK and viewing it with Adobe Reader. My tests with hundreds of files show no 
difference, the specification is clear, so I'll commit it.

> Incorrect key for color space in PDGroup
> 
>
> Key: PDFBOX-2898
> URL: https://issues.apache.org/jira/browse/PDFBOX-2898
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Windows 7, 64. Java 1.7.0_51
>Reporter: Evgeniy Muravitskiy
> Attachments: InteractiveObjects.pdf
>
>
> According to the PDF Reference 1.4 (1.7), section 7.5.5, a transparency group 
> may contain its color space in the 'CS' key, but the method 
> PDGroup.getColorSpace() looks it up using the 'ColorSpace' key.



[jira] [Commented] (PDFBOX-2852) Improve code quality (2)

2015-07-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635331#comment-14635331
 ] 

ASF subversion and git services commented on PDFBOX-2852:
-

Commit 1692160 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1692160 ]

PDFBOX-2852: remove unused imports

> Improve code quality (2)
> 
>
> Key: PDFBOX-2852
> URL: https://issues.apache.org/jira/browse/PDFBOX-2852
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2576, which was getting too long.



[jira] [Commented] (PDFBOX-1350) OutOfMemory on reading FlateFilter images in PDF

2015-07-21 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635327#comment-14635327
 ] 

Tilman Hausherr commented on PDFBOX-1350:
-

The missing numbers will be fixed in PDFBOX-2899. (2.0 only)

> OutOfMemory on reading FlateFilter images in PDF
> 
>
> Key: PDFBOX-1350
> URL: https://issues.apache.org/jira/browse/PDFBOX-1350
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing, Rendering
>Affects Versions: 1.7.0, 2.0.0
>Reporter: philip huang
> Attachments: 40376536.PDF
>
>
> I had set the Java VM parameter -Xmx1048m
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>   at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
>   at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:131)
>   at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
>   at java.io.OutputStream.write(OutputStream.java:58)
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:129)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
>   at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:214)
>   at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:468)
>   at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:183)



[jira] [Updated] (PDFBOX-2899) Text not rendered in mode 7 (2)

2015-07-21 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2899:

Attachment: PDFBOX-1350-reduced.pdf

> Text not rendered in mode 7 (2)
> ---
>
> Key: PDFBOX-2899
> URL: https://issues.apache.org/jira/browse/PDFBOX-2899
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
> Attachments: PDFBOX-1350-reduced.pdf
>
>
> The attached file is a reduced version of the file from PDFBOX-1350. It 
> should show "T U S" at the bottom left but it doesn't. I believe that the 
> cause is very similar to PDFBOX-2814 (text rendering in mode 7), except that 
> this time, the text is not split within one "TJ" segment, but across 
> several "Tj" segments.
> {code}
> BT
> 7 Tr
> /F1 1.0 Tf
> 0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
> 0 -1.01768 Td
> (\000\065) Tj
> /F1 1.0 Tf
> 0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
> 1.38462 -1.01768 Td
> (\000\066) Tj
> /F1 1.0 Tf
> 0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
> 1.69789 -1.01768 Td
> /F1 1.0 Tf
> 0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
> 2.01116 -1.01768 Td
> /F1 1.0 Tf
> 0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
> 2.32442 -1.01768 Td
> (\000\064) Tj
> ET
>   
> /Im1 Do
> {code}



[jira] [Created] (PDFBOX-2899) Text not rendered in mode 7 (2)

2015-07-21 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-2899:
---

 Summary: Text not rendered in mode 7 (2)
 Key: PDFBOX-2899
 URL: https://issues.apache.org/jira/browse/PDFBOX-2899
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr


The attached file is a reduced version of the file from PDFBOX-1350. It should 
show "T U S" at the bottom left but it doesn't. I believe that the cause is 
very similar to PDFBOX-2814 (text rendering in mode 7), except that this time, 
the text is not split within one "TJ" segment, but across several "Tj" 
segments.

{code}
BT
7 Tr
/F1 1.0 Tf
0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
0 -1.01768 Td
(\000\065) Tj
/F1 1.0 Tf
0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
1.38462 -1.01768 Td
(\000\066) Tj
/F1 1.0 Tf
0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
1.69789 -1.01768 Td
/F1 1.0 Tf
0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
2.01116 -1.01768 Td
/F1 1.0 Tf
0.01063 0.0 0.0 0.00573 0.05521 0.16300 Tm
2.32442 -1.01768 Td
(\000\064) Tj
ET  

/Im1 Do
{code}
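A sketch of the clipping rule, under my reading of ISO 32000-1 section 9.3.6: in text rendering modes 4 to 7, the glyph outlines from every Tj/TJ inside one BT/ET pair are accumulated, and the clip is applied once at ET. One plausible failure mode consistent with this report is clipping after each Tj, which intersects disjoint glyph areas and leaves nothing visible. TextClipSketch and its showGlyph hook are hypothetical; java.awt.geom.Area does the geometry.

```java
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

public class TextClipSketch {
    private final Area clip; // current clipping region of the graphics state
    private Area textClip;   // glyph outlines collected since the last BT

    TextClipSketch(Shape initialClip) {
        clip = new Area(initialClip);
    }

    /** BT: start collecting glyph outlines for this text object. */
    void beginText() {
        textClip = new Area();
    }

    /** Called once per glyph shown by Tj or TJ (hypothetical hook). */
    void showGlyph(Shape glyphOutline, int renderingMode) {
        if (renderingMode >= 4 && renderingMode <= 7) { // clipping modes
            textClip.add(new Area(glyphOutline));       // union, not intersection
        }
    }

    /** ET: intersect the clip once with the union of all collected outlines. */
    void endText() {
        if (!textClip.isEmpty()) { // simplification: no clipping glyphs, no clip change
            clip.intersect(textClip);
        }
        textClip = null;
    }

    Area getClip() {
        return clip;
    }

    public static void main(String[] args) {
        Shape page = new Rectangle2D.Double(0, 0, 100, 100);
        Shape glyphA = new Rectangle2D.Double(10, 10, 5, 5); // from the first Tj
        Shape glyphB = new Rectangle2D.Double(30, 10, 5, 5); // from a later Tj

        TextClipSketch state = new TextClipSketch(page);
        state.beginText();
        state.showGlyph(glyphA, 7);
        state.showGlyph(glyphB, 7);
        state.endText();
        // prints true: both glyph areas survive the clip
        System.out.println(state.getClip().contains(12, 12) && state.getClip().contains(32, 12));

        // Contrast: clipping after every Tj intersects the two disjoint glyph areas
        Area perSegment = new Area(page);
        perSegment.intersect(new Area(glyphA));
        perSegment.intersect(new Area(glyphB));
        System.out.println(perSegment.isEmpty()); // prints true
    }
}
```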




[jira] [Updated] (PDFBOX-2814) Text not rendered in mode 7

2015-07-21 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2814:

Summary: Text not rendered in mode 7  (was: Chars are not rendered )

> Text not rendered in mode 7
> ---
>
> Key: PDFBOX-2814
> URL: https://issues.apache.org/jira/browse/PDFBOX-2814
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
> Environment: Oracle Java8 - Windows 8.1 x64
>Reporter: Daniele guiducci
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: PDFBOX-2814.pdf, PDFBOX-2814.png
>
>
> Text in PDFs is not rendered when a shadow is present. 
> The URL contains 4 PDFs of the same page, exported in 4 different ways. In a 
> few cases, one of the last text lines is not rendered.
> https://drive.google.com/drive/u/0/folders/0Bz-TZKnUq0uufmVLUDliMFBGSGRWZU1sQWVJamQ5VV9mOGxaMXhTOXZuSVhQd3Vpc280Z2M



[jira] [Updated] (PDFBOX-2898) Incorrect key for color space in PDGroup

2015-07-21 Thread Evgeniy Muravitskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Evgeniy Muravitskiy updated PDFBOX-2898:

Attachment: InteractiveObjects.pdf

The Group object is on the first page, in the page dictionary.

> Incorrect key for color space in PDGroup
> 
>
> Key: PDFBOX-2898
> URL: https://issues.apache.org/jira/browse/PDFBOX-2898
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Windows 7, 64. Java 1.7.0_51
>Reporter: Evgeniy Muravitskiy
> Attachments: InteractiveObjects.pdf
>
>
> According to the PDF Reference 1.4 (1.7), section 7.5.5, a transparency group 
> may contain its color space in the 'CS' key, but the method 
> PDGroup.getColorSpace() looks it up using the 'ColorSpace' key.



[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635098#comment-14635098
 ] 

Maruan Sahyoun commented on PDFBOX-2896:


Commit https://svn.apache.org/r1692060 makes XMPBox write out the language 
attribute when a language is specified, which was not the case before.

> XMPBox not creating valid "title" entry in DublinCoreSchema in trunk
> 
>
> Key: PDFBOX-2896
> URL: https://issues.apache.org/jira/browse/PDFBOX-2896
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 2.0.0
>Reporter: Tim Allison
>Priority: Minor
>
> On TIKA-1678, I was trying to generate a test PDF that had a dc:title in the 
> XMP with XMPBox from PDFBox's trunk.  I modified the code from CreatePDFA by 
> adding this:
> {code}
> DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
> dc.setTitle("this is the title");
> {code}
> The generated PDF doesn't appear to have a compliant dc:title entry in the 
> XMP.  
> [~tilman] noted the divergence from the standard 
> [here|https://issues.apache.org/jira/browse/TIKA-1678?focusedCommentId=14634045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14634045].
> What PDFBox does:
> {code}
>   
> 
>   this is the title
> 
>   
> {code}
> It should be:
> {code}
>   
> 
>   this is the title
> 
>   
> {code}
> Error message from the PDF-Tools validator:
> {quote}
> 'dc:li' is not allowed in arrays. The elements must be rdf:li or rdf:_N, 
> where N is a positive number.
> There is only one RDF resource allowed in XMP.
> {quote}
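The code blocks in the report lost their XML tags, so the exact snippets are not recoverable here. As a hedged illustration of the rule quoted from the validator, the sketch below parses an XMP fragment with the JDK's DOM parser and reports any child of rdf:Alt, rdf:Bag or rdf:Seq that is neither rdf:li nor rdf:_N. The sample strings are assumptions: EXPECTED_TITLE uses the usual XMP language-alternative form for dc:title, and REPORTED_TITLE only reproduces the dc:li element named in the error message. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmpArrayCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Assumed sample fragments (hypothetical, not copied from the report)
    static final String HEAD = "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\""
            + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
            + "<rdf:Description rdf:about=\"\"><dc:title><rdf:Alt>";
    static final String TAIL = "</rdf:Alt></dc:title></rdf:Description></rdf:RDF>";
    static final String EXPECTED_TITLE =
            HEAD + "<rdf:li xml:lang=\"x-default\">this is the title</rdf:li>" + TAIL;
    static final String REPORTED_TITLE = HEAD + "<dc:li>this is the title</dc:li>" + TAIL;

    /** Returns the qualified names of array items that are neither rdf:li nor rdf:_N. */
    static List<String> invalidArrayItems(String xmp) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getNamespaceURI()/getLocalName()
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xmp)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XMP", e);
        }
        List<String> invalid = new ArrayList<>();
        for (String container : new String[] {"Alt", "Bag", "Seq"}) {
            NodeList arrays = doc.getElementsByTagNameNS(RDF_NS, container);
            for (int i = 0; i < arrays.getLength(); i++) {
                for (Node child = arrays.item(i).getFirstChild(); child != null;
                        child = child.getNextSibling()) {
                    if (child.getNodeType() != Node.ELEMENT_NODE) {
                        continue; // ignore whitespace and comments
                    }
                    Element item = (Element) child;
                    String local = item.getLocalName();
                    boolean ok = RDF_NS.equals(item.getNamespaceURI())
                            && (local.equals("li") || local.matches("_[1-9][0-9]*"));
                    if (!ok) {
                        invalid.add(item.getTagName()); // e.g. "dc:li"
                    }
                }
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        System.out.println(invalidArrayItems(EXPECTED_TITLE)); // []
        System.out.println(invalidArrayItems(REPORTED_TITLE)); // [dc:li]
    }
}
```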



[jira] [Commented] (PDFBOX-2898) Incorrect key for color space in PDGroup

2015-07-21 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635089#comment-14635089
 ] 

Maruan Sahyoun commented on PDFBOX-2898:


I think you are right. Would you be able to attach a sample document to test 
with?

[~tilman] [~jahewson] I could change it quickly, but I fear it might have side 
effects, and I don't have a set of test files for that feature.

From the ISO 32000-1 spec, Table 147 - Additional entries specific to a 
transparency group attributes dictionary (continued):
{quote}
Key: CS
Type: name or array
Value: (Sometimes required) The group colour space, which is used for the 
following purposes:
...
{quote} 

The other option would be to use both keys, 
{{getDictionaryObject(COSName.COLORSPACE, COSName.CS);}}, which is incorrect 
from a spec point of view but retains the old behavior.
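A minimal sketch of the two options discussed above, using a plain Map as a hypothetical stand-in for COSDictionary (real code would use COSName keys). specColorSpace reads only 'CS', as Table 147 specifies; lenientColorSpace prefers 'CS' and falls back to the legacy 'ColorSpace' key. The class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class GroupColorSpaceLookup {

    /** Value of firstKey, or of secondKey if firstKey is absent (a two-key lookup). */
    static Object lookup(Map<String, Object> dict, String firstKey, String secondKey) {
        Object value = dict.get(firstKey);
        return value != null ? value : dict.get(secondKey);
    }

    /** Spec-conforming: ISO 32000-1 Table 147 names the group colour space key "CS". */
    static Object specColorSpace(Map<String, Object> group) {
        return group.get("CS");
    }

    /** Lenient: prefer "CS", but still accept the legacy "ColorSpace" key. */
    static Object lenientColorSpace(Map<String, Object> group) {
        return lookup(group, "CS", "ColorSpace");
    }

    public static void main(String[] args) {
        Map<String, Object> legacy = new HashMap<>();
        legacy.put("S", "Transparency");
        legacy.put("ColorSpace", "DeviceRGB"); // the key the old getColorSpace() looked up
        System.out.println(specColorSpace(legacy));    // null
        System.out.println(lenientColorSpace(legacy)); // DeviceRGB
    }
}
```

Assuming COSDictionary.getDictionaryObject(firstKey, secondKey) checks firstKey first, the call quoted in the comment prefers the legacy 'ColorSpace' key; the order only matters when a dictionary contains both keys.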

> Incorrect key for color space in PDGroup
> 
>
> Key: PDFBOX-2898
> URL: https://issues.apache.org/jira/browse/PDFBOX-2898
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Windows 7, 64. Java 1.7.0_51
>Reporter: Evgeniy Muravitskiy
>
> According to the PDF Reference 1.4 (1.7), section 7.5.5, a transparency group 
> may contain its color space in the 'CS' key, but the method 
> PDGroup.getColorSpace() looks it up using the 'ColorSpace' key.



[jira] [Created] (PDFBOX-2898) Incorrect key for color space in PDGroup

2015-07-21 Thread Evgeniy Muravitskiy (JIRA)
Evgeniy Muravitskiy created PDFBOX-2898:
---

 Summary: Incorrect key for color space in PDGroup
 Key: PDFBOX-2898
 URL: https://issues.apache.org/jira/browse/PDFBOX-2898
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
 Environment: Windows 7, 64. Java 1.7.0_51
Reporter: Evgeniy Muravitskiy


According to the PDF Reference 1.4 (1.7), section 7.5.5, a transparency group 
may contain its color space in the 'CS' key, but the method 
PDGroup.getColorSpace() looks it up using the 'ColorSpace' key.



[jira] [Comment Edited] (PDFBOX-2252) PDFTextStripper has problem with documents with mixed language directions

2015-07-21 Thread Andreas Meier (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634577#comment-14634577
 ] 

Andreas Meier edited comment on PDFBOX-2252 at 7/21/15 9:43 AM:


Yes, numbers are written LTR.

Everything happening in this case depends on the strong LTR and RTL characters.

For example:
The "((" occurs because Adobe Reader notices the strong RTL characters and 
extracts them. The strong RTL characters then reverse the direction of the 
trailing ")", which is written to the left of the RTL word.

A simple approach without markers (one that relies only on the direction of 
strong RTL/LTR characters) will not handle this problem.

That's the reason why I suggested the multi-stage approach:

We can't rely on the person who creates a PDF or on the integrity of the 
software that converts text to PDFs.


was (Author: andreasmeier):
Yes, numbers are written ltr

Everything happening in this case depends on the strong LTR and RTL characters.

For example:
The (( occurs, because Adobe Reader notices the strong RTL characters and 
writes the next character ")" to the left.

> PDFTextStripper has problem with documents with mixed language directions
> -
>
> Key: PDFBOX-2252
> URL: https://issues.apache.org/jira/browse/PDFBOX-2252
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.8.6, 2.0.0
>Reporter: Amir
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: PDFTextStripper.java.patch, atest.pdf, overlap.jpg, 
> test.pdf, wikipedia_dl_lyric_test.pdf
>
>
> When the input document of PDFTextStripper is a combination of right-to-left 
> and left-to-right languages, the output characters of one language are 
> reversed. 
> A sample bilingual PDF document is attached.
> PDFTextStripper has a variable "isRtlDominant" in the "writePage" function, which 
> is defined as follows: boolean isRtlDominant = rtlCount > ltrCount;
> This class counts the number of RTL characters and decides whether the whole 
> content should be reversed or not. That is not correct; it must operate on each 
> word, not the whole document.
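A minimal sketch of the difference between the page-wide decision described above and a per-word one, assuming glyphs arrive in visual (left-to-right) order so that RTL words come out reversed. A word's direction is taken from its first strong character via Character.getDirectionality; the class and method names are hypothetical and are not PDFTextStripper's API.

```java
import java.util.ArrayList;
import java.util.List;

public class WordDirectionSketch {

    enum Direction { LTR, RTL, NEUTRAL }

    /** Direction of the first strong character; digits and punctuation count as neutral. */
    static Direction direction(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            byte d = Character.getDirectionality(cp);
            if (d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return Direction.RTL;
            }
            if (d == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return Direction.LTR;
            }
            i += Character.charCount(cp);
        }
        return Direction.NEUTRAL;
    }

    /** Page-wide heuristic as described in the issue: one decision for all the text. */
    static boolean isRtlDominant(String text) {
        int rtlCount = 0;
        int ltrCount = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            Direction d = direction(new String(Character.toChars(cp)));
            if (d == Direction.RTL) {
                rtlCount++;
            } else if (d == Direction.LTR) {
                ltrCount++;
            }
            i += Character.charCount(cp);
        }
        return rtlCount > ltrCount;
    }

    /** Per-word alternative: reverse only the words whose first strong character is RTL. */
    static String fixRtlWords(String visualLine) {
        List<String> out = new ArrayList<>();
        for (String word : visualLine.split(" ", -1)) {
            out.add(direction(word) == Direction.RTL
                    ? new StringBuilder(word).reverse().toString()
                    : word);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String shalom = "\u05E9\u05DC\u05D5\u05DD"; // Hebrew word in logical order
        String visual = "hello " + new StringBuilder(shalom).reverse() + " world";
        System.out.println(isRtlDominant(visual)); // false: 10 LTR letters vs 4 RTL
        System.out.println(fixRtlWords(visual).equals("hello " + shalom + " world")); // true
    }
}
```

This is deliberately simplified: it does not reorder runs of consecutive RTL words, and neutral characters such as parentheses are reversed along with the word they touch, which is the "((" problem discussed in the comments. java.text.Bidi implements the full Unicode Bidirectional Algorithm if that is needed.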



[jira] [Comment Edited] (PDFBOX-2252) PDFTextStripper has problem with documents with mixed language directions

2015-07-21 Thread Andreas Meier (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634577#comment-14634577
 ] 

Andreas Meier edited comment on PDFBOX-2252 at 7/21/15 9:43 AM:


Yes, numbers are written LTR.

Everything happening in this case depends on the strong LTR and RTL characters.

For example:
The "((" occurs because Adobe Reader notices the strong RTL characters and 
extracts them. The strong RTL characters then reverse the direction of the 
trailing ")", which is written to the left of the RTL word.

A simple approach without markers (one that relies only on the direction of 
strong RTL/LTR characters) will not handle this problem.

That's the reason why I suggested the multi-stage approach:

We can't rely on the person who creates a PDF or on the integrity of the 
software that converts text to PDFs.


was (Author: andreasmeier):
Yes, numbers are written ltr

Everything happening in this case depends on the strong LTR and RTL characters.

For example:
The (( occurs, because Adobe Reader notices the strong RTL characters and 
extracts them. The strong RTL characters will then turn the direction of the 
trailing ")" and write it to the left of the RTL-word.

A simple approach without markers (which relies on the direction of strong 
RTL/LTR characters) will not handle this problem.

That's the reason why I suggested the multi-stage approach:

We can't rely on the person who creates a pdf or on the integrity of the 
software that converts a text pdf's.

> PDFTextStripper has problem with documents with mixed language directions
> -
>
> Key: PDFBOX-2252
> URL: https://issues.apache.org/jira/browse/PDFBOX-2252
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.8.6, 2.0.0
>Reporter: Amir
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: PDFTextStripper.java.patch, atest.pdf, overlap.jpg, 
> test.pdf, wikipedia_dl_lyric_test.pdf
>
>
> When the input document of PDFTextStripper is a combination of right-to-left 
> and left-to-right languages, the output characters of one language are 
> reversed. 
> A sample bilingual PDF document is attached.
> PDFTextStripper has a variable "isRtlDominant" in the "writePage" function, which 
> is defined as follows: boolean isRtlDominant = rtlCount > ltrCount;
> This class counts the number of RTL characters and decides whether the whole 
> content should be reversed or not. That is not correct; it must operate on each 
> word, not the whole document.



[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634745#comment-14634745
 ] 

Hudson commented on PDFBOX-2896:


SUCCESS: Integrated in tika-trunk-jdk1.7 #796 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/796/])
TIKA-1678 -- initial commit.  Need to wait for fix to PDFBOX-2896 to generate 
test file. (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1692042)
* /tika/trunk/tika-parsers/src/main/java/org/apache/pdfbox
* /tika/trunk/tika-parsers/src/main/java/org/apache/pdfbox/pdfparser
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/pdfbox/pdfparser/PDFOctalUnicodeDecoder.java
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java


> XMPBox not creating valid "title" entry in DublinCoreSchema in trunk
> 
>
> Key: PDFBOX-2896
> URL: https://issues.apache.org/jira/browse/PDFBOX-2896
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 2.0.0
>Reporter: Tim Allison
>Priority: Minor
>
> On TIKA-1678, I was trying to generate a test PDF that had a dc:title in the 
> XMP with XMPBox from PDFBox's trunk.  I modified the code from CreatePDFA by 
> adding this:
> {code}
> DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
> dc.setTitle("this is the title");
> {code}
> The generated PDF doesn't appear to have a compliant dc:title entry in the 
> XMP.  
> [~tilman] noted the divergence from the standard 
> [here|https://issues.apache.org/jira/browse/TIKA-1678?focusedCommentId=14634045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14634045].
> What PDFBox does:
> {code}
>   
> 
>   this is the title
> 
>   
> {code}
> It should be:
> {code}
>   
> 
>   this is the title
> 
>   
> {code}
> Error message from the PDF-Tools validator:
> {quote}
> 'dc:li' is not allowed in arrays. The elements must be rdf:li or rdf:_N, 
> where N is a positive number.
> There is only one RDF resource allowed in XMP.
> {quote}



[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634730#comment-14634730
 ] 

ASF subversion and git services commented on PDFBOX-2896:
-

Commit 1692060 from [~msahyoun] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1692060 ]

PDFBOX-2896: serialize attributes for simple properties

> XMPBox not creating valid "title" entry in DublinCoreSchema in trunk
> 
>
> Key: PDFBOX-2896
> URL: https://issues.apache.org/jira/browse/PDFBOX-2896
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 2.0.0
>Reporter: Tim Allison
>Priority: Minor
>
> On TIKA-1678, I was trying to generate a test PDF that had a dc:title in the 
> XMP with XMPBox from PDFBox's trunk.  I modified the code from CreatePDFA by 
> adding this:
> {code}
> DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
> dc.setTitle("this is the title");
> {code}
> The generated PDF doesn't appear to have a compliant dc:title entry in the 
> XMP.  
> [~tilman] noted the divergence from the standard 
> [here|https://issues.apache.org/jira/browse/TIKA-1678?focusedCommentId=14634045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14634045].
> What PDFBox does:
> {code}
>   
> 
>   this is the title
> 
>   
> {code}
> It should be:
> {code}
>   
> 
>   this is the title
> 
>   
> {code}
> Error message from the PDF-Tools validator:
> {quote}
> 'dc:li' is not allowed in arrays. The elements must be rdf:li or rdf:_N, 
> where N is a positive number.
> There is only one RDF resource allowed in XMP.
> {quote}


