[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid "title" entry in DublinCoreSchema in trunk

2015-07-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634745#comment-14634745
 ] 

Hudson commented on PDFBOX-2896:


SUCCESS: Integrated in tika-trunk-jdk1.7 #796 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/796/])
TIKA-1678 -- initial commit.  Need to wait for fix to PDFBOX-2896 to generate 
test file. (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1692042)
* /tika/trunk/tika-parsers/src/main/java/org/apache/pdfbox
* /tika/trunk/tika-parsers/src/main/java/org/apache/pdfbox/pdfparser
* /tika/trunk/tika-parsers/src/main/java/org/apache/pdfbox/pdfparser/PDFOctalUnicodeDecoder.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java


> XMPBox not creating valid "title" entry in DublinCoreSchema in trunk
>
> Key: PDFBOX-2896
> URL: https://issues.apache.org/jira/browse/PDFBOX-2896
> Project: PDFBox
> Issue Type: Bug
> Components: XmpBox
> Affects Versions: 2.0.0
> Reporter: Tim Allison
> Priority: Minor
>
> On TIKA-1678, I was trying to generate a test PDF that had a dc:title in the 
> XMP with XMPBox from PDFBox's trunk.  I modified the code from CreatePDFA by 
> adding this:
> {code}
> DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
> dc.setTitle("this is the title");
> {code}
> The generated PDF doesn't appear to have a compliant dc:title entry in the 
> XMP.  
> [~tilman] noted the divergence from the standard 
> [here|https://issues.apache.org/jira/browse/TIKA-1678?focusedCommentId=14634045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14634045].
> What PDFBox does:
> {code}
>   <dc:title>
>     <rdf:Alt>
>       <dc:li>this is the title</dc:li>
>     </rdf:Alt>
>   </dc:title>
> {code}
> It should be:
> {code}
>   <dc:title>
>     <rdf:Alt>
>       <rdf:li>this is the title</rdf:li>
>     </rdf:Alt>
>   </dc:title>
> {code}
> Error message from the PDF-Tools validator:
> {quote}
> 'dc:li' is not allowed in arrays. The elements must be rdf:li or rdf:_N, 
> where N is a positive number.
> There is only one RDF resource allowed in XMP.
> {quote}
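
The validator message quoted above comes down to a structural rule: every item inside the array under dc:title must be rdf:li (or rdf:_N), not dc:li. The following is a minimal sketch of that check using only the JDK's DOM parser. It is neither the PDF-Tools validator nor XMPBox code; the class name, method names, and the sample fragment built by `wrap` are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DcTitleArrayCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Returns true when every element inside an rdf:Alt under dc:title
    // is rdf:li or rdf:_N (N a positive number).
    static boolean titleItemsAreRdfLi(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to compare namespace URIs, not prefixes
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList titles = doc.getElementsByTagNameNS(DC_NS, "title");
        for (int i = 0; i < titles.getLength(); i++) {
            NodeList alts = ((Element) titles.item(i)).getElementsByTagNameNS(RDF_NS, "Alt");
            for (int j = 0; j < alts.getLength(); j++) {
                NodeList children = alts.item(j).getChildNodes();
                for (int k = 0; k < children.getLength(); k++) {
                    Node child = children.item(k);
                    if (child.getNodeType() != Node.ELEMENT_NODE) {
                        continue; // skip whitespace and text nodes
                    }
                    String local = child.getLocalName();
                    boolean inRdf = RDF_NS.equals(child.getNamespaceURI());
                    boolean nameOk = "li".equals(local) || local.matches("_[1-9][0-9]*");
                    if (!(inRdf && nameOk)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    // Builds a tiny sample fragment whose array item uses the given qualified name.
    static String wrap(String itemName) {
        return "<rdf:Description xmlns:rdf='" + RDF_NS + "' xmlns:dc='" + DC_NS + "'>"
                + "<dc:title><rdf:Alt><" + itemName + ">this is the title</" + itemName + ">"
                + "</rdf:Alt></dc:title></rdf:Description>";
    }

    public static void main(String[] args) throws Exception {
        System.out.println("dc:li item accepted?  " + titleItemsAreRdfLi(wrap("dc:li")));
        System.out.println("rdf:li item accepted? " + titleItemsAreRdfLi(wrap("rdf:li")));
    }
}
```

The check compares namespace URIs rather than prefixes, so it would still work if a producer bound the RDF namespace to a prefix other than `rdf`.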



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-1130) ExtractText -html doesn't always close the <p> tags it opens

2015-03-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343741#comment-14343741
 ] 

Hudson commented on PDFBOX-1130:


SUCCESS: Integrated in tika-trunk-jdk1.7 #524 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/524/])
TIKA-758 clean up after remembering PDFBOX-1130 (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1663424)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java


> ExtractText -html doesn't always close the <p> tags it opens
>
> Key: PDFBOX-1130
> URL: https://issues.apache.org/jira/browse/PDFBOX-1130
> Project: PDFBox
> Issue Type: Bug
> Reporter: Michael McCandless
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Fix For: 1.8.0
>
> Attachments: 86.pdf, PDFBOX-1130.patch
>
>
> I have a test document (same one on PDFBOX-1129), which when run through 
> ExtractText -html, extracts the page number for each page, however in each 
> case the page number looks like:
> NText of page N...
> I.e., the <p> tag for the page number wasn't closed.
> Maybe related: if I run ExtractText without -html, there is no space after
> the page number and before the next word, i.e. I see words like 1Massachusetts,
> 2Course, 3also, 4the.
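
One quick way to spot the symptom described above is to count how many opening tags of a given name are left without a matching close. The sketch below is not PDFBox or Tika code. The class and method names are hypothetical, the tag name `p` in the usage example is an assumption, and the regex-based scan is only a rough check for simple output, not an HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnclosedTagCheck {

    // Returns how many opening <tag> elements are still open at the end of html.
    // Stray closing tags with nothing open are ignored.
    static int unclosed(String html, String tag) {
        Pattern p = Pattern.compile(
                "<(/?)" + Pattern.quote(tag) + "(\\s[^>]*)?>",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        int open = 0;
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                open++;            // opening tag
            } else if (open > 0) {
                open--;            // closing tag matches the most recent open one
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // Hypothetical output resembling the report: a page number whose tag is never closed.
        System.out.println(unclosed("<p>1<p>Text of page 1...</p>", "p"));
        System.out.println(unclosed("<p>1</p><p>Text of page 1...</p>", "p"));
    }
}
```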






[jira] [Commented] (PDFBOX-2383) PDFBox tests include copyright files

2015-02-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309876#comment-14309876
 ] 

Hudson commented on PDFBOX-2383:


SUCCESS: Integrated in tika-trunk-jdk1.7 #474 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/474/])
TIKA-1542 substitute Apache friendly TTF test file for our current copyrighted 
file, take 2.  See PDFBOX-2383 (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1657952)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/font/FontParsersTest.java
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testTrueType2.ttf
* /tika/trunk/tika-parsers/src/test/resources/test-documents/testTrueType3.ttf


> PDFBox tests include copyright files
>
> Key: PDFBOX-2383
> URL: https://issues.apache.org/jira/browse/PDFBOX-2383
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.8.7, 2.0.0
> Reporter: John Hewson
> Assignee: Tilman Hausherr
> Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: Aclonica.ttf
>
>
> The test files for PDFBox, FontBox, and Preflight include several files under 
> copyright which we probably don't have permission to redistribute, and need 
> to be removed (or preferably replaced):
> pdfbox/src/test/resources/org/apache/pdfbox/
>   - -ttf/ArialMT.ttf (This is actually Bitstream Vera Sans - the license on 
> this might be ok though?)-
>   - -pdfparser/gdb-refcard.pdf (GPL licensed)-
>   - -pdmodel/page_label.pdf (Edited by Foxit PDF for Evaluation Only)-
>   - -pdmodel/font/256.pdf (Copyright 2004 Journal of Combinatorics)-
> fontbox/src/test/resources/ttf/
> - -testTrueType.ttf (NewBaskerville, Copyright © 2002 Veronika Elsner)-
> preflight/src/test/resources/org/apache/padaf/preflight/font/
> - -true_type.ttf (Subset of Microsoft Arial)-






[jira] [Commented] (PDFBOX-2122) FontBox's TTFDataStream doesn't set timezone in readInternationalDate

2014-06-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025463#comment-14025463
 ] 

Hudson commented on PDFBOX-2122:


SUCCESS: Integrated in tika-trunk-jdk1.6 #36 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/36/])
TIKA-1325: small workaround until we can integrate PDFBOX-2122. Default 
timezone is now set and then unset for ttf test in FontParsers test. (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1601444)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/font/FontParsersTest.java
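
The workaround mentioned in the commit message above (set the default timezone, then unset it around the TTF test) could look roughly like this. It is a sketch of the general pattern, not the actual change to FontParsersTest; `DefaultTimeZoneGuard` and `callWithDefaultTimeZone` are hypothetical names.

```java
import java.util.TimeZone;
import java.util.concurrent.Callable;

public class DefaultTimeZoneGuard {

    // Runs body with the JVM default timezone temporarily set to zone,
    // then restores the previous default even if body throws.
    static <T> T callWithDefaultTimeZone(TimeZone zone, Callable<T> body) throws Exception {
        TimeZone saved = TimeZone.getDefault();
        TimeZone.setDefault(zone);
        try {
            return body.call();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) throws Exception {
        String idInside = callWithDefaultTimeZone(TimeZone.getTimeZone("UTC"),
                new Callable<String>() {
                    public String call() {
                        return TimeZone.getDefault().getID();
                    }
                });
        System.out.println("default inside the guarded block: " + idInside);
        System.out.println("default afterwards: " + TimeZone.getDefault().getID());
    }
}
```

Restoring the default in a `finally` block matters because the default timezone is JVM-wide state, so a failing test would otherwise leak it into later tests.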


> FontBox's TTFDataStream doesn't set timezone in readInternationalDate
>
> Key: PDFBOX-2122
> URL: https://issues.apache.org/jira/browse/PDFBOX-2122
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 1.8.5, 1.8.6, 2.0.0
> Reporter: Tim Allison
> Assignee: Tilman Hausherr
> Priority: Trivial
> Fix For: 1.8.6, 2.0.0
>
> Attachments: PDFBOX-2122.patch
>
>
> TTFDataStream doesn't set the timezone for the calendar. GregorianCalendar 
> defaults to the system's timezone.  This means that people in different 
> timezones will get slightly different dates.  (TIKA-1325).
> One TTF Spec (https://developer.apple.com/fonts/TTRefMan/RM06/Chap6.html) 
> doesn't specify the timezone, but my guess would be UTC...except that it is 
> Apple, so maybe it's Cupertino. :)
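
To make the timezone dependence concrete: TrueType date fields count seconds since midnight, January 1, 1904, so the result depends on which zone that "midnight" is interpreted in. The sketch below is not the attached PDFBOX-2122.patch and not FontBox code. `TtfDateSketch` and `fromSecondsSince1904` are hypothetical names, and treating the epoch as UTC is the guess made in the report, not something the spec states.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class TtfDateSketch {

    // Converts a "seconds since 1904-01-01 00:00:00" value to a Calendar,
    // interpreting the 1904 epoch as midnight in the given zone.
    static Calendar fromSecondsSince1904(long seconds, TimeZone zone) {
        Calendar cal = new GregorianCalendar(zone);
        cal.clear();                                  // zero all fields, including milliseconds
        cal.set(1904, Calendar.JANUARY, 1, 0, 0, 0);  // epoch at midnight in 'zone'
        cal.setTimeInMillis(cal.getTimeInMillis() + seconds * 1000L);
        return cal;
    }

    public static void main(String[] args) {
        long seconds = 3000000000L; // arbitrary sample value
        Calendar utc = fromSecondsSince1904(seconds, TimeZone.getTimeZone("UTC"));
        Calendar plusNine = fromSecondsSince1904(seconds, TimeZone.getTimeZone("GMT+09:00"));
        // The same stored value maps to two different instants, offset by the zone difference.
        System.out.println("difference in ms: "
                + (utc.getTimeInMillis() - plusNine.getTimeInMillis()));
    }
}
```

Passing an explicit zone, rather than relying on the default that `new GregorianCalendar()` picks up from the system, is what makes the result the same on every machine.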



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2122) FontBox's TTFDataStream doesn't set timezone in readInternationalDate

2014-06-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025435#comment-14025435
 ] 

Hudson commented on PDFBOX-2122:


SUCCESS: Integrated in tika-trunk-jdk1.7 #36 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/36/])
TIKA-1325: small workaround until we can integrate PDFBOX-2122. Default 
timezone is now set and then unset for ttf test in FontParsers test. (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1601444)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/font/FontParsersTest.java


> FontBox's TTFDataStream doesn't set timezone in readInternationalDate
>
> Key: PDFBOX-2122
> URL: https://issues.apache.org/jira/browse/PDFBOX-2122
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 1.8.5, 1.8.6, 2.0.0
> Reporter: Tim Allison
> Assignee: Tilman Hausherr
> Priority: Trivial
> Fix For: 1.8.6, 2.0.0
>
> Attachments: PDFBOX-2122.patch
>
>
> TTFDataStream doesn't set the timezone for the calendar. GregorianCalendar 
> defaults to the system's timezone.  This means that people in different 
> timezones will get slightly different dates.  (TIKA-1325).
> One TTF Spec (https://developer.apple.com/fonts/TTRefMan/RM06/Chap6.html) 
> doesn't specify the timezone, but my guess would be UTC...except that it is 
> Apple, so maybe it's Cupertino. :)


