[jira] [Commented] (PDFBOX-2896) XMPBox not creating valid title entry in DublinCoreSchema in trunk

2015-07-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634745#comment-14634745
 ] 

Hudson commented on PDFBOX-2896:


SUCCESS: Integrated in tika-trunk-jdk1.7 #796 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/796/])
TIKA-1678 -- initial commit.  Need to wait for fix to PDFBOX-2896 to generate 
test file. (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1692042)
* /tika/trunk/tika-parsers/src/main/java/org/apache/pdfbox
* /tika/trunk/tika-parsers/src/main/java/org/apache/pdfbox/pdfparser
* /tika/trunk/tika-parsers/src/main/java/org/apache/pdfbox/pdfparser/PDFOctalUnicodeDecoder.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java


 XMPBox not creating valid title entry in DublinCoreSchema in trunk
 

 Key: PDFBOX-2896
 URL: https://issues.apache.org/jira/browse/PDFBOX-2896
 Project: PDFBox
  Issue Type: Bug
  Components: XmpBox
Affects Versions: 2.0.0
Reporter: Tim Allison
Priority: Minor

 On TIKA-1678, I was trying to generate a test PDF that had a dc:title in the 
 XMP with XMPBox from PDFBox's trunk.  I modified the code from CreatePDFA by 
 adding this:
 {code}
 DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
 dc.setTitle("this is the title");
 {code}
 The generated PDF doesn't appear to have a compliant dc:title entry in the 
 XMP.  
 [~tilman] noted the divergence from the standard 
 [here|https://issues.apache.org/jira/browse/TIKA-1678?focusedCommentId=14634045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14634045].
 What PDFBox does:
 {code}
 <dc:title>
   <rdf:Alt>
     <dc:li>this is the title</dc:li>
   </rdf:Alt>
 </dc:title>
 {code}
 It should be:
 {code}
 <dc:title>
   <rdf:Alt>
     <rdf:li xml:lang="x-default">this is the title</rdf:li>
   </rdf:Alt>
 </dc:title>
 {code}
 Error message from the PDF-Tools validator:
 {quote}
 'dc:li' is not allowed in arrays. The elements must be rdf:li or rdf:_N, 
 where N is a positive number.
 There is only one RDF resource allowed in XMP.
 {quote}
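
 For reference, the expected structure can be sketched with the JDK's DOM API 
 alone. This is not the XMPBox code and not the eventual fix; the class and 
 method names (DcTitleSketch, buildTitle, describe) are made up for 
 illustration. It only shows that the array item must be rdf:li carrying 
 xml:lang="x-default", nested in rdf:Alt under dc:title.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration only; none of these names come from XMPBox.
final class DcTitleSketch {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Builds dc:title > rdf:Alt > rdf:li[xml:lang="x-default"] holding the title text.
    static Element buildTitle(Document doc, String title) {
        Element dcTitle = doc.createElementNS(DC_NS, "dc:title");
        Element alt = doc.createElementNS(RDF_NS, "rdf:Alt");
        // The item is rdf:li, not dc:li: array items live in the RDF namespace.
        Element li = doc.createElementNS(RDF_NS, "rdf:li");
        li.setAttributeNS(XML_NS, "xml:lang", "x-default");
        li.setTextContent(title);
        alt.appendChild(li);
        dcTitle.appendChild(alt);
        return dcTitle;
    }

    // Creates an empty namespace-aware DOM document.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Summarizes the innermost item as "tagName [lang] text".
    static String describe(String title) {
        Element dcTitle = buildTitle(newDocument(), title);
        Element li = (Element) dcTitle.getFirstChild().getFirstChild();
        return li.getTagName() + " [" + li.getAttributeNS(XML_NS, "lang") + "] "
                + li.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(describe("this is the title"));
    }
}
```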



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-1130) ExtractText -html doesn't always close the p tags it opens

2015-03-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343741#comment-14343741
 ] 

Hudson commented on PDFBOX-1130:


SUCCESS: Integrated in tika-trunk-jdk1.7 #524 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/524/])
TIKA-758 clean up after remembering PDFBOX-1130 (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1663424)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java


 ExtractText -html doesn't always close the p tags it opens
 

 Key: PDFBOX-1130
 URL: https://issues.apache.org/jira/browse/PDFBOX-1130
 Project: PDFBox
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Andreas Lehmkühler
Priority: Minor
 Fix For: 1.8.0

 Attachments: 86.pdf, PDFBOX-1130.patch


 I have a test document (the same one as on PDFBOX-1129) which, when run through
 ExtractText -html, extracts the page number for each page. However, in each
 case the page number looks like:
 <p>N<p>Text of page N...
 I.e., the <p> tag for the page number wasn't closed.
 Maybe related: if I run ExtractText without -html, there is no space after
 the page number and before the next word, i.e. I see words like 1Massachusetts,
 2Course, 3also, 4the.
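
 One way to guarantee balanced tags is to track whether a paragraph is 
 currently open and close it before opening the next. The sketch below is a 
 stand-alone illustration of that idea, not the PDFBox text-stripper code or 
 the attached patch; ParagraphWriterSketch and its methods are hypothetical 
 names.

```java
// Hypothetical illustration only; not the PDFBox ExtractText implementation.
final class ParagraphWriterSketch {
    private final StringBuilder out = new StringBuilder();
    private boolean paragraphOpen = false;

    // Opens a new paragraph, first closing any paragraph that is still open.
    void startParagraph() {
        endParagraph();
        out.append("<p>");
        paragraphOpen = true;
    }

    // Appends escaped text, opening a paragraph if none is open yet.
    void text(String s) {
        if (!paragraphOpen) {
            startParagraph();
        }
        out.append(escape(s));
    }

    // Closes the current paragraph; does nothing if none is open.
    void endParagraph() {
        if (paragraphOpen) {
            out.append("</p>\n");
            paragraphOpen = false;
        }
    }

    // Closes anything left open and returns the accumulated markup.
    String finish() {
        endParagraph();
        return out.toString();
    }

    // Minimal escaping of the characters that would break the markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        ParagraphWriterSketch w = new ParagraphWriterSketch();
        w.startParagraph();
        w.text("1");
        w.startParagraph(); // the page-number paragraph is closed here
        w.text("Text of page 1...");
        System.out.print(w.finish());
    }
}
```

 Because every <p> is emitted through startParagraph(), the page number and 
 the following text always end up in separate, closed paragraphs.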






[jira] [Commented] (PDFBOX-2122) FontBox's TTFDataStream doesn't set timezone in readInternationalDate

2014-06-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025435#comment-14025435
 ] 

Hudson commented on PDFBOX-2122:


SUCCESS: Integrated in tika-trunk-jdk1.7 #36 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/36/])
TIKA-1325: small workaround until we can integrate PDFBOX-2122. Default 
timezone is now set and then unset for ttf test in FontParsers test. (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1601444)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/font/FontParsersTest.java


 FontBox's TTFDataStream doesn't set timezone in readInternationalDate
 -

 Key: PDFBOX-2122
 URL: https://issues.apache.org/jira/browse/PDFBOX-2122
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Tim Allison
Assignee: Tilman Hausherr
Priority: Trivial
 Fix For: 1.8.6, 2.0.0

 Attachments: PDFBOX-2122.patch


 TTFDataStream doesn't set the timezone for the calendar. GregorianCalendar 
 defaults to the system's timezone.  This means that people in different 
 timezones will get slightly different dates.  (TIKA-1325).
 One TTF Spec (https://developer.apple.com/fonts/TTRefMan/RM06/Chap6.html) 
 doesn't specify the timezone, but my guess would be UTC...except that it is 
 Apple, so maybe it's Cupertino. :)
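
 To make the effect concrete, here is a small sketch of a date conversion 
 that pins the calendar to a fixed zone. It assumes UTC, which is only the 
 guess stated above, and it is not the TTFDataStream code or the attached 
 patch; TtfDateSketch and fromSecondsSince1904 are hypothetical names. 
 TrueType date fields count seconds since midnight, January 1, 1904.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical illustration only; UTC is an assumption, not something the spec states.
final class TtfDateSketch {
    // Converts a TrueType date (seconds since 1904-01-01 00:00:00) to a Calendar.
    static Calendar fromSecondsSince1904(long secondsSince1904) {
        // An explicit zone makes the result independent of the system default.
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear(); // reset all fields, including milliseconds
        cal.set(1904, Calendar.JANUARY, 1, 0, 0, 0);
        long epoch1904Millis = cal.getTimeInMillis();
        cal.setTimeInMillis(epoch1904Millis + secondsSince1904 * 1000L);
        return cal;
    }

    public static void main(String[] args) {
        Calendar cal = fromSecondsSince1904(0L);
        System.out.println(cal.getTimeInMillis());
    }
}
```

 With new GregorianCalendar() and no zone argument, the same input would map 
 to a different instant depending on where the code runs, which is the 
 discrepancy seen in TIKA-1325.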





[jira] [Commented] (PDFBOX-2122) FontBox's TTFDataStream doesn't set timezone in readInternationalDate

2014-06-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025463#comment-14025463
 ] 

Hudson commented on PDFBOX-2122:


SUCCESS: Integrated in tika-trunk-jdk1.6 #36 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/36/])
TIKA-1325: small workaround until we can integrate PDFBOX-2122. Default 
timezone is now set and then unset for ttf test in FontParsers test. (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1601444)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/font/FontParsersTest.java
