[jira] [Commented] (PDFBOX-1918) PDF convert error
[ https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948970#comment-13948970 ]

Tilman Hausherr commented on PDFBOX-1918:
-----------------------------------------

There will always be PDFs that are not correct. Your Java application should simply report that and say that it cannot analyse the file, maybe pointing to an FAQ that answers "why didn't we index that PDF that was generated by this multi-billion-dollar corporation and is displayed by Adobe Viewer?"

The customer will then have to find a way to get a correct PDF. In this case, either by learning about the "binary" option in their FTP software (if that was the cause), or (if the PDF really was generated this way by Oracle) by explaining to an Oracle help desk intern assistant in Kazakhstan the difference between Unix newlines and Windows newlines, and then praying that this information travels up seven hierarchy levels and reaches a developer who puts it on the "todo" list so that it is included in the next major release.

That I was able to fix this PDF is just luck. Most PDFs aren't ASCII-readable like that one.

> PDF convert error
> -----------------
>
>                 Key: PDFBOX-1918
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1918
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Utilities
>    Affects Versions: 1.8.4
>            Reporter: Jr. John
>         Attachments: rpt1390780234888753.pdf, rpt1390780234888753.pdf
>
>
> The current version, 1.8.4, has the same problem:
> D:\Software\pdfbox>java -jar pdfbox-app-1.8.4.jar ConvertColorspace rpt1390780234888753.pdf test.pdf
> Feb 07, 2014 4:59:11 PM org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
> WARNING: Specified stream length 15353 is wrong. Fall back to reading stream until 'endstream'.
> Feb 07, 2014 4:59:11 PM org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
> WARNING: Specified stream length 12156 is wrong. Fall back to reading stream until 'endstream'.
> Feb 07, 2014 4:59:11 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
> WARNING: Did not found XRef object at specified startxref position 83636
> ConvertColorspace failed with the following exception:
> java.io.IOException: Missing closing bracket for hex string. Reached EOS.
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSHexString(BaseParser.java:1023)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSString(BaseParser.java:816)
>     at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:259)
>     at org.apache.pdfbox.pdfparser.PDFStreamParser.parse(PDFStreamParser.java:133)
>     at org.apache.pdfbox.ConvertColorspace.replaceColors(ConvertColorspace.java:88)
>     at org.apache.pdfbox.ConvertColorspace.main(ConvertColorspace.java:385)
>     at org.apache.pdfbox.PDFBox.main(PDFBox.java:46)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
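The "report it and move on" advice in the comment above could look roughly like the following. This is only a sketch: `PdfTextExtractor` and `TolerantIndexer` are hypothetical names made up for illustration, and the extractor stands in for whatever call into PDFBox or Tika the application really makes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the real parsing call (for example into PDFBox or Tika). */
interface PdfTextExtractor {
    String extract(String fileName) throws IOException;
}

/** Indexes what it can and records, rather than crashes on, files that cannot be parsed. */
final class TolerantIndexer {
    private final PdfTextExtractor extractor;
    private final List<String> rejected = new ArrayList<String>();

    TolerantIndexer(PdfTextExtractor extractor) {
        this.extractor = extractor;
    }

    /** Returns the extracted text, or null if the file could not be parsed. */
    String indexOrReport(String fileName) {
        try {
            return extractor.extract(fileName);
        } catch (IOException | RuntimeException e) {
            // Keep the reason so it can be shown to the user or linked to an FAQ entry.
            rejected.add(fileName + ": " + e.getMessage());
            return null;
        }
    }

    /** Files that were skipped, each with the parser's error message. */
    List<String> rejected() {
        return rejected;
    }
}
```

The point of the design is that one broken attachment ends up in a report instead of stopping the whole archive run.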
[jira] [Commented] (PDFBOX-1918) PDF convert error
[ https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948953#comment-13948953 ]

Maruan Sahyoun commented on PDFBOX-1918:
----------------------------------------

As Tilman wrote, the PDF could have been corrupted after it was generated. So you could check how the file is transferred to you. What is the process from PDF generation until the file ends up in your possession? Does that process change the file unintentionally? Could you get a file from the customer directly at the source, written straight to disk? Does PDFBox still complain about that file?
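One quick way to compare a copy taken at the source with the copy that arrives is to check whether the offset after `startxref` still points at the `xref` keyword. The sketch below is not PDFBox code; `StartXrefCheck` is a made-up helper, and it only understands classic cross-reference tables, so a PDF that uses a cross-reference stream would be reported as a mismatch even when it is intact.

```java
import java.nio.charset.StandardCharsets;

final class StartXrefCheck {

    /** Returns the number following the last "startxref" keyword, or -1 if none is found. */
    static long declaredXrefOffset(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        if (i == start || i - start > 18) {
            return -1; // no digits, or too many to be a plausible offset
        }
        return Long.parseLong(text.substring(start, i));
    }

    /** True if the bytes at the declared offset spell "xref" (classic xref table only). */
    static boolean xrefOffsetMatches(byte[] pdf) {
        long offset = declaredXrefOffset(pdf);
        if (offset < 0 || offset + 4 > pdf.length) {
            return false;
        }
        String found = new String(pdf, (int) offset, 4, StandardCharsets.ISO_8859_1);
        return "xref".equals(found);
    }
}
```

If the check passes on the file written at the source and fails on the file received, something in the transfer changed the byte count, which is what an ASCII-mode FTP transfer would do.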
[jira] [Comment Edited] (PDFBOX-1918) PDF convert error
[ https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948933#comment-13948933 ]

Jr. John edited comment on PDFBOX-1918 at 3/27/14 6:18 AM:
-----------------------------------------------------------

This report is from Oracle10gR2 AS Reports Services...
We are developing a system that archives all mail for an Exchange server, and every PDF attached to a mail is parsed with Tika, which uses PDFBox.
Our Java application will fail when mails contain this kind of PDF.
We can't ask employees to correct their Oracle PDFs...
We hope you can help. Thank you.

was (Author: jrjohn):
This report is from Oracle10gR2 AS Reports Services...
We are developing a system that archives all mail for an Exchange server, and every PDF attached to a mail is parsed with Tika, which uses PDFBox.
Our Java application will fail When mails contain this kind of PDF.
We can't ask employees to correct their Oracle PDFs...
We hope you can help. Thank you.
[jira] [Comment Edited] (PDFBOX-1918) PDF convert error
[ https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948933#comment-13948933 ]

Jr. John edited comment on PDFBOX-1918 at 3/27/14 6:18 AM:
-----------------------------------------------------------

This report is from Oracle10gR2 AS Reports Services...
We are developing a system that archives all mail for an Exchange server, and every PDF attached to a mail is parsed with Tika, which uses PDFBox.
Our Java application will fail When mails contain this kind of PDF.
We can't ask employees to correct their Oracle PDFs...
We hope you can help. Thank you.

was (Author: jrjohn):
This report is from Oracle10gR2 AS Reports Services...
We are developing a system that archives all mail for an Exchange server, and every PDF attached to a mail is parsed with Tika, which uses PDFBox.
When a mail contains this kind of PDF, our Java system will fail.
We can't ask employees to correct their Oracle PDFs...
We hope you can help. Thank you.
[jira] [Commented] (PDFBOX-1918) PDF convert error
[ https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948933#comment-13948933 ]

Jr. John commented on PDFBOX-1918:
----------------------------------

This report is from Oracle10gR2 AS Reports Services...
We are developing a system that archives all mail for an Exchange server, and every PDF attached to a mail is parsed with Tika, which uses PDFBox.
When a mail contains this kind of PDF, our Java system will fail.
We can't ask employees to correct their Oracle PDFs...
We hope you can help. Thank you.
[jira] [Commented] (PDFBOX-1918) PDF convert error
[ https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948927#comment-13948927 ]

Tilman Hausherr commented on PDFBOX-1918:
-----------------------------------------

Good statement by a company that also does PDF conversion:
[Buggy PDF Files, should we try to fix them?|http://blog.amyuni.com/?p=1627]
[jira] [Updated] (PDFBOX-1918) PDF convert error
[ https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1918:
------------------------------------

    Attachment: rpt1390780234888753.pdf

The whole xref table is incorrect. The likely reason is that the PDF file was transferred via FTP in ASCII mode instead of binary mode. I used Notepad++ to replace every 0D 0A with 0D, and suddenly the xref table matches. I am attaching the modified PDF so that you can compare both files with a hex editor.
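The same replacement can be scripted. The following is a minimal sketch of the 0D 0A to 0D substitution described above; `CrLfCollapse` is a made-up name, and the approach is an assumption-laden heuristic rather than a general repair: it rewrites every 0D 0A pair, including any that legitimately occur inside binary stream data, so it can only be expected to help with files that are almost entirely ASCII, like the one in this issue.

```java
import java.io.ByteArrayOutputStream;

final class CrLfCollapse {

    /** Returns a copy of the input in which every CR LF (0D 0A) pair is reduced to a single CR (0D). */
    static byte[] collapseCrLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            out.write(in[i]);
            if (in[i] == 0x0D && i + 1 < in.length && in[i + 1] == 0x0A) {
                i++; // skip the LF that follows the CR just written
            }
        }
        return out.toByteArray();
    }
}
```

In practice one would read the damaged file with `java.nio.file.Files.readAllBytes`, write the result to a new file, and keep the original untouched for comparison.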
[jira] [Updated] (PDFBOX-1074) TIFFFaxDecoder5 when using PDFImageWriter
[ https://issues.apache.org/jira/browse/PDFBOX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-1074:
---------------------------------------

    Assignee: Tilman Hausherr  (was: Andreas Lehmkühler)

> TIFFFaxDecoder5 when using PDFImageWriter
> -----------------------------------------
>
>                 Key: PDFBOX-1074
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1074
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.6.0, 1.8.4, 1.8.5
>            Reporter: Anton Stremoukhov
>            Assignee: Tilman Hausherr
>              Labels: CCITTFaxDecode, ccitt
>             Fix For: 1.8.5, 2.0.0
>
>         Attachments: 34315.pdf, page_83.pdf, s2130312-100.pdf, s2130312-100.pdf-1.tif, s2130312.pdf
>
>
> I'm getting this when I call PDFImageWriter.writeImage() on a PDF with one page (see attached page_83.pdf):
> Caused by: java.lang.Error: TIFFFaxDecoder5
>     at org.apache.pdfbox.filter.TIFFFaxDecoder.decodeT6(TIFFFaxDecoder.java:1005)
>     at org.apache.pdfbox.filter.CCITTFaxDecodeFilter.decode(CCITTFaxDecodeFilter.java:101)
>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
>     at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
>     at org.apache.pdfbox.pdmodel.graphics.xobject.PDCcitt.getRGBImage(PDCcitt.java:153)
>     at org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:78)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>     at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:107)
>     at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:722)
>     at org.apache.pdfbox.util.PDFImageWriter.writeImage(PDFImageWriter.java:135)
>     at org.apache.pdfbox.util.PDFImageWriter.writeImage(PDFImageWriter.java:105)
> If you look at the PDF file I'm using (see attached page_83.pdf), you'll notice it's completely blank, but this is OK: the page was obtained from a source PDF file with 84 pages, of which the last one is blank (see attached 34315.pdf).
> The source PDF was split into pages (without any errors) via Splitter like so:
>         FileInputStream fis = new FileInputStream(file);
>         PDFParser parser = new PDFParser(fis);
>         parser.parse();
>         COSDocument cosDoc = parser.getDocument();
>         PDDocument pdDoc = new PDDocument(cosDoc);
>
>         // Split into one PDDocument per page and save each one separately.
>         Splitter splitter = new Splitter();
>         List<PDDocument> pages = splitter.split(pdDoc);
>         for (int i = 0; i < pages.size(); i++) {
>             PDDocument pageDoc = pages.get(i);
>             String fileNameNew = "page_" + i + ".pdf";
>             // writeDocument is the reporter's own helper (not shown in the report)
>             writeDocument(pageDoc, new File(destDir, fileNameNew).getPath());
>             pageDoc.close();
>         }
>         fis.close();
>         cosDoc.close();
>         pdDoc.close();
[jira] [Commented] (PDFBOX-2002) Show deprecation in the build
[ https://issues.apache.org/jira/browse/PDFBOX-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948265#comment-13948265 ]

Tilman Hausherr commented on PDFBOX-2002:
-----------------------------------------

I modified the parent POM in the trunk in rev. 1581986.

> Show deprecation in the build
> -----------------------------
>
>                 Key: PDFBOX-2002
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2002
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> According to
> https://pdfbox.apache.org/ideas.html
> one of the tasks is "Remove all deprecated methods". Therefore, I will modify the parent POM to show the deprecated calls. This will show such calls, but not fail the build. It is a gentle hint to fix these calls. Let's leave this issue open until all is done.
[jira] [Created] (PDFBOX-2002) Show deprecation in the build
Tilman Hausherr created PDFBOX-2002:
---------------------------------------

             Summary: Show deprecation in the build
                 Key: PDFBOX-2002
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2002
             Project: PDFBox
          Issue Type: Improvement
    Affects Versions: 2.0.0
            Reporter: Tilman Hausherr
            Priority: Minor
             Fix For: 2.0.0


According to
https://pdfbox.apache.org/ideas.html
one of the tasks is "Remove all deprecated methods". Therefore, I will modify the parent POM to show the deprecated calls. This will show such calls, but not fail the build. It is a gentle hint to fix these calls. Let's leave this issue open until all is done.
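For readers unfamiliar with what "showing deprecated calls" means in practice, here is a small illustration. The class names are invented and have nothing to do with the PDFBox sources. When compiled with `javac -Xlint:deprecation` (or with the `showDeprecation` option of the Maven compiler plugin switched on), the call in `Caller` is reported as a warning, and the build still succeeds.

```java
final class LegacyApi {

    /** @deprecated use {@link #newName()} instead. */
    @Deprecated
    static String oldName() {
        return newName();
    }

    static String newName() {
        return "value";
    }
}

final class Caller {

    static String call() {
        // With deprecation warnings enabled, the compiler flags this line
        // but still produces a class file: a hint, not a build failure.
        return LegacyApi.oldName();
    }
}
```

Replacing `LegacyApi.oldName()` with `LegacyApi.newName()` makes the warning disappear, which is the kind of cleanup the issue asks for.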
[jira] [Commented] (PDFBOX-1994) PDDocument.load(filename.pdf) hangs for pdf files having size
[ https://issues.apache.org/jira/browse/PDFBOX-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948183#comment-13948183 ]

Tilman Hausherr commented on PDFBOX-1994:
-----------------------------------------

I haven't tested whether it will work on Java 1.4. I have 1.7 here and soon 1.8. If your car runs on "super" gas, you don't fill the tank with "regular" gas, do you?

What you could do, if you have an old PC somewhere: install JRE 1.4 and see what happens when you run your application.

> PDDocument.load(filename.pdf) hangs for pdf files having size
> --------------------------------------------------------------
>
>                 Key: PDFBOX-1994
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1994
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.8.4
>            Reporter: brijesh
>
> I am using the code below to load my PDF. The PDF file is not zero-sized, it has full permissions, and it is not corrupt. I don't get any error from the code; it just hangs.
> It works locally, but not on the server.
> (I created .jar files and then an .exe; the .exe is executed on the server.)
> Java version used: 1.4
> PDDocument pdf=PDDocument.load("d:\\filename.pdf");
> pdf.print();
> Please tell me why the same code does not work on the server.
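Before installing anything, it may help to have the application log which Java runtime it is actually running on, since the local machine and the server differ here. A minimal sketch, assuming the standard `java.version` system property; `RuntimeVersionCheck` is a made-up helper, and the minimum version passed in is whatever the PDFBox release in use requires (check its documentation rather than trusting a number from this example).

```java
final class RuntimeVersionCheck {

    /** Maps "1.4.2_19" to 4, "1.8.0_292" to 8 and "17.0.1" to 17. */
    static int majorVersion(String javaVersion) {
        // Assumes the string starts with a digit, as java.version values do.
        String[] parts = javaVersion.split("[^0-9]+");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]); // old "1.x" numbering scheme
        }
        return first;
    }

    /** Throws if the running JRE is older than the given major version. */
    static void requireAtLeast(int minMajor) {
        String running = System.getProperty("java.version");
        if (majorVersion(running) < minMajor) {
            throw new IllegalStateException(
                    "Java " + minMajor + " or newer required, but running on " + running);
        }
    }
}
```

Note that such a check only helps if the class containing it is compiled for a target the old JRE can load; otherwise the JVM rejects the class before the check runs.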
[jira] [Commented] (PDFBOX-1086) Error when decoding CCITT compressed data that contains EOLs, fill bits etc.
[ https://issues.apache.org/jira/browse/PDFBOX-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948169#comment-13948169 ]

Tilman Hausherr commented on PDFBOX-1086:
-----------------------------------------

I fixed two of the three decoders with regard to fill bits. (I could fix the third one but would prefer to have a test file.) Now there's only PDFBOX-457 left. It could be an EOL, but I can neither prove nor disprove that theory. An EOL would make no sense in a G4-encoded document, at least according to Wikipedia:
https://en.wikipedia.org/wiki/Group_4_compression

> Error when decoding CCITT compressed data that contains EOLs, fill bits etc.
> ----------------------------------------------------------------------------
>
>                 Key: PDFBOX-1086
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1086
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>            Reporter: Jeremias Maerki
>            Assignee: Jeremias Maerki
>              Labels: CCITTFaxDecode, ccitt
>
> The TIFFFaxDecoder class (originally coming from JAI via XML Graphics Commons) does not handle cases like EOLs between lines and in front. But the PDF CCITTFaxDecode filter needs to allow many different variants of the encoding. Apparently, TIFF has a relatively restricted way of encoding CCITT data, so TIFFFaxDecoder was not written to be as flexible as we need it. Ideally, PDFBox should handle anything that gets thrown at it.
> It appears that it would be rather difficult to retrofit TIFFFaxDecoder with the necessary flexibility. So, new decoders for T.4 and T.6 should probably be written.
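For context on "EOLs" and "fill bits": in T.4 data an EOL is a run of eleven 0 bits followed by a 1, and fill consists of extra 0 bits placed in front of an EOL. A decoder that tolerates both can therefore treat "eleven or more zeros, then a one" as a single EOL. The sketch below illustrates only that rule; `CcittEol` is a made-up class, not the code of any PDFBox decoder.

```java
final class CcittEol {

    /** Reads the bit at an absolute bit index, most significant bit first. */
    static int bit(byte[] data, int bitPos) {
        return (data[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
    }

    /**
     * If an EOL, optionally preceded by fill bits, starts at bitPos, returns the
     * bit position just after it; otherwise returns -1.
     */
    static int skipEol(byte[] data, int bitPos) {
        int totalBits = data.length * 8;
        int zeros = 0;
        int pos = bitPos;
        while (pos < totalBits && bit(data, pos) == 0) {
            zeros++;
            pos++;
        }
        // pos now points at the first 1 bit, or at the end of the data.
        if (pos < totalBits && zeros >= 11) {
            return pos + 1;
        }
        return -1;
    }
}
```

A caller would try `skipEol` at the start of each row and fall back to normal code-word decoding when it returns -1.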
[jira] [Resolved] (PDFBOX-1708) IndexOutOfBoundsException on convertToImage with an embedded Fax-Image
[ https://issues.apache.org/jira/browse/PDFBOX-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved PDFBOX-1708.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.0
                   1.8.5

Fixed in rev 1581928 for the trunk and rev 1581946 for the 1.8 branch. The cause was that EncodedByteAlign = true was ignored.

Try downloading a snapshot version to see that it works :-)
https://repository.apache.org/snapshots/org/apache/pdfbox/

> IndexOutOfBoundsException on convertToImage with an embedded Fax-Image
> ----------------------------------------------------------------------
>
>                 Key: PDFBOX-1708
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1708
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel, Rendering
>    Affects Versions: 1.8.2
>            Reporter: Martin Withake
>              Labels: CCITTFaxDecode, ccitt
>             Fix For: 1.8.5, 2.0.0
>
>         Attachments: IN06119.PDF
>
>
> PDPage.convertToImage gives me this stack trace:
> java.lang.IndexOutOfBoundsException: offset + length > bit count
>     at org.apache.pdfbox.io.ccitt.PackedBitArray.setBits(PackedBitArray.java:108)
>     at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream.writeRun(CCITTFaxG31DDecodeInputStream.java:184)
>     at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream.access$400(CCITTFaxG31DDecodeInputStream.java:29)
>     at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream$RunLengthTreeNode.execute(CCITTFaxG31DDecodeInputStream.java:375)
>     at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream.decodeLine(CCITTFaxG31DDecodeInputStream.java:165)
>     at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream.read(CCITTFaxG31DDecodeInputStream.java:98)
>     at java.io.InputStream.read(InputStream.java:163)
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>     at org.apache.pdfbox.io.ccitt.FillOrderChangeInputStream.read(FillOrderChangeInputStream.java:45)
>     at java.io.FilterInputStream.read(FilterInputStream.java:90)
>     at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:68)
>     at org.apache.pdfbox.filter.CCITTFaxDecodeFilter.decode(CCITTFaxDecodeFilter.java:114)
>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:295)
>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:237)
>     at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:172)
>     at org.apache.pdfbox.pdmodel.graphics.xobject.PDCcitt.getRGBImage(PDCcitt.java:155)
>     at org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:83)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
>     at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:781)
>     at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:712)
>     at de.rekers.ui.table.YDateianlageTable$4.doInBackground(YDateianlageTable.java:740)
>     at de.rekers.ui.table.YDateianlageTable$4.doInBackground(YDateianlageTable.java:1)
>     at javax.swing.SwingWorker$1.call(SwingWorker.java:277)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at javax.swing.SwingWorker.run(SwingWorker.java:316)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> The document is only partially rendered. It was created by our fax software. Acrobat Reader shows the document without an error.
> Thanks in advance!
> Martin
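As background on the cause named in the resolution: when the CCITTFaxDecode parameter EncodedByteAlign is true, each encoded line starts on a byte boundary, so a decoder has to skip the padding bits left over from the previous line. The sketch below shows only that arithmetic; `EncodedByteAlignSketch` and its methods are illustrative names and not the code changed in the revisions mentioned above.

```java
final class EncodedByteAlignSketch {

    /** Rounds a bit position up to the next multiple of 8. */
    static int alignToByte(int bitPos) {
        return (bitPos + 7) & ~7;
    }

    /**
     * Given the encoded length of each row in bits, returns the bit position at
     * which each row starts, with or without byte alignment between rows.
     */
    static int[] rowStarts(int[] encodedRowBits, boolean encodedByteAlign) {
        int[] starts = new int[encodedRowBits.length];
        int pos = 0;
        for (int i = 0; i < encodedRowBits.length; i++) {
            if (encodedByteAlign) {
                pos = alignToByte(pos); // skip padding left by the previous row
            }
            starts[i] = pos;
            pos += encodedRowBits[i];
        }
        return starts;
    }
}
```

For rows of 13, 5 and 8 encoded bits, the rows start at bits 0, 16 and 24 with alignment, but at 0, 13 and 18 without it. A decoder that ignores the flag therefore reads every row after the first from the wrong position, which is consistent with a run that overshoots the row width as in the exception above.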
[jira] [Commented] (PDFBOX-2001) Digital Signature information
[ https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947947#comment-13947947 ]

Maruan Sahyoun commented on PDFBOX-2001:
----------------------------------------

That's for the field type. It checks whether a field is a signature field [ISO 32000 Table 220]. That is different information from the signature dictionary, which may or may not have the Sig entry for the dictionary type, as you correctly found in the spec.

> Digital Signature information
> -----------------------------
>
>                 Key: PDFBOX-2001
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2001
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.8.3
>            Reporter: Nicolas Kaczmarski
>         Attachments: D.1_signiert.pdf, acrobatSignatureExample.PNG
>
>
> We have a signed PDF, but the signature is described without the key "Sig".
> As you can see in the standard PDF 32000-1:2008, Table 252 (Entries in a signature dictionary), this key is optional:
> "(Optional) The type of PDF object that this dictionary describes; if present, shall be Sig for a signature dictionary."
> But PDFBox seems to limit its search for signatures to cases where this key "Sig" is present.
> What is your position on that?
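To make the distinction concrete, here is a toy model that uses plain maps in place of PDF dictionaries. It is not the PDFBox API; `SignatureFieldCheck` and its methods are invented for illustration. The field is recognised by its FT entry, and the signature dictionary is then taken from the field's V entry without requiring a Type entry on it.

```java
import java.util.Map;

final class SignatureFieldCheck {

    /** A field is a signature field when its FT (field type) entry is Sig. */
    static boolean isSignatureField(Map<String, Object> field) {
        return "Sig".equals(field.get("FT"));
    }

    /**
     * Returns the signature dictionary stored under V, or null if the field is
     * not a signature field or has no value. The dictionary's own Type entry is
     * deliberately not checked, because it is optional.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> signatureDictionary(Map<String, Object> field) {
        if (!isSignatureField(field)) {
            return null;
        }
        Object value = field.get("V");
        return (value instanceof Map) ? (Map<String, Object>) value : null;
    }
}
```

With this model, a signature dictionary that lacks Type is still found as long as the field that refers to it has FT set to Sig.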
[jira] [Commented] (PDFBOX-2001) Digital Signature information
[ https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947938#comment-13947938 ]

Nicolas Bouillon commented on PDFBOX-2001:
------------------------------------------

In the following source, at line 447:
http://svn.apache.org/viewvc/pdfbox/tags/1.8.4/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java?view=markup#l447
there is a test for the value COSName.SIG. Is that not related?
[jira] [Commented] (PDFBOX-2001) Digital Signature information
[ https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947924#comment-13947924 ]

Maruan Sahyoun commented on PDFBOX-2001:
----------------------------------------

PDFBox has an issue parsing the document. Try loading it with PDDocument.loadNonSeq and you get a number of errors about wrong offset information. PDFBox doesn't rely on the Sig key being present for the Type information of the dictionary, so for now the parsing seems to be the root cause.

You asked what our position is on "But PDFBox seems to limit its research of signature only if this key "Sig" is present." Well, this limitation doesn't seem to exist. We will need to look into why the parsing fails.
[jira] [Comment Edited] (PDFBOX-1994) PDDocument.load(filename.pdf) hangs for pdf files having size
[ https://issues.apache.org/jira/browse/PDFBOX-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947898#comment-13947898 ] brijesh edited comment on PDFBOX-1994 at 3/26/14 1:27 PM: -- On the server I am using J2RE 1.4; locally I have Java 6, but in my project (JDeveloper) the compiler level is set to 1.4, which is why I said that I am working with Java 1.4 locally. I will update the JRE on the server to 5 or 6, then build the .exe, test it, and let you know as soon as possible. Is it certain that PDDocument.load(filename.pdf) does not work with Java 1.4? was (Author: bpv): in server i am using j2re 4. in local java 6. but in my project (jdeveloper) , selected compiler as 1.4 . that is the reason i told that i am working in java 4 in local. i will update the jre in server to 5 or 6. then i will create .exe . i will test and inform you asap. is it sure, PDDocument.load(filename.pdf) not work for 1,4? > PDDocument.load(filename.pdf) hangs for pdf files having size > - > > Key: PDFBOX-1994 > URL: https://issues.apache.org/jira/browse/PDFBOX-1994 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.4 >Reporter: brijesh > > The below code i am using for loading my pdf. but my pdf file is not a zero > sized files and having full permission and it is not a corrupt file also. but > i ddint get any error after code. it just hangs. > it is working in local, but not working in server . > (created ,jar files and then exe, then the .exe will excuted in the server) > java using 1,4 > PDDocument pdf=PDDocument.load("d:\\filename.pdf"); > pdf.print(); > please provide me why the same code is not working in server. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1994) PDDocument.load(filename.pdf) hangs for pdf files having size
[ https://issues.apache.org/jira/browse/PDFBOX-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947898#comment-13947898 ] brijesh commented on PDFBOX-1994: - On the server I am using J2RE 1.4; locally I have Java 6, but in my project (JDeveloper) the compiler level is set to 1.4, which is why I said that I am working with Java 1.4 locally. I will update the JRE on the server to 5 or 6, then build the .exe, test it, and let you know as soon as possible. Is it certain that PDDocument.load(filename.pdf) does not work with 1.4? > PDDocument.load(filename.pdf) hangs for pdf files having size > - > > Key: PDFBOX-1994 > URL: https://issues.apache.org/jira/browse/PDFBOX-1994 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.4 >Reporter: brijesh > > The below code i am using for loading my pdf. but my pdf file is not a zero > sized files and having full permission and it is not a corrupt file also. but > i ddint get any error after code. it just hangs. > it is working in local, but not working in server . > (created ,jar files and then exe, then the .exe will excuted in the server) > java using 1,4 > PDDocument pdf=PDDocument.load("d:\\filename.pdf"); > pdf.print(); > please provide me why the same code is not working in server. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2001) Digital Signature information
[ https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Kaczmarski updated PDFBOX-2001: --- Attachment: acrobatSignatureExample.PNG > Digital Signature information > - > > Key: PDFBOX-2001 > URL: https://issues.apache.org/jira/browse/PDFBOX-2001 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.3 >Reporter: Nicolas Kaczmarski > Attachments: D.1_signiert.pdf, acrobatSignatureExample.PNG > > > We have a signed PDF but signature is described without key "Sig". > As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a > signature dictionary, this key is optional : > "(Optional) The type of PDF object that this dictionary describes; if > present, shall be Sig for a signature dictionary. " > But PDFBox seems to limit its research of signature only if this key "Sig" is > present. > What is your position about that? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2001) Digital Signature information
[ https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947884#comment-13947884 ] Nicolas Kaczmarski commented on PDFBOX-2001: If you open this document with Acrobat or Foxit Reader, the signature is detected correctly (I have attached a screenshot). This is not the case with PDFBox. > Digital Signature information > - > > Key: PDFBOX-2001 > URL: https://issues.apache.org/jira/browse/PDFBOX-2001 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.3 >Reporter: Nicolas Kaczmarski > Attachments: D.1_signiert.pdf > > > We have a signed PDF but signature is described without key "Sig". > As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a > signature dictionary, this key is optional : > "(Optional) The type of PDF object that this dictionary describes; if > present, shall be Sig for a signature dictionary. " > But PDFBox seems to limit its research of signature only if this key "Sig" is > present. > What is your position about that? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
[ https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947770#comment-13947770 ] Damon Li commented on PDFBOX-1234: -- I am working with the attached 'fw8bene--dft.pdf', which throws the same error when setting a value on a field. Has anyone solved this issue yet? I have attached a unit test which fails for me when trying to set a value for the first text field, whose name I believe is 'topmostSubform[0].Page1[0].f1_001[0]'. > NPE at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551) > --- > > Key: PDFBOX-1234 > URL: https://issues.apache.org/jira/browse/PDFBOX-1234 > Project: PDFBox > Issue Type: Bug > Components: AcroForm >Reporter: Christer Palm > Attachments: 200221.pdf, SetPDFFieldValueTest.java, fw8bene--dft.pdf > > > Using SVN trunk revision 1291094 (2012-02-18) > Getting the following stack trace when trying to call PDField.setValue() on a > AcroForm field in the attached document; > java.lang.NullPointerException > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281) > at > org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131) > Reason seems to be that PDApperance.getFontAndUpdateResources() returns null, > in turn because the font dictionary for the DA of the field ("/Cour 11 Tf 0 > g") is not present in the document. -- This message was sent by Atlassian JIRA (v6.2#6252)
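The cause given in the issue description (the field's default appearance string "/Cour 11 Tf 0 g" names a font that is not among the document's font resources) can be illustrated without PDFBox. The following is only a minimal sketch using the JDK: the class DefaultAppearanceCheck, its two methods, and the resource names "Helv" and "ZaDb" are hypothetical and are not part of the PDFBox API. It shows the kind of check a caller could do before calling setValue(), not how PDFBox resolves fonts internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not PDFBox API.
class DefaultAppearanceCheck {

    /**
     * Returns the font resource name used by the Tf operator of a default
     * appearance (DA) string, or null if no Tf operator is found.
     * Simplification: splits on whitespace instead of parsing a content stream.
     */
    static String fontNameFromDA(String da) {
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            // Tf takes two operands: the font name (/Name) and the font size
            if ("Tf".equals(tokens[i]) && tokens[i - 2].startsWith("/")) {
                return tokens[i - 2].substring(1);
            }
        }
        return null;
    }

    /** True if the DA names a font that exists in the given set of resource font names. */
    static boolean isFontAvailable(String da, Set<String> resourceFontNames) {
        String name = fontNameFromDA(da);
        return name != null && resourceFontNames.contains(name);
    }

    public static void main(String[] args) {
        // assumed font resources of the form; the DA string is the one from the issue
        Set<String> fonts = new HashSet<>(Arrays.asList("Helv", "ZaDb"));
        String da = "/Cour 11 Tf 0 g";
        System.out.println("font named in DA: " + fontNameFromDA(da));
        System.out.println("present in resources: " + isFontAvailable(da, fonts));
    }
}
```

With a check like this a caller can report the missing font instead of running into the NullPointerException; whether PDFBox itself should fall back to a default font is a separate question for the issue.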
[jira] [Issue Comment Deleted] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
[ https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damon Li updated PDFBOX-1234: - Comment: was deleted (was: A unit test for fw8bene--dft.pdf) > NPE at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551) > --- > > Key: PDFBOX-1234 > URL: https://issues.apache.org/jira/browse/PDFBOX-1234 > Project: PDFBox > Issue Type: Bug > Components: AcroForm >Reporter: Christer Palm > Attachments: 200221.pdf, SetPDFFieldValueTest.java, fw8bene--dft.pdf > > > Using SVN trunk revision 1291094 (2012-02-18) > Getting the following stack trace when trying to call PDField.setValue() on a > AcroForm field in the attached document; > java.lang.NullPointerException > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281) > at > org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131) > Reason seems to be that PDApperance.getFontAndUpdateResources() returns null, > in turn because the font dictionary for the DA of the field ("/Cour 11 Tf 0 > g") is not present in the document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
[ https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damon Li updated PDFBOX-1234: - Attachment: SetPDFFieldValueTest.java A unit test for fw8bene--dft.pdf > NPE at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551) > --- > > Key: PDFBOX-1234 > URL: https://issues.apache.org/jira/browse/PDFBOX-1234 > Project: PDFBox > Issue Type: Bug > Components: AcroForm >Reporter: Christer Palm > Attachments: 200221.pdf, SetPDFFieldValueTest.java, fw8bene--dft.pdf > > > Using SVN trunk revision 1291094 (2012-02-18) > Getting the following stack trace when trying to call PDField.setValue() on a > AcroForm field in the attached document; > java.lang.NullPointerException > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281) > at > org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131) > Reason seems to be that PDApperance.getFontAndUpdateResources() returns null, > in turn because the font dictionary for the DA of the field ("/Cour 11 Tf 0 > g") is not present in the document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
[ https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damon Li updated PDFBOX-1234: - Attachment: fw8bene--dft.pdf > NPE at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551) > --- > > Key: PDFBOX-1234 > URL: https://issues.apache.org/jira/browse/PDFBOX-1234 > Project: PDFBox > Issue Type: Bug > Components: AcroForm >Reporter: Christer Palm > Attachments: 200221.pdf, fw8bene--dft.pdf > > > Using SVN trunk revision 1291094 (2012-02-18) > Getting the following stack trace when trying to call PDField.setValue() on a > AcroForm field in the attached document; > java.lang.NullPointerException > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281) > at > org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131) > Reason seems to be that PDApperance.getFontAndUpdateResources() returns null, > in turn because the font dictionary for the DA of the field ("/Cour 11 Tf 0 > g") is not present in the document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2001) Digital Signature information
[ https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947753#comment-13947753 ] Thomas Chojecki commented on PDFBOX-2001: - There is a problem parsing the field dictionary. It should contain two entries, but when I get the first entry, I get the field dictionary object. It is some kind of parsing problem. The document catalog looks really weird: it contains only direct objects up to the fields. It also has a second signature encapsulated inside the perms dictionary. > Digital Signature information > - > > Key: PDFBOX-2001 > URL: https://issues.apache.org/jira/browse/PDFBOX-2001 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.3 >Reporter: Nicolas Kaczmarski > Attachments: D.1_signiert.pdf > > > We have a signed PDF but signature is described without key "Sig". > As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a > signature dictionary, this key is optional : > "(Optional) The type of PDF object that this dictionary describes; if > present, shall be Sig for a signature dictionary. " > But PDFBox seems to limit its research of signature only if this key "Sig" is > present. > What is your position about that? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2000) White page when converting first page to image
[ https://issues.apache.org/jira/browse/PDFBOX-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947708#comment-13947708 ] Hong-Thai Nguyen commented on PDFBOX-2000: -- Thanks, I've noticed that PDFBox works fine with the same file on Linux. > White page when converting first page to image > -- > > Key: PDFBOX-2000 > URL: https://issues.apache.org/jira/browse/PDFBOX-2000 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 1.8.4 > Environment: windows >Reporter: Hong-Thai Nguyen > Fix For: 2.0.0 > > Attachments: wrongpdf.pdf > > > When converting first page to image by this code for attached PDF: > {code} > private static BufferedImage computeImage(PDDocument document) throws > IOException { > int imageType = BufferedImage.TYPE_INT_RGB; > int resolution; > try { > resolution = Toolkit.getDefaultToolkit().getScreenResolution(); > } catch (HeadlessException e) { > resolution = 96; > } > PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0); > try { > BufferedImage image = page.convertToImage(imageType, resolution); > return image; > } finally { > page = null; > } > } > {code} > the returned image is that of a white page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2001) Digital Signature information
[ https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Kaczmarski updated PDFBOX-2001: --- Attachment: D.1_signiert.pdf File with signature not found > Digital Signature information > - > > Key: PDFBOX-2001 > URL: https://issues.apache.org/jira/browse/PDFBOX-2001 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.3 >Reporter: Nicolas Kaczmarski > Attachments: D.1_signiert.pdf > > > We have a signed PDF but signature is described without key "Sig". > As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a > signature dictionary, this key is optional : > "(Optional) The type of PDF object that this dictionary describes; if > present, shall be Sig for a signature dictionary. " > But PDFBox seems to limit its research of signature only if this key "Sig" is > present. > What is your position about that? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2001) Digital Signature information
Nicolas Kaczmarski created PDFBOX-2001: -- Summary: Digital Signature information Key: PDFBOX-2001 URL: https://issues.apache.org/jira/browse/PDFBOX-2001 Project: PDFBox Issue Type: Bug Affects Versions: 1.8.3 Reporter: Nicolas Kaczmarski We have a signed PDF but signature is described without key "Sig". As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a signature dictionary, this key is optional : "(Optional) The type of PDF object that this dictionary describes; if present, shall be Sig for a signature dictionary. " But PDFBox seems to limit its research of signature only if this key "Sig" is present. What is your position about that? -- This message was sent by Atlassian JIRA (v6.2#6252)
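The rule quoted from Table 252 (the Type entry is optional; if present, it shall be Sig) can be written down as a small predicate. This is a sketch only, with a plain Map standing in for a PDF dictionary: the class SignatureDictionaryCheck and the method looksLikeSignature are hypothetical names, not PDFBox API, and this is not a description of how PDFBox locates signatures. Requiring the Filter and Contents entries is an assumption of this sketch, chosen because a missing Type entry then says nothing by itself; the value "Adobe.PPKLite" is just an example filter name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not PDFBox API.
class SignatureDictionaryCheck {

    /**
     * Accepts a dictionary as a signature dictionary when Type is either
     * absent or equal to Sig, and the Filter and Contents entries exist.
     */
    static boolean looksLikeSignature(Map<String, Object> dict) {
        Object type = dict.get("Type");
        // Type is optional; only a present but different value rules the dictionary out
        if (type != null && !"Sig".equals(type)) {
            return false;
        }
        // assumption of this sketch: identify the dictionary by Filter and Contents
        return dict.containsKey("Filter") && dict.containsKey("Contents");
    }

    public static void main(String[] args) {
        Map<String, Object> withoutType = new HashMap<>();
        withoutType.put("Filter", "Adobe.PPKLite");
        withoutType.put("Contents", new byte[] {0x30});
        System.out.println("without Type: " + looksLikeSignature(withoutType));

        Map<String, Object> otherType = new HashMap<>(withoutType);
        otherType.put("Type", "Annot");
        System.out.println("Type = Annot: " + looksLikeSignature(otherType));
    }
}
```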
[jira] [Updated] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maruan Sahyoun updated PDFBOX-1512: --- Attachment: Topo.pdf Topo.txt TopoOverlap.pdf TopoOverlap.txt TopoContained.pdf TopoContained.txt A series of sample files modeled after the chart in http://en.wikipedia.org/wiki/Topological_sorting together with the text extraction done by Adobe Reader. > TextPositionComparator is not compatible with Java 7 > > > Key: PDFBOX-1512 > URL: https://issues.apache.org/jira/browse/PDFBOX-1512 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 1.7.1 > Environment: Java 7 >Reporter: Benjamin Papez >Assignee: Andreas Lehmkühler > Attachments: FOP-2252.pdf, TextPositionComparator.java, Topo.pdf, > Topo.txt, TopoContained.pdf, TopoContained.txt, TopoOverlap.pdf, > TopoOverlap.txt, WFI_PDFParser_TextPostionComparator.txt, > illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf > > > The TextPostionCompartor causes the following exception running on Java 7: > Unexpected RuntimeException from > org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison > method violates its general contract! > I think the problem is with this check: > if ( yDifference < .1 || > (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) || > (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom)) > as it violates the contract requirement: > The implementor must also ensure that the relation is transitive: > ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0. > Finally, the implementor must ensure that compare(x, y)==0 implies that > sgn(compare(x, z))==sgn(compare(y, z)) for all z. > Java 7 now is strict and throws exceptions when the contract is violated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947608#comment-13947608 ] Maruan Sahyoun edited comment on PDFBOX-1512 at 3/26/14 7:32 AM: - I’d think that we can find a sorting algorithm which can handle such cases. Before that what would be the expectation of the sorting result looking at the drawing Hannes provided? Shall we look at inspecting the results of other tools such as Adobe Reader and replicate their behavior? [Update] Adobe Reader would extract the sample to C,A,B I’m willing to look into solving the issue but would like to have some input on the end result first. Maruan was (Author: msahyoun): I’d think that we can find a sorting algorithm which can handle such cases. Before that what would be the expectation of the sorting result looking at the drawing Hannes provided? Shall we look at inspecting the results of other tools such as Adobe Reader and replicate their behavior? I’m willing to look into solving the issue but would like to have some input on the end result first. Maruan > TextPositionComparator is not compatible with Java 7 > > > Key: PDFBOX-1512 > URL: https://issues.apache.org/jira/browse/PDFBOX-1512 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 1.7.1 > Environment: Java 7 >Reporter: Benjamin Papez >Assignee: Andreas Lehmkühler > Attachments: FOP-2252.pdf, TextPositionComparator.java, > WFI_PDFParser_TextPostionComparator.txt, > illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf > > > The TextPostionCompartor causes the following exception running on Java 7: > Unexpected RuntimeException from > org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison > method violates its general contract! 
> I think the problem is with this check: > if ( yDifference < .1 || > (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) || > (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom)) > as it violates the contract requirement: > The implementor must also ensure that the relation is transitive: > ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0. > Finally, the implementor must ensure that compare(x, y)==0 implies that > sgn(compare(x, z))==sgn(compare(y, z)) for all z. > Java 7 now is strict and throws exceptions when the contract is violated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947608#comment-13947608 ] Maruan Sahyoun commented on PDFBOX-1512: I’d think that we can find a sorting algorithm which can handle such cases. Before that what would be the expectation of the sorting result looking at the drawing Hannes provided? Shall we look at inspecting the results of other tools such as Adobe Reader and replicate their behavior? I’m willing to look into solving the issue but would like to have some input on the end result first. Maruan > TextPositionComparator is not compatible with Java 7 > > > Key: PDFBOX-1512 > URL: https://issues.apache.org/jira/browse/PDFBOX-1512 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 1.7.1 > Environment: Java 7 >Reporter: Benjamin Papez >Assignee: Andreas Lehmkühler > Attachments: FOP-2252.pdf, TextPositionComparator.java, > WFI_PDFParser_TextPostionComparator.txt, > illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf > > > The TextPostionCompartor causes the following exception running on Java 7: > Unexpected RuntimeException from > org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison > method violates its general contract! > I think the problem is with this check: > if ( yDifference < .1 || > (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) || > (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom)) > as it violates the contract requirement: > The implementor must also ensure that the relation is transitive: > ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0. > Finally, the implementor must ensure that compare(x, y)==0 implies that > sgn(compare(x, z))==sgn(compare(y, z)) for all z. > Java 7 now is strict and throws exceptions when the contract is violated. -- This message was sent by Atlassian JIRA (v6.2#6252)
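The contract requirement quoted in the issue can be checked on a reduced example. The sketch below is not the PDFBox TextPositionComparator: it is a one-dimensional stand-in that treats two values as equal when they differ by less than 0.1, similar in spirit to the yDifference < .1 check. The class TolerantComparatorDemo and the sample values 0.0, 0.06 and 0.12 are made up for illustration.

```java
import java.util.Comparator;

// Hypothetical demo class, not part of PDFBox.
class TolerantComparatorDemo {

    /** Treats values closer than 0.1 as equal, otherwise orders them numerically. */
    static final Comparator<Double> TOLERANT = (a, b) -> {
        if (Math.abs(a - b) < 0.1) {
            return 0;
        }
        return Double.compare(a, b);
    };

    /**
     * Checks the rule from the Comparator contract for one triple:
     * compare(x, y) == 0 implies sgn(compare(x, z)) == sgn(compare(y, z)).
     */
    static boolean consistentForTriple(Comparator<Double> cmp, double x, double y, double z) {
        if (cmp.compare(x, y) != 0) {
            return true; // the rule makes no statement for this triple
        }
        return Integer.signum(cmp.compare(x, z)) == Integer.signum(cmp.compare(y, z));
    }

    public static void main(String[] args) {
        double a = 0.0;
        double b = 0.06;
        double c = 0.12;
        System.out.println("compare(a, b) = " + TOLERANT.compare(a, b));
        System.out.println("compare(b, c) = " + TOLERANT.compare(b, c));
        System.out.println("compare(a, c) = " + TOLERANT.compare(a, c));
        System.out.println("rule holds for (a, b, c): " + consistentForTriple(TOLERANT, a, b, c));
    }
}
```

Here a and b compare as equal and so do b and c, but a and c do not, so "equal" is not transitive. The sort implementation used since Java 7 may detect such an inconsistency and throw "Comparison method violates its general contract!", although it does not do so for every input. This is why any tolerance-based grouping probably has to happen before sorting, or the comparator has to work on values that are already assigned to lines.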