[jira] [Commented] (PDFBOX-5795) Crash for Softmask with incorrect backdrop color components
[ https://issues.apache.org/jira/browse/PDFBOX-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833175#comment-17833175 ]

Daniel Persson commented on PDFBOX-5795:

Hi [~tilman]

Seems reasonable. I've tried it, and it seems to work just fine visually. And anyway, my patch would have needed a null pointer check as well, so we don't introduce that error.

I vote for your suggestion.

Best regards
Daniel

> Crash for Softmask with incorrect backdrop color components
> -----------------------------------------------------------
>
>                 Key: PDFBOX-5795
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5795
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: Daniel Persson
>            Priority: Major
>         Attachments: borsen-2065-20111030-1-p4.pdf, crashfix.patch

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-5795) Crash for Softmask with incorrect backdrop color components
Daniel Persson created PDFBOX-5795:
--------------------------------------

             Summary: Crash for Softmask with incorrect backdrop color components
                 Key: PDFBOX-5795
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5795
             Project: PDFBox
          Issue Type: Bug
          Components: Rendering
    Affects Versions: 3.0.2 PDFBox, 2.0.31
            Reporter: Daniel Persson
         Attachments: borsen-2065-20111030-1-p4.pdf, crashfix.patch

This error occurred in production while processing an old archive. None of the files crashed in any other viewer (Chrome, Adobe, Firefox, Poppler, and so on).

I've read up on the subject in the PDF 1.7 specification, and it seems like PDFBox is following the specification, but not being able to open these files seems a bit too strict.

The easiest way to reproduce is just to open the attached file with the debugger.
{code:java}
java -jar debugger-app-4.0.0-SNAPSHOT.jar borsen-2065-20111030-1-p4.pdf
{code}
The application will crash with this exception:
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
    org.apache.pdfbox.pdmodel.graphics.color.PDColor.toRGB(PDColor.java:155)
    org.apache.pdfbox.rendering.PageDrawer$TransparencyGroup.<init>(PageDrawer.java:1696)
    org.apache.pdfbox.rendering.PageDrawer$TransparencyGroup.<init>(PageDrawer.java:1573)
    org.apache.pdfbox.rendering.PageDrawer.applySoftMaskToPaint(PageDrawer.java:604)
    org.apache.pdfbox.rendering.PageDrawer.showTransparencyGroupOnGraphics(PageDrawer.java:1549)
    org.apache.pdfbox.rendering.PageDrawer.showTransparencyGroup(PageDrawer.java:1489)
    org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:81)
    org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:872)
    org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:511)
    org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
    org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:158)
    org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:270)
    org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:346)
    org.apache.pdfbox.debugger.pagepane.PagePane$RenderWorker.doInBackground(PagePane.java:527)
    org.apache.pdfbox.debugger.pagepane.PagePane$RenderWorker.doInBackground(PagePane.java:506)
    java.base/java.lang.Thread.run(Thread.java:833)
{code}
My solution, added as a patch, is to add a fallback to the colorspace available in the graphical context. This is working for the files I've tried so far.
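The exception in the report above ("Index 1 out of bounds for length 1" inside a color conversion) is what one would expect when a color carries fewer components than its color space needs. As a rough illustration only, the following plain-Java sketch checks the component count before any conversion and substitutes a zero-filled array on a mismatch. It does not use PDFBox, it is not the contents of crashfix.patch, and it is not the fix that was eventually discussed; the class name `BackdropComponents`, the method `orFallback`, and the choice of a zero-filled fallback are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: guards against a component array
// whose length does not match what the target color space expects.
final class BackdropComponents {

    /**
     * Returns a copy of the given components when their count matches the
     * expected count; otherwise returns a zero-filled array of the expected
     * length (an assumed fallback policy, chosen only for illustration).
     */
    static float[] orFallback(float[] components, int expectedCount) {
        if (components != null && components.length == expectedCount) {
            return components.clone();
        }
        return new float[expectedCount];
    }

    public static void main(String[] args) {
        // One component supplied where three are expected, as in the report.
        float[] broken = {0.5f};
        System.out.println(Arrays.toString(orFallback(broken, 3)));
    }
}
```

The null check is included because the comment on this issue notes that a fallback of this kind would also need one.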
[jira] [Created] (PDFBOX-5788) ID References changes when saving PDFs.
Daniel Persson created PDFBOX-5788:
--------------------------------------

             Summary: ID References changes when saving PDFs.
                 Key: PDFBOX-5788
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5788
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 3.0.2 PDFBox, 3.0.1 PDFBox
            Reporter: Daniel Persson

{code:java}
// encodeHexString is presumably a static import, e.g. from Apache Commons Codec's Hex class.
private static void runPDF(String name) throws IOException, NoSuchAlgorithmException {
    PDDocument doc = Loader.loadPDF(new File(name));

    // First save: hash the bytes written to disk.
    File tmpFile = File.createTempFile("tmp", ".pdf");
    doc.save(tmpFile);
    byte[] data = Files.readAllBytes(Paths.get(tmpFile.getAbsolutePath()));
    byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
    System.out.println(encodeHexString(hash));

    // Second save of the same, unmodified document: the hash differs.
    File tmpFile2 = File.createTempFile("tmp", ".pdf");
    doc.save(tmpFile2);
    byte[] data2 = Files.readAllBytes(Paths.get(tmpFile2.getAbsolutePath()));
    byte[] hash2 = MessageDigest.getInstance("SHA-256").digest(data2);
    System.out.println(encodeHexString(hash2));
}
{code}
I'm not sure; this might be expected behavior, but it makes my testing framework a bit less robust, so I thought I'd report it here.

In the newer versions, 3.0.2 and 3.0.1, when you save a PDF a second time the reference IDs continue incrementing, which means that the file stored the first time is not identical to the one stored the second time.

In my test case, depending on which thread executes first, the run may differ and the expected result changes.

I have not seen this with 3.0.0 and earlier versions of PDFBox.
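The comparison in the report above boils down to hashing two byte arrays and checking whether the hex digests match. A minimal sketch of that step using only the JDK follows; it stands in for the `encodeHexString` helper, whose origin the report does not state. The class name `Sha256Hex` and its methods are hypothetical names for this example, and nothing here touches PDFBox.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing two saved files by digest.
final class Sha256Hex {

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        MessageDigest digest;
        try {
            // "SHA-256" is the standard JCA algorithm name.
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        byte[] hash = digest.digest(data);
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }

    /** True when both byte arrays produce the same SHA-256 digest. */
    static boolean sameDigest(byte[] first, byte[] second) {
        return sha256Hex(first).equals(sha256Hex(second));
    }
}
```

In a test like the one described, `sameDigest` would be applied to the bytes of the two temporary files; a `false` result corresponds to the differing hashes the reporter observed.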
[jira] [Created] (PDFBOX-5442) Rending with the incorrect color
Daniel Persson created PDFBOX-5442:
--------------------------------------

             Summary: Rending with the incorrect color
                 Key: PDFBOX-5442
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5442
             Project: PDFBox
          Issue Type: Bug
          Components: Rendering
    Affects Versions: 2.0.26
            Reporter: Daniel Persson
         Attachments: 23115_133_1_25693_17.pdf, 23115_133_1_25693_171.png

Hi Team.

We have noticed that PDFBox sometimes renders with brighter colors than other renderers. It doesn't matter that much for photos, but when a PDF is split into multiple smaller images and not all of them are rendered with the same hue, the result is a strange-looking image.

To reproduce: open the PDF in the Debugger or render an image with PDFToImage.
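A claim such as "renders with brighter colors than other renderers" can be put into numbers by comparing the average brightness of two renderings of the same page. The sketch below is one rough way to do that with the JDK's `BufferedImage`; it is not part of PDFBox. It applies the Rec. 709 luma weights directly to the stored 8-bit sRGB values, which is an approximation, and the names `MeanBrightness`, `meanLuma`, and `solid` are made up for this example.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: average brightness of an image on a 0..255 scale.
final class MeanBrightness {

    /** Mean of 0.2126 R + 0.7152 G + 0.0722 B over all pixels. */
    static double meanLuma(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        double sum = 0.0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                sum += 0.2126 * r + 0.7152 * g + 0.0722 * b;
            }
        }
        return sum / ((double) width * height);
    }

    /** Builds a single-color test image; rgb is 0xRRGGBB. */
    static BufferedImage solid(int width, int height, int rgb) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, rgb);
            }
        }
        return image;
    }
}
```

Run on the PDFBox rendering and on another renderer's output of the same page at the same resolution, a consistently higher value for one of them would back up the visual impression.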
[jira] [Updated] (PDFBOX-5294) Incorrect rendering of Type3 character
[ https://issues.apache.org/jira/browse/PDFBOX-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-5294:
-----------------------------------
    Attachment: issue-1.pdf
                incorrect.png
                correct.png

> Incorrect rendering of Type3 character
> --------------------------------------
>
>                 Key: PDFBOX-5294
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5294
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.24
>            Reporter: Daniel Persson
>            Priority: Major
>         Attachments: correct.png, incorrect.png, issue-1.pdf, type3resources.patch
[jira] [Updated] (PDFBOX-5294) Incorrect rendering of Type3 character
[ https://issues.apache.org/jira/browse/PDFBOX-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-5294:
-----------------------------------
    Attachment: type3resources.patch

> Incorrect rendering of Type3 character
> --------------------------------------
>
>                 Key: PDFBOX-5294
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5294
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.24
>            Reporter: Daniel Persson
>            Priority: Major
>         Attachments: type3resources.patch
[jira] [Created] (PDFBOX-5294) Incorrect rendering of Type3 character
Daniel Persson created PDFBOX-5294:
--------------------------------------

             Summary: Incorrect rendering of Type3 character
                 Key: PDFBOX-5294
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5294
             Project: PDFBox
          Issue Type: Bug
          Components: Rendering
    Affects Versions: 2.0.24
            Reporter: Daniel Persson
         Attachments: type3resources.patch

Hi Team.

We got a report from one of our customers that their images weren't rendered correctly. Looking into it, we found that a Type3 character contained an image.

That image was present in the character glyph's resource table and not in the font's resource table, which is a bit strange: as I read the specification, this should not be allowed.

Then again, Chrome, Opera, IE 11, and Adobe render this file correctly, while Safari, Firefox, and Poppler do not.

I've created a small patch that will solve this issue.

Best regards
Daniel
[jira] [Commented] (PDFBOX-5170) Compression creates issue with Page structure
[ https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327107#comment-17327107 ]

Daniel Persson commented on PDFBOX-5170:

Hi [~mkl]

No, I thought there was a difference between tables and streams and that you only needed one of them.

"Applications that do not support PDF 1.5 cannot access objects that are referenced by cross-reference streams. If a file uses cross-reference streams exclusively, it cannot be opened by such applications."

So my understanding was that you only needed the table to read the document, and that the streams were more efficient but not supported by older readers. But I'm still trying to figure this out.

Best regards
Daniel

> Compression creates issue with Page structure
> ---------------------------------------------
>
>                 Key: PDFBOX-5170
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5170
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Daniel Persson
>            Priority: Minor
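Whether a given file ends in a classic cross-reference table or in a cross-reference stream can be checked by following the `startxref` pointer near the end of the file: the byte offset it names either begins with the keyword `xref` (a table) or with an object header of the form `N G obj` (a stream). The sketch below does this in plain Java without PDFBox. It is a simplified illustration: it looks only at the last `startxref`, ignores hybrid files and earlier sections left by incremental updates, and the names `XrefKind`, `Kind`, and `detect` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: classifies the last cross-reference section of a PDF.
final class XrefKind {

    enum Kind { TABLE, STREAM, UNKNOWN }

    static Kind detect(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes
        // are the same as byte offsets in the file.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int marker = text.lastIndexOf("startxref");
        if (marker < 0) {
            return Kind.UNKNOWN;
        }
        // Skip whitespace after the keyword, then read the decimal offset.
        int pos = marker + "startxref".length();
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int end = pos;
        while (end < text.length() && text.charAt(end) >= '0' && text.charAt(end) <= '9') {
            end++;
        }
        if (end == pos || end - pos > 10) {
            return Kind.UNKNOWN;
        }
        long offset = Long.parseLong(text.substring(pos, end));
        if (offset >= text.length()) {
            return Kind.UNKNOWN;
        }
        int start = (int) offset;
        if (text.startsWith("xref", start)) {
            return Kind.TABLE;
        }
        // A cross-reference stream is an ordinary indirect object: "N G obj".
        String head = text.substring(start, Math.min(text.length(), start + 32));
        return head.matches("(?s)\\d+\\s+\\d+\\s+obj.*") ? Kind.STREAM : Kind.UNKNOWN;
    }
}
```

Applied to the two output files from the issue, a check like this would show which kind of cross-reference section each save mode produced.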
[jira] [Commented] (PDFBOX-5170) Compression creates issue with Page structure
[ https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326553#comment-17326553 ]

Daniel Persson commented on PDFBOX-5170:

Hi [~mkl]

I can verify that your fix solves the issue.

Another thing that might be related: reading the specification on cross-reference streams (new knowledge for me), I see that they may not be encrypted and need a different flag if compressed.

But do I understand the specification correctly that this is extra information for performance and not required to present the document correctly?

Best regards
Daniel

> Compression creates issue with Page structure
> ---------------------------------------------
>
>                 Key: PDFBOX-5170
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5170
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Daniel Persson
>            Priority: Minor
[jira] [Commented] (PDFBOX-5170) Compression creates issue with Page structure
[ https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326282#comment-17326282 ]

Daniel Persson commented on PDFBOX-5170:

Hi Tilman.

I've now tested it with SNAPSHOT pdfbox-3.0.0-20210420.210105-2628.jar.

The issue still exists. Compressing the file makes it unreadable; if I turn compression off, it is readable. In the example from the issue description, output.pdf is not readable, but output2.pdf is.

Best regards
Daniel

> Compression creates issue with Page structure
> ---------------------------------------------
>
>                 Key: PDFBOX-5170
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5170
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Daniel Persson
>            Priority: Minor
[jira] [Updated] (PDFBOX-5170) Compression creates issue with Page structure
[ https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-5170:
-----------------------------------
    Description:
Hi Team.

PDFBox version 3.0.0-RC1
pdftoppm version 21.04.0
mupdf-gl version 1.18.0

This might be an unusual issue but might need to be checked. The simple code below creates a PDF that can't be viewed with Poppler because of "error: malformed page tree"
{code:java}
PDDocument testPdf = Loader.loadPDF(new File("input.pdf"));
testPdf.save(new File("output.pdf"));
testPdf.close();

PDDocument testPdf2 = Loader.loadPDF(new File("input.pdf"));
testPdf2.save(new File("output2.pdf"), CompressParameters.NO_COMPRESSION);
testPdf2.close();
{code}
This is not a content issue because all PDFs from the same producer have the same problem; I've just picked one example.

Best regards
Daniel

  was:
Hi Team.

This might be an unusual issue but might need to be checked. The simple code below creates a PDF that can't be viewed with Poppler because of "error: malformed page tree"
{code:java}
PDDocument testPdf = Loader.loadPDF(new File("input.pdf"));
testPdf.save(new File("output.pdf"));
testPdf.close();

PDDocument testPdf2 = Loader.loadPDF(new File("input.pdf"));
testPdf2.save(new File("output2.pdf"), CompressParameters.NO_COMPRESSION);
testPdf2.close();
{code}
This is not a content issue because all PDFs from the same producer have the same problem; I've just picked one example.

Best regards
Daniel

> Compression creates issue with Page structure
> ---------------------------------------------
>
>                 Key: PDFBOX-5170
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5170
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Daniel Persson
>            Priority: Minor
[jira] [Commented] (PDFBOX-5170) Compression creates issue with Page structure
[ https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325622#comment-17325622 ]

Daniel Persson commented on PDFBOX-5170:

There is still a problem uploading larger PDFs, so I added the file to an old directory for incorrect PDFs.

https://drive.google.com/drive/folders/1mddhI_rpvyNojj4MKMyunRBrBOQe54HF?usp=sharing

> Compression creates issue with Page structure
> ---------------------------------------------
>
>                 Key: PDFBOX-5170
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5170
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Daniel Persson
>            Priority: Minor
[jira] [Created] (PDFBOX-5170) Compression creates issue with Page structure
Daniel Persson created PDFBOX-5170:
--------------------------------------

             Summary: Compression creates issue with Page structure
                 Key: PDFBOX-5170
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5170
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 3.0.0 PDFBox
            Reporter: Daniel Persson

Hi Team.

This might be an unusual issue but might need to be checked. The simple code below creates a PDF that can't be viewed with Poppler because of "error: malformed page tree"
{code:java}
// Saved with the default settings: the resulting output.pdf is the file Poppler rejects.
PDDocument testPdf = Loader.loadPDF(new File("input.pdf"));
testPdf.save(new File("output.pdf"));
testPdf.close();

// Saved with compression turned off, for comparison.
PDDocument testPdf2 = Loader.loadPDF(new File("input.pdf"));
testPdf2.save(new File("output2.pdf"), CompressParameters.NO_COMPRESSION);
testPdf2.close();
{code}
This is not a content issue because all PDFs from the same producer have the same problem; I've just picked one example.

Best regards
Daniel
[jira] [Commented] (PDFBOX-5135) Image can't render text.
[ https://issues.apache.org/jira/browse/PDFBOX-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304413#comment-17304413 ]

Daniel Persson commented on PDFBOX-5135:

Hi Tilman.

Great work, but I guess this will not be in the upcoming release?

Best regards
Daniel

> Image can't render text.
> ------------------------
>
>                 Key: PDFBOX-5135
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5135
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.22
>            Reporter: Daniel Persson
>            Assignee: Tilman Hausherr
>            Priority: Major
>         Attachments: 514867_709_1_18803-27-1.jpg, 514867_709_1_18803-27-ppm-1.jpg, 517551_709_1_19315-23-1.jpg, 517551_709_1_19315-23-ppm-1.jpg, image-2021-03-18-18-02-17-417.png
[jira] [Updated] (PDFBOX-5135) Image can't render text.
[ https://issues.apache.org/jira/browse/PDFBOX-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-5135:
-----------------------------------
    Description:
Hi Team

We have found a PDF that can't be rendered correctly in PDFBox. It renders correctly in Adobe and Poppler.

PDFs could not be uploaded so I've added them to a google drive folder. If that doesn't work please tell me and provide a way to send them.

[https://drive.google.com/drive/folders/1mddhI_rpvyNojj4MKMyunRBrBOQe54HF?usp=sharing]

Best regards
Daniel

  was:
Hi Team

We have found a PDF that can't be rendered correctly in PDFBox. It renders correctly in Adobe and Poppler.

PDFs could not be uploaded so I've added them to a google drive folder. If that don't work please tell me and provide a way to send them.

https://drive.google.com/drive/folders/1mddhI_rpvyNojj4MKMyunRBrBOQe54HF?usp=sharing

Best regards
Daniel

> Image can't render text.
> ------------------------
>
>                 Key: PDFBOX-5135
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5135
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.22
>            Reporter: Daniel Persson
>            Priority: Major
>         Attachments: 514867_709_1_18803-27-1.jpg, 514867_709_1_18803-27-ppm-1.jpg, 517551_709_1_19315-23-1.jpg, 517551_709_1_19315-23-ppm-1.jpg
[jira] [Updated] (PDFBOX-5135) Image can't render text.
[ https://issues.apache.org/jira/browse/PDFBOX-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-5135:
-----------------------------------
    Description:
Hi Team

We have found a PDF that can't be rendered correctly in PDFBox. It renders correctly in Adobe and Poppler.

PDFs could not be uploaded so I've added them to a google drive folder. If that don't work please tell me and provide a way to send them.

https://drive.google.com/drive/folders/1mddhI_rpvyNojj4MKMyunRBrBOQe54HF?usp=sharing

Best regards
Daniel

  was:
Hi Team

We have found a PDF that can't be rendered correctly in PDFBox. It renders correctly in Adobe and Poppler.

Best regards
Daniel

> Image can't render text.
> ------------------------
>
>                 Key: PDFBOX-5135
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5135
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.22
>            Reporter: Daniel Persson
>            Priority: Major
>         Attachments: 514867_709_1_18803-27-1.jpg, 514867_709_1_18803-27-ppm-1.jpg, 517551_709_1_19315-23-1.jpg, 517551_709_1_19315-23-ppm-1.jpg
[jira] [Updated] (PDFBOX-5135) Image can't render text.
[ https://issues.apache.org/jira/browse/PDFBOX-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-5135:
-----------------------------------
    Attachment: 514867_709_1_18803-27-1.jpg
                517551_709_1_19315-23-1.jpg
                514867_709_1_18803-27-ppm-1.jpg
                517551_709_1_19315-23-ppm-1.jpg

> Image can't render text.
> ------------------------
>
>                 Key: PDFBOX-5135
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5135
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.22
>            Reporter: Daniel Persson
>            Priority: Major
>         Attachments: 514867_709_1_18803-27-1.jpg, 514867_709_1_18803-27-ppm-1.jpg, 517551_709_1_19315-23-1.jpg, 517551_709_1_19315-23-ppm-1.jpg
[jira] [Created] (PDFBOX-5135) Image can't render text.
Daniel Persson created PDFBOX-5135:
--------------------------------------

             Summary: Image can't render text.
                 Key: PDFBOX-5135
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5135
             Project: PDFBox
          Issue Type: Bug
          Components: Rendering
    Affects Versions: 2.0.22
            Reporter: Daniel Persson

Hi Team

We have found a PDF that can't be rendered correctly in PDFBox. It renders correctly in Adobe and Poppler.

Best regards
Daniel
[jira] [Commented] (PDFBOX-4917) Images are blurry after updating to 2.0.20
[ https://issues.apache.org/jira/browse/PDFBOX-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159438#comment-17159438 ]

Daniel Persson commented on PDFBOX-4917:

Hi again, an update.

I did a git checkout of the 2.0 branch and ran the application from there; the image created on that branch did not have the issue mentioned above.

Perhaps you can verify whether it is solved in the upcoming release. If so, we will wait patiently for the release in the coming weeks.

Best regards
Daniel

> Images are blurry after updating to 2.0.20
> ------------------------------------------
>
>                 Key: PDFBOX-4917
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4917
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Daniel Persson
>            Priority: Critical
>         Attachments: issue.pdf, pdfbox-app-2.0.19.jpg, pdfbox-app-2.0.20.jpg
[jira] [Created] (PDFBOX-4917) Images are blurry after updating to 2.0.20
Daniel Persson created PDFBOX-4917:
--------------------------------------

             Summary: Images are blurry after updating to 2.0.20
                 Key: PDFBOX-4917
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4917
             Project: PDFBox
          Issue Type: Bug
            Reporter: Daniel Persson
         Attachments: issue.pdf, pdfbox-app-2.0.19.jpg, pdfbox-app-2.0.20.jpg

Hi team.

We have noticed that after updating PDFBox to 2.0.20 some images are blurry and unreadable even at 300 DPI.

We have rendered both these images with the same parameters, just with different versions of PDFBox.
{code:java}
java -jar pdfbox-app-2.0.19.jar PDFToImage -dpi 300 -quality 0.95 issue.pdf
java -jar pdfbox-app-2.0.20.jar PDFToImage -dpi 300 -quality 0.95 issue.pdf
{code}
Hope we can find a solution to this problem; perhaps it is related to PDFBOX-4516?

Best regards
Daniel
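When two renderings of the same page are produced with identical parameters, as in the two commands above, "blurry" can be given a rough numeric form by comparing how strong the edges are in each image. The sketch below computes the mean absolute difference between horizontally adjacent gray values using only the JDK; a blurred rendering of the same content would be expected to score lower than a sharp one. This is a crude illustrative metric and not anything PDFBox provides, and `EdgeStrength`, `meanHorizontalGradient`, and `fromGrayRow` are hypothetical names.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: crude sharpness score based on horizontal gradients.
final class EdgeStrength {

    /** Mean absolute difference between horizontally adjacent gray values. */
    static double meanHorizontalGradient(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        if (width < 2) {
            return 0.0; // no horizontal neighbors to compare
        }
        long sum = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 1; x < width; x++) {
                sum += Math.abs(gray(image.getRGB(x, y)) - gray(image.getRGB(x - 1, y)));
            }
        }
        return (double) sum / ((long) (width - 1) * height);
    }

    /** Simple average of the red, green, and blue channels. */
    private static int gray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3;
    }

    /** Builds a one-row gray test image from values in 0..255. */
    static BufferedImage fromGrayRow(int... values) {
        BufferedImage image = new BufferedImage(values.length, 1, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < values.length; x++) {
            int v = values[x] & 0xFF;
            image.setRGB(x, 0, (v << 16) | (v << 8) | v);
        }
        return image;
    }
}
```

Both JPEGs would need to be loaded at the same pixel dimensions (for example with `javax.imageio.ImageIO.read`) for the two scores to be comparable.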
[jira] [Updated] (PDFBOX-4917) Images are blurry after updating to 2.0.20
[ https://issues.apache.org/jira/browse/PDFBOX-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-4917: --- Attachment: pdfbox-app-2.0.20.jpg pdfbox-app-2.0.19.jpg issue.pdf > Images are blurry after updating to 2.0.20 > -- > > Key: PDFBOX-4917 > URL: https://issues.apache.org/jira/browse/PDFBOX-4917 > Project: PDFBox > Issue Type: Bug >Reporter: Daniel Persson >Priority: Critical > Attachments: issue.pdf, pdfbox-app-2.0.19.jpg, pdfbox-app-2.0.20.jpg > > > Hi team. > We have noticed that after updating to PDFBox to 2.0.20 some images are > blurry and unreadable even in 300 DPI. > We have rendered both these images with the same parameters just different > versions of PDFBox. > {code:java} > java -jar pdfbox-app-2.0.19.jar PDFToImage -dpi 300 -quality 0.95 issue.pdf > java -jar pdfbox-app-2.0.20.jar PDFToImage -dpi 300 -quality 0.95 > issue.pdf{code} > Hope we can find a solution to this problem, perhaps it is related to > PDFBOX-4516? > Best regards > Daniel -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-4852) Image rendering issue 3
[ https://issues.apache.org/jira/browse/PDFBOX-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-4852: --- Attachment: issue3.pdf issue3-pdfbox.jpg issue3-poppler.jpg > Image rendering issue 3 > --- > > Key: PDFBOX-4852 > URL: https://issues.apache.org/jira/browse/PDFBOX-4852 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.19 >Reporter: Daniel Persson >Priority: Minor > Attachments: issue3-pdfbox.jpg, issue3-poppler.jpg, issue3.pdf > > > Text is unreadable in image rendered using PDFToImage. > > Text is readable if you render with Poppler instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-4852) Image rendering issue 3
Daniel Persson created PDFBOX-4852: -- Summary: Image rendering issue 3 Key: PDFBOX-4852 URL: https://issues.apache.org/jira/browse/PDFBOX-4852 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.19 Reporter: Daniel Persson Attachments: issue3-pdfbox.jpg, issue3-poppler.jpg, issue3.pdf Text is unreadable in image rendered using PDFToImage. Text is readable if you render with Poppler instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-4851) Image rendering issue 2
Daniel Persson created PDFBOX-4851: -- Summary: Image rendering issue 2 Key: PDFBOX-4851 URL: https://issues.apache.org/jira/browse/PDFBOX-4851 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.19 Reporter: Daniel Persson Attachments: issue2-pdfbox.jpg, issue2-poppler.jpg, issue2.pdf Text is missing in image rendered using PDFToImage. Text is present if you render with Poppler instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-4851) Image rendering issue 2
[ https://issues.apache.org/jira/browse/PDFBOX-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-4851: --- Attachment: issue2.pdf issue2-poppler.jpg issue2-pdfbox.jpg > Image rendering issue 2 > --- > > Key: PDFBOX-4851 > URL: https://issues.apache.org/jira/browse/PDFBOX-4851 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.19 >Reporter: Daniel Persson >Priority: Minor > Attachments: issue2-pdfbox.jpg, issue2-poppler.jpg, issue2.pdf > > > Text is missing in image rendered using PDFToImage. > > Text is present if you render with Poppler instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-4850) Image rendering issue
[ https://issues.apache.org/jira/browse/PDFBOX-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-4850: --- Attachment: issue.pdf issue-pdfbox.jpg issue-poppler.jpg > Image rendering issue > - > > Key: PDFBOX-4850 > URL: https://issues.apache.org/jira/browse/PDFBOX-4850 > Project: PDFBox > Issue Type: Improvement > Components: Rendering >Affects Versions: 2.0.19 >Reporter: Daniel Persson >Priority: Minor > Attachments: issue-pdfbox.jpg, issue-poppler.jpg, issue.pdf > > > Rendering file using PDFToImage creates a strange result where embedded > images aren't rotated or scaled correctly. > > Rendering the same PDF using poppler will create a correct looking output. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-4850) Image rendering issue
Daniel Persson created PDFBOX-4850: -- Summary: Image rendering issue Key: PDFBOX-4850 URL: https://issues.apache.org/jira/browse/PDFBOX-4850 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.19 Reporter: Daniel Persson Attachments: issue-pdfbox.jpg, issue-poppler.jpg, issue.pdf Rendering file using PDFToImage creates a strange result where embedded images aren't rotated or scaled correctly. Rendering the same PDF using poppler will create a correct looking output. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-4850) Image rendering issue
[ https://issues.apache.org/jira/browse/PDFBOX-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-4850: --- Issue Type: Bug (was: Improvement) > Image rendering issue > - > > Key: PDFBOX-4850 > URL: https://issues.apache.org/jira/browse/PDFBOX-4850 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.19 >Reporter: Daniel Persson >Priority: Minor > Attachments: issue-pdfbox.jpg, issue-poppler.jpg, issue.pdf > > > Rendering file using PDFToImage creates a strange result where embedded > images aren't rotated or scaled correctly. > > Rendering the same PDF using poppler will create a correct looking output. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-4762) Inconsistent handling of incorrect data
[ https://issues.apache.org/jira/browse/PDFBOX-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-4762: --- Description: We had a PDF that had a strange page with 200Mb+ of text to extract and the deflate function did not work correctly. This created a fatal in PDFBox and I did some debugging and noticed that we handle SetNonStrokingColorSpace and SetStrokingColorSpace in different ways. One of them had a check if the in data was incorrect and returned and the other one did not have this check. I made this small patch that I will include in this issue to rectify this inconsistency. Added the crashing pdf on my google drive if you want it to test with https://drive.google.com/open?id=1bcT27NoqNM-pphYiFCy13bq81potqUc6 Best regards Daniel was: We had a PDF that had a strange page with 200Mb+ of text to extract and the deflate function did not work correctly. This created a fatal in PDFBox and I did some debugging and noticed that we handle SetNonStrokingColorSpace and SetStrokingColorSpace in different ways. One of them had a check if the in data was incorrect and returned and the other one did not have this check. I made this small patch that I will include in this issue to rectify this inconsistency. Best regards Daniel > Inconsistent handling of incorrect data > --- > > Key: PDFBOX-4762 > URL: https://issues.apache.org/jira/browse/PDFBOX-4762 > Project: PDFBox > Issue Type: Improvement > Components: Rendering >Affects Versions: 2.0.18 >Reporter: Daniel Persson >Priority: Minor > Labels: patch > Attachments: inconsistant.patch > > > We had a PDF that had a strange page with 200Mb+ of text to extract and the > deflate function did not work correctly. > This created a fatal in PDFBox and I did some debugging and noticed that we > handle SetNonStrokingColorSpace and SetStrokingColorSpace in different ways. > One of them had a check if the in data was incorrect and returned and the > other one did not have this check. 
> I made this small patch that I will include in this issue to rectify this > inconsistency. > > Added the crashing pdf on my google drive if you want it to test with > https://drive.google.com/open?id=1bcT27NoqNM-pphYiFCy13bq81potqUc6 > > Best regards > Daniel -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-4762) Inconsistent handling of incorrect data
Daniel Persson created PDFBOX-4762:

Summary: Inconsistent handling of incorrect data
Key: PDFBOX-4762
URL: https://issues.apache.org/jira/browse/PDFBOX-4762
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 2.0.18
Reporter: Daniel Persson
Attachments: inconsistant.patch

We had a PDF with a strange page containing 200 MB+ of text to extract, and the deflate function did not work correctly on it.

This caused a fatal error in PDFBox. While debugging, I noticed that SetNonStrokingColorSpace and SetStrokingColorSpace are handled in different ways: one of them checks whether the input data is incorrect and returns early, while the other one has no such check.

I made a small patch, which I will attach to this issue, to rectify this inconsistency.

Best regards
Daniel
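The inconsistency described above can be pictured with a small, self-contained sketch. It does not use PDFBox and is not the attached patch; the class, the method names, and the idea of modelling a name operand as a plain `String` are all assumptions made for illustration. The point is only that both operator handlers go through the same guard, so malformed input is ignored in both places instead of in one.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ColorSpaceOperatorSketch {

    /** Shared guard (hypothetical): the first operand must be a name, modelled here as a String. */
    static boolean hasNameOperand(List<?> operands) {
        return operands != null && !operands.isEmpty() && operands.get(0) instanceof String;
    }

    /** Hypothetical handler for the stroking colour space operator. */
    static String setStrokingColorSpace(List<?> operands, String current) {
        if (!hasNameOperand(operands)) {
            return current; // malformed input: keep the current colour space
        }
        return (String) operands.get(0);
    }

    /** Hypothetical handler for the non-stroking operator, using the same guard. */
    static String setNonStrokingColorSpace(List<?> operands, String current) {
        if (!hasNameOperand(operands)) {
            return current; // identical behaviour to the stroking handler
        }
        return (String) operands.get(0);
    }

    public static void main(String[] args) {
        // A number where a name is expected is ignored by both handlers.
        System.out.println(setStrokingColorSpace(Arrays.asList(42), "DeviceGray"));
        System.out.println(setNonStrokingColorSpace(Arrays.asList(42), "DeviceGray"));
        // A well-formed operand list switches the colour space.
        System.out.println(setStrokingColorSpace(Arrays.asList("DeviceRGB"), "DeviceGray"));
        // An empty operand list is also treated as malformed.
        System.out.println(setNonStrokingColorSpace(Collections.emptyList(), "DeviceGray"));
    }
}
```

Routing both handlers through one helper is a design choice of this sketch; it keeps the two code paths from drifting apart again.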
[jira] [Updated] (PDFBOX-4762) Inconsistent handling of incorrect data
[ https://issues.apache.org/jira/browse/PDFBOX-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-4762: --- Attachment: inconsistant.patch > Inconsistent handling of incorrect data > --- > > Key: PDFBOX-4762 > URL: https://issues.apache.org/jira/browse/PDFBOX-4762 > Project: PDFBox > Issue Type: Improvement > Components: Rendering >Affects Versions: 2.0.18 >Reporter: Daniel Persson >Priority: Minor > Labels: patch > Attachments: inconsistant.patch > > > We had a PDF that had a strange page with 200Mb+ of text to extract and the > deflate function did not work correctly. > This created a fatal in PDFBox and I did some debugging and noticed that we > handle SetNonStrokingColorSpace and SetStrokingColorSpace in different ways. > One of them had a check if the in data was incorrect and returned and the > other one did not have this check. > I made this small patch that I will include in this issue to rectify this > inconsistency. > Best regards > Daniel -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4743) Long rendering time of fonts in a specific PDF
[ https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018294#comment-17018294 ] Daniel Persson commented on PDFBOX-4743: Hi Tilman. I have added two new images. One without the instructions to set a font and write text. And the other one without drawing instructions for images. Without images is slow, without text is fast. Best regards Daniel > Long rendering time of fonts in a specific PDF > -- > > Key: PDFBOX-4743 > URL: https://issues.apache.org/jira/browse/PDFBOX-4743 > Project: PDFBox > Issue Type: Improvement > Environment: Gentoo Linux, Java 8 >Reporter: Daniel Persson >Priority: Minor > Attachments: slow_rendering.pdf, without_images.pdf, without_text.pdf > > > Hi Team. > > We have found a PDF that takes a long time to render images. > > After some checking, we found that the one page takes more than 2 minutes to > render, but if we remove the font information and render the PDF without > text, it takes 3 seconds. > > Just looking at the font information, it doesn't seem to be a lot of data. > 3-5kb per font and there are only about seven fonts defined. So there must be > something else that complicates things. > > Best regards > Daniel -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Comment Edited] (PDFBOX-4743) Long rendering time of fonts in a specific PDF
[ https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018294#comment-17018294 ] Daniel Persson edited comment on PDFBOX-4743 at 1/17/20 8:35 PM: - Hi Tilman. I have added two new PDFs. One without the instructions to set a font and write text. And the other one without drawing instructions for images. Without images is slow, without text is fast. Best regards Daniel was (Author: kalaspuffar): Hi Tilman. I have added two new images. One without the instructions to set a font and write text. And the other one without drawing instructions for images. Without images is slow, without text is fast. Best regards Daniel > Long rendering time of fonts in a specific PDF > -- > > Key: PDFBOX-4743 > URL: https://issues.apache.org/jira/browse/PDFBOX-4743 > Project: PDFBox > Issue Type: Improvement > Environment: Gentoo Linux, Java 8 >Reporter: Daniel Persson >Priority: Minor > Attachments: slow_rendering.pdf, without_images.pdf, without_text.pdf > > > Hi Team. > > We have found a PDF that takes a long time to render images. > > After some checking, we found that the one page takes more than 2 minutes to > render, but if we remove the font information and render the PDF without > text, it takes 3 seconds. > > Just looking at the font information, it doesn't seem to be a lot of data. > 3-5kb per font and there are only about seven fonts defined. So there must be > something else that complicates things. > > Best regards > Daniel -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-4743) Long rendering time of fonts in a specific PDF
[ https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-4743: --- Attachment: without_text.pdf without_images.pdf > Long rendering time of fonts in a specific PDF > -- > > Key: PDFBOX-4743 > URL: https://issues.apache.org/jira/browse/PDFBOX-4743 > Project: PDFBox > Issue Type: Improvement > Environment: Gentoo Linux, Java 8 >Reporter: Daniel Persson >Priority: Minor > Attachments: slow_rendering.pdf, without_images.pdf, without_text.pdf > > > Hi Team. > > We have found a PDF that takes a long time to render images. > > After some checking, we found that the one page takes more than 2 minutes to > render, but if we remove the font information and render the PDF without > text, it takes 3 seconds. > > Just looking at the font information, it doesn't seem to be a lot of data. > 3-5kb per font and there are only about seven fonts defined. So there must be > something else that complicates things. > > Best regards > Daniel -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-4743) Long rendering time of fonts in a specific PDF
Daniel Persson created PDFBOX-4743: -- Summary: Long rendering time of fonts in a specific PDF Key: PDFBOX-4743 URL: https://issues.apache.org/jira/browse/PDFBOX-4743 Project: PDFBox Issue Type: Improvement Environment: Gentoo Linux, Java 8 Reporter: Daniel Persson Attachments: slow_rendering.pdf Hi Team. We have found a PDF that takes a long time to render images. After some checking, we found that the one page takes more than 2 minutes to render, but if we remove the font information and render the PDF without text, it takes 3 seconds. Just looking at the font information, it doesn't seem to be a lot of data. 3-5kb per font and there are only about seven fonts defined. So there must be something else that complicates things. Best regards Daniel -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-4501) References numbers in embedded PDF become floats
Daniel Persson created PDFBOX-4501:

Summary: References numbers in embedded PDF become floats
Key: PDFBOX-4501
URL: https://issues.apache.org/jira/browse/PDFBOX-4501
Project: PDFBox
Issue Type: Bug
Reporter: Daniel Persson
Attachments: float_pointer.patch

Hi everyone.

We found an issue that sometimes occurs with smaller producers that create PDF files with embedded advertisements or other articles. For some reason, this embedded content makes the library throw an exception and fail to read the file. In many cases we can read most of the pages, and only the embedded data is missing.

I wrote a small patch that handles the issue, but I don't know how to decode the embedded data, so I have not debugged the issue further.

I am adding a link to the file because it is 124 MB and therefore too large to upload with the issue.

[https://drive.google.com/file/d/1hQslqtrbIoo5bTmMXgH1NDSYXuvIUOAQ/view?usp=sharing]

It would be great if we could find a solution that reads the PDF correctly, but the current behavior of not reading it at all is a problem.

```
java.io.IOException: expected number, actual=COSFloat{18446744073221199360} at offset 127766191
org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:166)
org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:912)
org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881)
org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801)
org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:761)
org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1007)
org.apache.pdfbox.debugger.PDFDebugger$12.open(PDFDebugger.java:1272)
org.apache.pdfbox.debugger.PDFDebugger$DocumentOpener.parse(PDFDebugger.java:1383)
org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1275)
org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1252)
org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1243)
```

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
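One observation that may help when reading the trace above: the value 18446744073221199360 is larger than `Long.MAX_VALUE` (9223372036854775807), so it cannot be stored as a 64-bit signed integer. The sketch below is plain Java with no PDFBox dependency; the class and method names are hypothetical. It assumes that a parser falls back to a floating-point representation when the digits do not fit in a `long`, which would be consistent with the message but is not confirmed by the report.

```java
public class ObjectNumberCheck {

    /** Returns true when the digit string can be parsed as a 64-bit signed integer. */
    static boolean fitsInLong(String digits) {
        try {
            Long.parseLong(digits);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /**
     * Hypothetical classification: "integer" when the value fits in a long,
     * "float-fallback" otherwise. This mirrors the assumed parser behaviour only.
     */
    static String classify(String digits) {
        return fitsInLong(digits) ? "integer" : "float-fallback";
    }

    public static void main(String[] args) {
        // The value from the exception message does not fit in a long.
        System.out.println(classify("18446744073221199360"));
        // The byte offset from the same message does.
        System.out.println(classify("127766191"));
    }
}
```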
[jira] [Commented] (PDFBOX-4306) Image clipping area rounding error
[ https://issues.apache.org/jira/browse/PDFBOX-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617769#comment-16617769 ] Daniel Persson commented on PDFBOX-4306: Hi @tilman Well, after some consideration and no other response to this issue I would ask you to include the last patch in the next release if possible. Best regards Daniel > Image clipping area rounding error > -- > > Key: PDFBOX-4306 > URL: https://issues.apache.org/jira/browse/PDFBOX-4306 > Project: PDFBox > Issue Type: Bug >Reporter: Daniel Persson >Priority: Major > Labels: rendering > Attachments: page-1.pdf, page-2.pdf, patch.diff, patch2.diff, test.jpg > > > Creating images with PDFBox and merging them together when you have two pages > that connect will create a white line between the images. > We have looked into the issue and tried to fix it and found that the clipping > area is a bit to tight so the images will not be rendered correctly. My guess > is that this is due to a rounding error when using floats. > Most of the graphics functions in java use double precision and PDFBox uses > floats so when using layer upon layer of bounding boxes intersecting the > clipping area it might get skewed to a bad bounding box. > I've added a patch to this issue with the code we use as a workaround today. > It's by no means the final solution to the problem but it resolves the white > line issue. > To be sure that you get the error when generating the images use the > following command > ``` > java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 > -format jpg page-1.pdf > ``` > We run java 8 on our machines. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4306) Image clipping area rounding error
[ https://issues.apache.org/jira/browse/PDFBOX-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599636#comment-16599636 ] Daniel Persson commented on PDFBOX-4306: Well, if it's a rounding error when drawing elements then I feel that you should ensure to increase the precision instead of changing the image ratio. With this solution we create images that are 1 pixel narrower than the same produced by Poppler. Then again that might not be the correct resolution. The thing that worries me is that when you create a spread of multiple images you could introduce a jagged edge between the images if one pixel is missing. Best regards Daniel > Image clipping area rounding error > -- > > Key: PDFBOX-4306 > URL: https://issues.apache.org/jira/browse/PDFBOX-4306 > Project: PDFBox > Issue Type: Bug >Reporter: Daniel Persson >Priority: Major > Labels: rendering > Attachments: page-1.pdf, page-2.pdf, patch.diff, patch2.diff, test.jpg > > > Creating images with PDFBox and merging them together when you have two pages > that connect will create a white line between the images. > We have looked into the issue and tried to fix it and found that the clipping > area is a bit to tight so the images will not be rendered correctly. My guess > is that this is due to a rounding error when using floats. > Most of the graphics functions in java use double precision and PDFBox uses > floats so when using layer upon layer of bounding boxes intersecting the > clipping area it might get skewed to a bad bounding box. > I've added a patch to this issue with the code we use as a workaround today. > It's by no means the final solution to the problem but it resolves the white > line issue. > To be sure that you get the error when generating the images use the > following command > ``` > java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 > -format jpg page-1.pdf > ``` > We run java 8 on our machines. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4306) Image clipping area rounding error
[ https://issues.apache.org/jira/browse/PDFBOX-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598736#comment-16598736 ] Daniel Persson commented on PDFBOX-4306: Hi again. We noticed that the result of smaller images in other pages created artifacts so we realized that patch.diff was not a solution we could use going forward. Patch2.diff is a hackier solution but seems to solve the immediate problem for us at least but there must be a better way. best regards Daniel > Image clipping area rounding error > -- > > Key: PDFBOX-4306 > URL: https://issues.apache.org/jira/browse/PDFBOX-4306 > Project: PDFBox > Issue Type: Bug >Reporter: Daniel Persson >Priority: Major > Labels: rendering > Attachments: page-1.pdf, page-2.pdf, patch.diff, patch2.diff, test.jpg > > > Creating images with PDFBox and merging them together when you have two pages > that connect will create a white line between the images. > We have looked into the issue and tried to fix it and found that the clipping > area is a bit to tight so the images will not be rendered correctly. My guess > is that this is due to a rounding error when using floats. > Most of the graphics functions in java use double precision and PDFBox uses > floats so when using layer upon layer of bounding boxes intersecting the > clipping area it might get skewed to a bad bounding box. > I've added a patch to this issue with the code we use as a workaround today. > It's by no means the final solution to the problem but it resolves the white > line issue. > To be sure that you get the error when generating the images use the > following command > ``` > java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 > -format jpg page-1.pdf > ``` > We run java 8 on our machines. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-4306) Image clipping area rounding error
[ https://issues.apache.org/jira/browse/PDFBOX-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-4306: --- Attachment: patch2.diff > Image clipping area rounding error > -- > > Key: PDFBOX-4306 > URL: https://issues.apache.org/jira/browse/PDFBOX-4306 > Project: PDFBox > Issue Type: Bug >Reporter: Daniel Persson >Priority: Major > Labels: rendering > Attachments: page-1.pdf, page-2.pdf, patch.diff, patch2.diff, test.jpg > > > Creating images with PDFBox and merging them together when you have two pages > that connect will create a white line between the images. > We have looked into the issue and tried to fix it and found that the clipping > area is a bit to tight so the images will not be rendered correctly. My guess > is that this is due to a rounding error when using floats. > Most of the graphics functions in java use double precision and PDFBox uses > floats so when using layer upon layer of bounding boxes intersecting the > clipping area it might get skewed to a bad bounding box. > I've added a patch to this issue with the code we use as a workaround today. > It's by no means the final solution to the problem but it resolves the white > line issue. > To be sure that you get the error when generating the images use the > following command > ``` > java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 > -format jpg page-1.pdf > ``` > We run java 8 on our machines. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-4306) Image clipping area rounding error
Daniel Persson created PDFBOX-4306:

Summary: Image clipping area rounding error
Key: PDFBOX-4306
URL: https://issues.apache.org/jira/browse/PDFBOX-4306
Project: PDFBox
Issue Type: Bug
Reporter: Daniel Persson
Attachments: page-1.pdf, page-2.pdf, patch.diff, test.jpg

Creating images with PDFBox and merging them together, when two pages connect, produces a white line between the images.

We have looked into the issue and tried to fix it, and found that the clipping area is a bit too tight, so the images are not rendered correctly. My guess is that this is due to a rounding error when using floats.

Most of the graphics functions in Java use double precision while PDFBox uses floats, so when layer upon layer of bounding boxes intersects the clipping area, it might get skewed to a bad bounding box.

I've added a patch to this issue with the code we use as a workaround today. It's by no means the final solution to the problem, but it resolves the white line issue.

To be sure that you get the error when generating the images, use the following command:
```
java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 -format jpg page-1.pdf
```
We run Java 8 on our machines.
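The float-versus-double guess above can be illustrated with a minimal, self-contained sketch. It does not reproduce PDFBox's clipping code; the class and method names are hypothetical and the edge value is chosen by hand. It only shows how narrowing an edge coordinate to `float` before rounding it up to whole pixels can lose one pixel of coverage.

```java
public class ClipPrecisionDemo {

    /** Number of pixel columns needed to cover the right edge, computed in double precision. */
    static int coveredPixelsDouble(double rightEdge) {
        return (int) Math.ceil(rightEdge);
    }

    /** Same computation after the edge has been narrowed to a float first. */
    static int coveredPixelsFloat(double rightEdge) {
        float narrowed = (float) rightEdge; // precision is lost here
        return (int) Math.ceil(narrowed);
    }

    /** Absolute error introduced by a double -> float -> double round trip. */
    static double roundTripError(double value) {
        return Math.abs((double) (float) value - value);
    }

    public static void main(String[] args) {
        // An edge that lies just past a pixel boundary.
        double edge = 100.000000001;
        System.out.println("double: " + coveredPixelsDouble(edge)); // 101
        System.out.println("float:  " + coveredPixelsFloat(edge));  // 100
        // A typical page-sized coordinate also changes slightly in the round trip.
        System.out.println("error:  " + roundTripError(595.276));
    }
}
```

With the edge kept as a `double`, the last partially covered pixel is included; after narrowing to `float` the edge snaps to exactly 100 and that pixel is dropped, which is the kind of one-pixel gap that would show up as a line between adjoining page images.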
[jira] [Created] (PDFBOX-4296) Question: Performance
Daniel Persson created PDFBOX-4296:

Summary: Question: Performance
Key: PDFBOX-4296
URL: https://issues.apache.org/jira/browse/PDFBOX-4296
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 2.0.11
Reporter: Daniel Persson

Hi Team.

We use a tool we built with PDFBox to extract text from about 10k pages per day. We have another tool that extracts images using Poppler. We would like to use PDFBox for both tasks, but sadly PDFBox is roughly 3 times slower for image extraction.

Do you have any backlog, technical debt, or ideas on how to improve performance?

We have tried -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true, and that made image generation much slower.

We have set System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider") in code.

We use image libraries from TwelveMonkeys, PDFBox, and the standard JAI project.

I've read in the code that images using transparency are written twice, which might be a culprit.

I have been allowed to put some time into the project if we have some solid leads or a roadmap for reaching better performance.

I hope it's okay to track this issue here instead of asking on the mailing list.

Best regards
Daniel
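For reference, the two settings mentioned above can be applied programmatically in one place. This is a minimal sketch using only `System.setProperty`; the class and method names are hypothetical, and the property names and values are taken verbatim from the message. Whether the KCMS provider class is available depends on the JDK in use, and the properties presumably need to be set before any rendering classes are loaded; both points are assumptions, not something the report confirms.

```java
public class RenderingProperties {

    static final String CMM_KEY = "sun.java2d.cmm";
    static final String CMM_VALUE = "sun.java2d.cmm.kcms.KcmsServiceProvider";
    static final String PURE_JAVA_CMYK_KEY =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    /**
     * Sets both properties. Call this early in main(), before any rendering
     * code runs (assumed requirement, see lead-in).
     */
    static void apply(boolean usePureJavaCmyk) {
        System.setProperty(CMM_KEY, CMM_VALUE);
        System.setProperty(PURE_JAVA_CMYK_KEY, Boolean.toString(usePureJavaCmyk));
    }

    public static void main(String[] args) {
        // The report found the pure-Java CMYK conversion slower, so it is left off here.
        apply(false);
        System.out.println(CMM_KEY + "=" + System.getProperty(CMM_KEY));
        System.out.println(PURE_JAVA_CMYK_KEY + "=" + System.getProperty(PURE_JAVA_CMYK_KEY));
    }
}
```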
[jira] [Updated] (PDFBOX-4228) PDFBox crashes when a Type3 font don't have an embedded encoding.
[ https://issues.apache.org/jira/browse/PDFBOX-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-4228:
---
    Attachment: example.pdf

> PDFBox crashes when a Type3 font don't have an embedded encoding.
> -
>
> Key: PDFBOX-4228
> URL: https://issues.apache.org/jira/browse/PDFBOX-4228
> Project: PDFBox
> Issue Type: Bug
> Reporter: Daniel Persson
> Priority: Critical
> Labels: patch
> Attachments: example.pdf, type3_fixed.patch
>
> When running PDFBox on a PDF with WinAnsiEncoding for a Type3 font, it crashes without any output.
> {code:java}
> Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
>     at org.apache.pdfbox.pdmodel.font.PDType3Font.readEncoding(PDType3Font.java:82)
>     at org.apache.pdfbox.pdmodel.font.PDType3Font.<init>(PDType3Font.java:66)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:79)
>     at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
>     at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
>     at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
>     at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
>     at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:141)
>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:360)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:288)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:235)
>     at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237)
>     at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
>     at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:59)
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-4228) PDFBox crashes when a Type3 font don't have an embedded encoding.
[ https://issues.apache.org/jira/browse/PDFBOX-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-4228:
---
    Attachment: type3_fixed.patch

> PDFBox crashes when a Type3 font don't have an embedded encoding.
> -
>
> Key: PDFBOX-4228
> URL: https://issues.apache.org/jira/browse/PDFBOX-4228
> Project: PDFBox
> Issue Type: Bug
> Reporter: Daniel Persson
> Priority: Critical
> Labels: patch
> Attachments: example.pdf, type3_fixed.patch
>
> When running PDFBox on a PDF with WinAnsiEncoding for a Type3 font, it crashes without any output.
> {code:java}
> Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
>     at org.apache.pdfbox.pdmodel.font.PDType3Font.readEncoding(PDType3Font.java:82)
>     at org.apache.pdfbox.pdmodel.font.PDType3Font.<init>(PDType3Font.java:66)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:79)
>     at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
>     at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
>     at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
>     at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
>     at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:141)
>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:360)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:288)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:235)
>     at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237)
>     at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
>     at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:59)
> {code}
[jira] [Created] (PDFBOX-4228) PDFBox crashes when a Type3 font don't have an embedded encoding.
Daniel Persson created PDFBOX-4228:
--

Summary: PDFBox crashes when a Type3 font don't have an embedded encoding.
Key: PDFBOX-4228
URL: https://issues.apache.org/jira/browse/PDFBOX-4228
Project: PDFBox
Issue Type: Bug
Reporter: Daniel Persson

When running PDFBox on a PDF with WinAnsiEncoding for a Type3 font, it crashes without any output.
{code:java}
Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
    at org.apache.pdfbox.pdmodel.font.PDType3Font.readEncoding(PDType3Font.java:82)
    at org.apache.pdfbox.pdmodel.font.PDType3Font.<init>(PDType3Font.java:66)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:79)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
    at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:141)
    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:360)
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:288)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:235)
    at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237)
    at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:59)
{code}
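The ClassCastException above says that the encoding entry was a name (COSName) where the code expected a dictionary (COSDictionary). As a rough illustration only, and not the attached patch, the idea of checking the entry type before casting could be sketched like this. The class and method names are hypothetical, and plain Java types stand in for the PDFBox COS classes: a String models a name such as WinAnsiEncoding and a Map models a dictionary.

```java
import java.util.Map;

// Hypothetical stand-in, not PDFBox code: a String models a name object
// (such as WinAnsiEncoding) and a Map models a dictionary object.
class EncodingEntrySketch {

    /** Classifies an encoding entry without assuming its type up front. */
    static String classify(Object encodingEntry) {
        if (encodingEntry instanceof String) {
            // Named encoding: resolve it by name instead of casting to a dictionary.
            return "named:" + encodingEntry;
        }
        if (encodingEntry instanceof Map) {
            // Embedded encoding dictionary: only now is the dictionary cast safe.
            Map<?, ?> dictionary = (Map<?, ?>) encodingEntry;
            return "dictionary with " + dictionary.size() + " entries";
        }
        // Missing or unexpected entry: report it rather than throw.
        return "none";
    }
}
```

The point of the sketch is only the order of the checks: the cast happens after the type test, so a named encoding never reaches the dictionary branch.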
[jira] [Updated] (PDFBOX-4140) Crash when repeating flag is outside of range.
[ https://issues.apache.org/jira/browse/PDFBOX-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-4140:
---
    Attachment: LP-180302-08.pdf

> Crash when repeating flag is outside of range.
> --
>
> Key: PDFBOX-4140
> URL: https://issues.apache.org/jira/browse/PDFBOX-4140
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.8
> Reporter: Daniel Persson
> Priority: Major
> Labels: patch
> Attachments: LP-180302-08.pdf, fixing_broken_pdf.diff
>
> When PDFBox is run to create images from a PDF with bad data, the tool crashes and no image is rendered.
[jira] [Updated] (PDFBOX-4140) Crash when repeating flag is outside of range.
[ https://issues.apache.org/jira/browse/PDFBOX-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-4140:
---
    Attachment: fixing_broken_pdf.diff

> Crash when repeating flag is outside of range.
> --
>
> Key: PDFBOX-4140
> URL: https://issues.apache.org/jira/browse/PDFBOX-4140
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.8
> Reporter: Daniel Persson
> Priority: Major
> Labels: patch
> Attachments: fixing_broken_pdf.diff
>
> When PDFBox is run to create images from a PDF with bad data, the tool crashes and no image is rendered.
[jira] [Created] (PDFBOX-4140) Crash when repeating flag is outside of range.
Daniel Persson created PDFBOX-4140:
--

Summary: Crash when repeating flag is outside of range.
Key: PDFBOX-4140
URL: https://issues.apache.org/jira/browse/PDFBOX-4140
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.8
Reporter: Daniel Persson
Attachments: fixing_broken_pdf.diff

When PDFBox is run to create images from a PDF with bad data, the tool crashes and no image is rendered.
[jira] [Created] (PDFBOX-4021) Building from source missing dependency
Daniel Persson created PDFBOX-4021:
--

Summary: Building from source missing dependency
Key: PDFBOX-4021
URL: https://issues.apache.org/jira/browse/PDFBOX-4021
Project: PDFBox
Issue Type: Improvement
Components: Documentation
Reporter: Daniel Persson
Priority: Minor

I downloaded and built trunk from source today and got a failing test due to a missing Noto font.
```
2017-11-23 08:19:58 ERROR org.apache.pdfbox.pdmodel.font.FileSystemFontProvider:661 - Could not load font file: /usr/share/fonts/noto/NotoSansCoptic-Regular.ttf
java.io.FileNotFoundException: /usr/share/fonts/noto/NotoSansCoptic-Regular.ttf (No such file or directory)
    at java.io.RandomAccessFile.open0(Native Method)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at org.apache.fontbox.ttf.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:88)
    at org.apache.fontbox.ttf.RAFDataStream.<init>(RAFDataStream.java:63)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:84)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.readTrueTypeFont(FileSystemFontProvider.java:682)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.getTrueTypeFont(FileSystemFontProvider.java:650)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.access$200(FileSystemFontProvider.java:55)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo.getFont(FileSystemFontProvider.java:126)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getCIDFont(FontMapperImpl.java:518)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType0.<init>(PDCIDFontType0.java:128)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:121)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:80)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.getFonts(ResourcesValidationProcess.java:125)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:94)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:77)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
    at org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess.validateResources(SinglePageValidationProcess.java:169)
    at org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess.validate(SinglePageValidationProcess.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
    at org.apache.pdfbox.preflight.process.PageTreeValidationProcess.validatePage(PageTreeValidationProcess.java:69)
    at org.apache.pdfbox.preflight.process.PageTreeValidationProcess.validate(PageTreeValidationProcess.java:57)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:122)
    at org.apache.pdfbox.preflight.PreflightDocument.validate(PreflightDocument.java:163)
    at org.apache.pdfbox.preflight.TestIsartorBavaria.validate(TestIsartorBavaria.java:190)
```
```
validate[target/pdfs/Isartor testsuite/PDFA-1b/6.3 Fonts/6.3.4 Embedded font programs/isartor-6-3-4-t01-fail-c.pdf](org.apache.pdfbox.preflight.TestIsartorBavaria) Time elapsed: 0.025 sec <<< ERROR!
java.lang.NullPointerException: null
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType0.<init>(PDCIDFontType0.java:158)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:121)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:80)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.getFonts(ResourcesValidationProcess.java:125)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:94)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:77)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
    at
[jira] [Commented] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing
[ https://issues.apache.org/jira/browse/PDFBOX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026203#comment-16026203 ]

Daniel Persson commented on PDFBOX-3806:
---

I've tested our application with installed trunk (3.0.0-SNAPSHOT), pdfbox-app (2.0.7-SNAPSHOT) and pdfbox-debugger (2.0.7-SNAPSHOT). I don't see any errors.

Best regards
Daniel

> Nullpointer exception in getLeftSideBearing
> ---
>
> Key: PDFBOX-3806
> URL: https://issues.apache.org/jira/browse/PDFBOX-3806
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.6
> Reporter: Daniel Persson
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.7, 3.0.0
> Attachments: font.raw
>
> While processing today's batch of data we got a NullPointerException in getLeftSideBearing. Sadly I can't give you the PDF.
> ```
> public int getLeftSideBearing(int gid) {
>     return gid < this.numHMetrics ? this.leftSideBearing[gid] : this.nonHorizontalLeftSideBearing[gid - this.numHMetrics];
> }
> ```
> In this function there could be a case where nonHorizontalLeftSideBearing is null and you still ask for a GID larger than or equal to numHMetrics.
> This is the first time I have seen this issue, and so far only 4 characters in one PDF have it, so it is not critical.
[jira] [Updated] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing
[ https://issues.apache.org/jira/browse/PDFBOX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-3806:
---
    Attachment: font.raw

> Nullpointer exception in getLeftSideBearing
> ---
>
> Key: PDFBOX-3806
> URL: https://issues.apache.org/jira/browse/PDFBOX-3806
> Project: PDFBox
> Issue Type: Bug
> Reporter: Daniel Persson
> Priority: Minor
> Attachments: font.raw
>
> While processing today's batch of data we got a NullPointerException in getLeftSideBearing. Sadly I can't give you the PDF.
> ```
> public int getLeftSideBearing(int gid) {
>     return gid < this.numHMetrics ? this.leftSideBearing[gid] : this.nonHorizontalLeftSideBearing[gid - this.numHMetrics];
> }
> ```
> In this function there could be a case where nonHorizontalLeftSideBearing is null and you still ask for a GID larger than or equal to numHMetrics.
> This is the first time I have seen this issue, and so far only 4 characters in one PDF have it, so it is not critical.
[jira] [Commented] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing
[ https://issues.apache.org/jira/browse/PDFBOX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025846#comment-16025846 ]

Daniel Persson commented on PDFBOX-3806:
---

As for the code, it seems I copied it from the disassembled code in IntelliJ.

    public int getLeftSideBearing(int gid)
    {
        if (gid < numHMetrics)
        {
            return leftSideBearing[gid];
        }
        else
        {
            return nonHorizontalLeftSideBearing[gid - numHMetrics];
        }
    }

> Nullpointer exception in getLeftSideBearing
> ---
>
> Key: PDFBOX-3806
> URL: https://issues.apache.org/jira/browse/PDFBOX-3806
> Project: PDFBox
> Issue Type: Bug
> Reporter: Daniel Persson
> Priority: Minor
>
> While processing today's batch of data we got a NullPointerException in getLeftSideBearing. Sadly I can't give you the PDF.
> ```
> public int getLeftSideBearing(int gid) {
>     return gid < this.numHMetrics ? this.leftSideBearing[gid] : this.nonHorizontalLeftSideBearing[gid - this.numHMetrics];
> }
> ```
> In this function there could be a case where nonHorizontalLeftSideBearing is null and you still ask for a GID larger than or equal to numHMetrics.
> This is the first time I have seen this issue, and so far only 4 characters in one PDF have it, so it is not critical.
[jira] [Commented] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing
[ https://issues.apache.org/jira/browse/PDFBOX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025845#comment-16025845 ]

Daniel Persson commented on PDFBOX-3806:
---

java.lang.NullPointerException
    at org.apache.fontbox.ttf.HorizontalMetricsTable.getLeftSideBearing(HorizontalMetricsTable.java:122)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:195)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:176)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getPath(PDTrueTypeFont.java:447)
    at org.apache.pdfbox.debugger.fontencodingpane.SimpleFont.getGlyphs(SimpleFont.java:72)
    at org.apache.pdfbox.debugger.fontencodingpane.SimpleFont.<init>(SimpleFont.java:44)
    at org.apache.pdfbox.debugger.fontencodingpane.FontEncodingPaneController.<init>(FontEncodingPaneController.java:89)
    at org.apache.pdfbox.debugger.PDFDebugger.showFont(PDFDebugger.java:1069)
    at org.apache.pdfbox.debugger.PDFDebugger.jTree1ValueChanged(PDFDebugger.java:801)
    at org.apache.pdfbox.debugger.PDFDebugger.access$200(PDFDebugger.java:118)
    at org.apache.pdfbox.debugger.PDFDebugger$3.valueChanged(PDFDebugger.java:330)
    at javax.swing.JTree.fireValueChanged(JTree.java:2927)
    at javax.swing.JTree$TreeSelectionRedirector.valueChanged(JTree.java:3391)
    at javax.swing.tree.DefaultTreeSelectionModel.fireValueChanged(DefaultTreeSelectionModel.java:635)
    at javax.swing.tree.DefaultTreeSelectionModel.notifyPathChange(DefaultTreeSelectionModel.java:1093)
    at javax.swing.tree.DefaultTreeSelectionModel.setSelectionPaths(DefaultTreeSelectionModel.java:294)
    at javax.swing.tree.DefaultTreeSelectionModel.setSelectionPath(DefaultTreeSelectionModel.java:188)
    at javax.swing.JTree.setSelectionPath(JTree.java:1634)
    at javax.swing.plaf.basic.BasicTreeUI.selectPathForEvent(BasicTreeUI.java:2393)
    at javax.swing.plaf.basic.BasicTreeUI$Handler.handleSelection(BasicTreeUI.java:3609)
    at javax.swing.plaf.basic.BasicTreeUI$Handler.mousePressed(BasicTreeUI.java:3548)
    at java.awt.Component.processMouseEvent(Component.java:6530)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
    at java.awt.Component.processEvent(Component.java:6298)
    at java.awt.Container.processEvent(Container.java:2236)
    at java.awt.Component.dispatchEventImpl(Component.java:4889)
    at java.awt.Container.dispatchEventImpl(Container.java:2294)
    at java.awt.Component.dispatchEvent(Component.java:4711)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4522)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
    at java.awt.Container.dispatchEventImpl(Container.java:2280)
    at java.awt.Window.dispatchEventImpl(Window.java:2746)
    at java.awt.Component.dispatchEvent(Component.java:4711)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
    at java.awt.EventQueue.access$500(EventQueue.java:97)
    at java.awt.EventQueue$3.run(EventQueue.java:709)
    at java.awt.EventQueue$3.run(EventQueue.java:703)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:80)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:90)
    at java.awt.EventQueue$4.run(EventQueue.java:731)
    at java.awt.EventQueue$4.run(EventQueue.java:729)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:80)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)

> Nullpointer exception in getLeftSideBearing
> ---
>
> Key: PDFBOX-3806
> URL: https://issues.apache.org/jira/browse/PDFBOX-3806
> Project: PDFBox
> Issue Type: Bug
> Reporter: Daniel Persson
> Priority: Minor
>
> While processing todays batch of data we got a Nullpointer exception in getLeftSideBearing. Sadly I can't
[jira] [Created] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing
Daniel Persson created PDFBOX-3806:
--

Summary: Nullpointer exception in getLeftSideBearing
Key: PDFBOX-3806
URL: https://issues.apache.org/jira/browse/PDFBOX-3806
Project: PDFBox
Issue Type: Bug
Reporter: Daniel Persson
Priority: Minor

While processing today's batch of data we got a NullPointerException in getLeftSideBearing. Sadly I can't give you the PDF.
```
public int getLeftSideBearing(int gid) {
    return gid < this.numHMetrics ? this.leftSideBearing[gid] : this.nonHorizontalLeftSideBearing[gid - this.numHMetrics];
}
```
In this function there could be a case where nonHorizontalLeftSideBearing is null and you still ask for a GID larger than or equal to numHMetrics.
This is the first time I have seen this issue, and so far only 4 characters in one PDF have it, so it is not critical.
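To make the failure case concrete, here is a minimal sketch of a guarded lookup. It is not the fix that went into FontBox: the class below is a hypothetical stand-in for the metrics table, and returning 0 when no bearing is available is an assumption made purely for illustration.

```java
// Hypothetical stand-in for a horizontal metrics table, not the FontBox class.
class LeftSideBearingSketch {
    private final short[] leftSideBearing;
    private final short[] nonHorizontalLeftSideBearing; // may be null, as in the report
    private final int numHMetrics;

    LeftSideBearingSketch(short[] leftSideBearing, short[] nonHorizontalLeftSideBearing) {
        this.leftSideBearing = leftSideBearing;
        this.nonHorizontalLeftSideBearing = nonHorizontalLeftSideBearing;
        this.numHMetrics = leftSideBearing.length;
    }

    int getLeftSideBearing(int gid) {
        if (gid >= 0 && gid < numHMetrics) {
            return leftSideBearing[gid];
        }
        int index = gid - numHMetrics;
        // Guard the case from the report: the second array is null (or too
        // short) while the GID is larger than or equal to numHMetrics.
        if (nonHorizontalLeftSideBearing == null
                || index < 0
                || index >= nonHorizontalLeftSideBearing.length) {
            return 0; // assumed fallback value, chosen only for this sketch
        }
        return nonHorizontalLeftSideBearing[index];
    }
}
```

With `new LeftSideBearingSketch(new short[] {10, 20}, null)`, a call with GID 5 returns the fallback instead of throwing.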
[jira] [Created] (PDFBOX-3802) Images wrong color
Daniel Persson created PDFBOX-3802:
--

Summary: Images wrong color
Key: PDFBOX-3802
URL: https://issues.apache.org/jira/browse/PDFBOX-3802
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.6
Environment: Gentoo, Ubuntu
Reporter: Daniel Persson
Priority: Minor
Attachments: pdfbox.png, poppler.png, test.pdf

We found that some images in our PDF flow didn't have the correct colors after extraction. After some investigation it seemed that we had the same problem with both poppler and pdfbox. We found a solution for poppler where we recompiled it with version 2.8 of Little CMS.

The images in this issue were created with these commands:
```
java -jar pdfbox-app-2.1.0-SNAPSHOT.jar PDFToImage -imageType png test.pdf
```
```
pdftoppm test.pdf -png poppler
```
Best regards
Daniel
[jira] [Resolved] (PDFBOX-3764) 100 times performance hit on creating images
[ https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson resolved PDFBOX-3764.
    Resolution: Invalid

> 100 times performance hit on creating images
>
> Key: PDFBOX-3764
> URL: https://issues.apache.org/jira/browse/PDFBOX-3764
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.6
> Reporter: Daniel Persson
> Labels: image, performance
> Attachments: callstack_1.png, callstack_2.png, test.pdf
>
> We found that PDFBox creates a better image than poppler, so we wanted to switch our environment over to get these improvements, but we found a file that took about 10 minutes to create one image with PDFBox and only about 6 seconds with poppler. That is a 100 times performance hit if we were to change.
> I've done some rudimentary profiling on the code and found that most of the time is spent in ColorConvertOp.filter. Maybe there is a leaner way to implement this in order to get a better result?
> best regards
> Daniel
[jira] [Commented] (PDFBOX-3764) 100 times performance hit on creating images
[ https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980990#comment-15980990 ]

Daniel Persson commented on PDFBOX-3764:

Hi Tilman.
Thank you for the support. Sadly I had not read the getting started guide: https://pdfbox.apache.org/2.0/getting-started.html
I have been using PDFBox for reading text for years now, so I must have missed this update.
Now we're down to 19 seconds of rendering instead of 10 minutes. :)
After I added the org.apache.pdfbox.rendering.UsePureJavaCMYKConversion property, the thing taking the most time is the InputStream.read function, which seems reasonable.
Thank you for the quick response.
Best regards
Daniel

> 100 times performance hit on creating images
>
> Key: PDFBOX-3764
> URL: https://issues.apache.org/jira/browse/PDFBOX-3764
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.6
> Reporter: Daniel Persson
> Labels: image, performance
> Attachments: callstack_1.png, callstack_2.png, test.pdf
>
> We found that PDFBox creates a better image than poppler, so we wanted to switch our environment over to get these improvements, but we found a file that took about 10 minutes to create one image with PDFBox and only about 6 seconds with poppler. That is a 100 times performance hit if we were to change.
> I've done some rudimentary profiling on the code and found that most of the time is spent in ColorConvertOp.filter. Maybe there is a leaner way to implement this in order to get a better result?
> best regards
> Daniel
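For readers who hit the same slowdown, a minimal sketch of setting the system property named in the comment above might look like this. The property name is the one quoted in the comment; the value "true" follows the getting started guide linked there. The class and method names are hypothetical, and the call is assumed to run before any rendering starts.

```java
// Minimal sketch: set the property once, before any page is rendered.
class CmykConversionSetting {
    // Property name as quoted in the comment above.
    static final String PROPERTY = "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    static void enable() {
        System.setProperty(PROPERTY, "true");
    }

    public static void main(String[] args) {
        enable();
        // Echo the setting so it is visible that it took effect.
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

The same property can also be passed with -D on the java command line instead of being set in code.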
[jira] [Updated] (PDFBOX-3764) 100 times performance hit on creating images
[ https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-3764:
---
    Component/s: Rendering

> 100 times performance hit on creating images
>
> Key: PDFBOX-3764
> URL: https://issues.apache.org/jira/browse/PDFBOX-3764
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.6
> Reporter: Daniel Persson
> Labels: image, performance
> Attachments: callstack_1.png, callstack_2.png, test.pdf
>
> We found that PDFBox creates a better image than poppler, so we wanted to switch our environment over to get these improvements, but we found a file that took about 10 minutes to create one image with PDFBox and only about 6 seconds with poppler. That is a 100 times performance hit if we were to change.
> I've done some rudimentary profiling on the code and found that most of the time is spent in ColorConvertOp.filter. Maybe there is a leaner way to implement this in order to get a better result?
> best regards
> Daniel
[jira] [Updated] (PDFBOX-3764) 100 times performance hit on creating images
[ https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-3764:
---
    Affects Version/s: 2.0.6

> 100 times performance hit on creating images
>
> Key: PDFBOX-3764
> URL: https://issues.apache.org/jira/browse/PDFBOX-3764
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 2.0.6
> Reporter: Daniel Persson
> Labels: image, performance
> Attachments: callstack_1.png, callstack_2.png, test.pdf
>
> We found that PDFBox creates a better image than poppler, so we wanted to switch our environment over to get these improvements, but we found a file that took about 10 minutes to create one image with PDFBox and only about 6 seconds with poppler. That is a 100 times performance hit if we were to change.
> I've done some rudimentary profiling on the code and found that most of the time is spent in ColorConvertOp.filter. Maybe there is a leaner way to implement this in order to get a better result?
> best regards
> Daniel
[jira] [Created] (PDFBOX-3764) 100 times performance hit on creating images
Daniel Persson created PDFBOX-3764:
--

Summary: 100 times performance hit on creating images
Key: PDFBOX-3764
URL: https://issues.apache.org/jira/browse/PDFBOX-3764
Project: PDFBox
Issue Type: Improvement
Reporter: Daniel Persson
Attachments: callstack_1.png, callstack_2.png, test.pdf

We found that PDFBox creates a better image than poppler, so we wanted to switch our environment over to get these improvements, but we found a file that took about 10 minutes to create one image with PDFBox and only about 6 seconds with poppler. That is a 100 times performance hit if we were to change.

I've done some rudimentary profiling on the code and found that most of the time is spent in ColorConvertOp.filter. Maybe there is a leaner way to implement this in order to get a better result?

best regards
Daniel
[jira] [Created] (PDFBOX-3724) Wrong size in rendering of some artifacts
Daniel Persson created PDFBOX-3724: -- Summary: Wrong size in rendering of some artifacts Key: PDFBOX-3724 URL: https://issues.apache.org/jira/browse/PDFBOX-3724 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.5 Reporter: Daniel Persson Priority: Minor Attachments: example1.pdf, example1-pdfbox1.png, example1-poppler-1.png It seems that some artifacts get the wrong width when rendering. By testing I've narrowed it down to the artifact being a stroked line whose stroke width seems to be larger than a single pixel. The stroke width should probably only be applied to how wide the stroke is, and the length of the stroke might have a minimum length? Poppler seems to handle this stroke correctly. - OFF TOPIC We do text extraction with PDFBox and use poppler today to extract our images because we had a lot of artifacts earlier, but with the tremendous work by the team to solve the PDFBOX-3000 issues we are looking into using PDFBox for image rendering. A lot of our examples have even more detail than the poppler-rendered images. Great work, people.
[jira] [Created] (PDFBOX-3511) NullPointerException - missing glyph description
Daniel Persson created PDFBOX-3511: -- Summary: NullPointerException - missing glyph description Key: PDFBOX-3511 URL: https://issues.apache.org/jira/browse/PDFBOX-3511 Project: PDFBox Issue Type: Bug Components: FontBox Affects Versions: 2.0.3, 2.0.2, 2.0.1, 2.0.0 Reporter: Daniel Persson Priority: Minor

Hi Team. We process many PDF documents every day, and today we ran into a file that we couldn't create an image from. For some reason it has glyphs that don't have any glyph description. In GlyfCompositeDescript there are at least two functions (lines 258 and 271) that fetch a GlyphDescription from a map like this:

GlyphDescription gd = descriptions.get(c.getGlyphIndex());

The functions then use the description without a null check, which results in a NullPointerException.

Exception in thread "main" java.lang.NullPointerException
    at org.apache.fontbox.ttf.GlyfCompositeDescript.getCompositeCompEndPt(GlyfCompositeDescript.java:272)
    at org.apache.fontbox.ttf.GlyfCompositeDescript.getEndPtOfContours(GlyfCompositeDescript.java:126)
    at org.apache.fontbox.ttf.GlyphRenderer.describe(GlyphRenderer.java:72)
    at org.apache.fontbox.ttf.GlyphRenderer.getPath(GlyphRenderer.java:56)
    at org.apache.fontbox.ttf.GlyphData.getPath(GlyphData.java:116)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.getPath(PDCIDFontType2.java:446)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.getPath(PDType0Font.java:506)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForGID(TTFGlyph2D.java:137)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForCharacterCode(TTFGlyph2D.java:93)
    at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D(PageDrawer.java:353)
    at org.apache.pdfbox.rendering.PageDrawer.showFontGlyph(PageDrawer.java:334)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:744)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:701)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:564)
    at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:55)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:815)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:472)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:446)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:189)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
    at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:236)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:94)

So far we have only seen one file with this issue in our processing. I've tried running PDFToImage with all versions of PDFBox 2 and they all fail. PDFBox 1.8.12 gives some error output but generates a working image.

Sep 23, 2016 7:36:53 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Sep 23, 2016 7:36:53 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.util.PDFImageWriter writeImage
INFO: Writing: [Removed Identifer]_01_07_201609231.jpg

At the time of writing this bug report the file is too fresh to disclose. I might be able to add it in a week or so, depending on the customer, if it's required for the resolution of this issue. Thanks for your time.
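The null check that the PDFBOX-3511 report above asks for could look roughly like the sketch below. It is only an illustration: `CompositeGlyphSketch`, `Description` and `totalPointCount` are hypothetical names and are not FontBox's GlyfCompositeDescript code. The only idea taken from the report is that the result of the map lookup is checked for null before it is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a composite glyph; NOT FontBox's GlyfCompositeDescript.
class CompositeGlyphSketch {

    // Hypothetical, simplified glyph description that only carries a point count.
    static final class Description {
        final int pointCount;

        Description(int pointCount) {
            this.pointCount = pointCount;
        }
    }

    private final Map<Integer, Description> descriptions = new HashMap<>();

    void put(int glyphIndex, Description description) {
        descriptions.put(glyphIndex, description);
    }

    // Sums the point counts of all components and skips components whose
    // description is missing instead of dereferencing null.
    int totalPointCount(int[] componentGlyphIndexes) {
        int total = 0;
        for (int glyphIndex : componentGlyphIndexes) {
            Description gd = descriptions.get(glyphIndex);
            if (gd == null) {
                continue; // missing glyph description: treat it as an empty component
            }
            total += gd.pointCount;
        }
        return total;
    }
}
```

Whether a missing description should be skipped silently, logged, or replaced by a fallback glyph is a design decision for the maintainers; the sketch only shows the skip variant.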
[jira] [Commented] (PDFBOX-3488) NullPointerException in PDTrueTypeFont.java if glyf table is missing
[ https://issues.apache.org/jira/browse/PDFBOX-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477561#comment-15477561 ] Daniel Persson commented on PDFBOX-3488: Hope this doesn't revert the solved issue PDFBOX-3395. Might have been a logical continuation from that fix. Maybe all fonts need a null pointer check when the table is missing, but an empty table isn't a missing table. Looking forward to 2.0.3, it's going to solve a lot of our problems. Keep up the great work. > NullPointerException in PDTrueTypeFont.java if glyf table is missing > > > Key: PDFBOX-3488 > URL: https://issues.apache.org/jira/browse/PDFBOX-3488 > Project: PDFBox > Issue Type: Bug > Components: FontBox, Rendering >Affects Versions: 2.0.2, 2.0.3 >Reporter: Tilman Hausherr
>
> {code}
> Caused by: java.lang.NullPointerException: null
>     org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getPath(PDTrueTypeFont.java:444)
>     org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getNormalizedPath(PDTrueTypeFont.java:502)
>     org.apache.pdfbox.rendering.GlyphCache.getPathForCharacterCode(GlyphCache.java:71)
>     org.apache.pdfbox.rendering.PageDrawer.showFontGlyph(PageDrawer.java:350)
>     org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:756)
>     org.apache.pdfbox.debugger.pagepane.DebugPageDrawer.showGlyph(DebugPageDrawer.java:59)
>     org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:713)
>     org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:572)
>     org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:55)
> {code}
> The cause is the change in PDFBOX-3395; previously PDFBox would consider the font to be bad and replace it. Now we don't do that because the glyf table is not always needed.
> I'm throwing an exception for now but a better solution should be found.
> Adobe Reader displays glyphs.
[jira] [Commented] (PDFBOX-3464) character height 3 times higher than expected
[ https://issues.apache.org/jira/browse/PDFBOX-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435380#comment-15435380 ] Daniel Persson commented on PDFBOX-3464: I also took a look at the supplied PDF, and our tool using PDFBox extracts the correct height after normalizing the fonts. Both fonts have an EM square of 2048. > character height 3 times higher than expected > - > > Key: PDFBOX-3464 > URL: https://issues.apache.org/jira/browse/PDFBOX-3464 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Reporter: Roman >Priority: Minor > Attachments: notHelped.png, nowItsHelped.png, screenshot-1.png, > screenshot.png, subnode.docx.pdf > > > The issue basically same as PDFBOX-2749, but wrong sample was attached to it > by mistake. Correct PDF is attached here. > The core of the problem is that font height for this specific font is > determined incorrectly, please see code with comments below. > The issue was reproduced on Pdfbox 1.8.4, but as we tested before, same > result we get on 1.8.9 and 2.0 versions. 
> {code}
> public class Extractor extends PDFTextStripper {
>     //<...CUT...>
>     protected void writePage() throws IOException {
>         for (List textList : charactersByArticle) {
>             //charactersByArticle was inherited from base class
>             Iterator textIter = textList.iterator();
>             //<...CUT...>
>             while (textIter.hasNext()) {
>                 TextPosition position = (TextPosition) textIter.next();
>                 //<...CUT...>
>                 PDFontDescriptor fontDescriptor = position.getFont().getFontDescriptor();
>                 //<...CUT...>
>                 float yscale = position.getTextPos().getYScale();
>                 float asc = Math.abs(fontDescriptor.getAscent() / 1000 * yscale);
>                 float rh = Math.abs(fontDescriptor.getFontBoundingBox().getUpperRightY() / 1000 * yscale);
>                 float desc = Math.abs(fontDescriptor.getDescent() / 1000 * yscale);
>                 float capHeight = Math.abs(fontDescriptor.getCapHeight() / 1000 * yscale);
>                 if (capHeight == 0)
>                     capHeight = position.getHeight();
>                 float h = (rh + Math.max(Math.max(capHeight, position.getHeight()), asc)) / 2;
>                 //"h" evaluates to 37.39 (should be between 11 and 12)
>                 //"desc" evaluates to 2.664
>                 //"capHeight" evaluates to 37.39
>                 //"position.getHeight()" evaluates to 33.48
> {code}
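The normalization mentioned in the comments on this issue (an EM square of 2048 instead of 1000) can be sketched as follows. This is a minimal illustration under the assumption that the metric values really are expressed in the font's own units per em; `EmSquareSketch` and its method names are hypothetical and not part of PDFBox, and the sketch does not claim to reproduce the numbers quoted in the report.

```java
// Hypothetical helper; NOT part of PDFBox. It assumes the metric is given in
// the font's design units (unitsPerEm from the 'head' table).
class EmSquareSketch {

    // Converts a value in font design units to the 1000-units-per-em system.
    static float toThousandUnits(float fontUnits, int unitsPerEm) {
        if (unitsPerEm <= 0) {
            throw new IllegalArgumentException("unitsPerEm must be positive");
        }
        return fontUnits * 1000f / unitsPerEm;
    }

    // Same shape as the expressions in the quoted code (value / 1000 * yscale),
    // but normalized by the font's unitsPerEm first.
    static float scaledHeight(float fontUnits, int unitsPerEm, float yScale) {
        return Math.abs(toThousandUnits(fontUnits, unitsPerEm) / 1000f * yScale);
    }
}
```

With a 2048-unit EM square, a metric of 1024 font units becomes 500 in the 1000-unit system, so dividing by a fixed 1000 without this step would overstate the height by a factor of 2.048.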
[jira] [Commented] (PDFBOX-3464) character height 3 times higher than expected
[ https://issues.apache.org/jira/browse/PDFBOX-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435348#comment-15435348 ] Daniel Persson commented on PDFBOX-3464: You might be right that all fonts in PDFs should have an EM square of size 1000, but both OpenType and TrueType define unitsPerEm in their head table, and when that value is applied to your calculations the actual height seems accurate. OpenType head https://www.microsoft.com/typography/otspec/head.htm TrueType head https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6head.html
[jira] [Commented] (PDFBOX-3464) character height 3 times higher than expected
[ https://issues.apache.org/jira/browse/PDFBOX-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434771#comment-15434771 ] Daniel Persson commented on PDFBOX-3464: Just a thought. Could it be because of the UPM square? "With the knowledge that your font is using a 1000, 1024, or 2048 UPM, you need to set up the drawing of your glyphs to ensure that all aspects of your typeface fit adequately into that UPM square." All values in your scaling are computed with a UPM square of 1000, but this font might be using the 2048 square instead?
[jira] [Created] (PDFBOX-3468) ERROR: dash lengths all zero, ignored
Daniel Persson created PDFBOX-3468: -- Summary: ERROR: dash lengths all zero, ignored Key: PDFBOX-3468 URL: https://issues.apache.org/jira/browse/PDFBOX-3468 Project: PDFBox Issue Type: Wish Components: Parsing Affects Versions: 2.0.2 Reporter: Daniel Persson Priority: Trivial

On Friday our production log system alerted us that we had an error ("dash lengths all zero, ignored"). We investigated and found that the PDF being processed also gave an error when opened in Adobe Reader, but the page looked fine and was processed fine as well. Still, we got this error. For us this is a false positive: even though a line pattern should not be empty, the page isn't broken and can still be viewed, so why handle it as an error? My suggestion is to log the errors in the code below at information or warning level instead. In our case we got an update 1 hour later with a PDF that didn't have the empty line dash pattern.

{code:title=SetLineDashPattern.java|borderStyle=solid}
for (COSBase base : dashArray)
{
    if (base instanceof COSNumber)
    {
        COSNumber num = (COSNumber) base;
        if (num.floatValue() != 0)
        {
            allZero = false;
            break;
        }
    }
    else
    {
        LOG.error("dash array has non number element " + base + ", ignored");
        dashArray = new COSArray();
        break;
    }
}
if (dashArray.size() > 0 && allZero)
{
    LOG.error("dash lengths all zero, ignored");
    dashArray = new COSArray();
}
{code}
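The change suggested in PDFBOX-3468 above amounts to keeping the fallback but lowering the log level. A rough, self-contained sketch of that idea follows. It is an assumption-laden illustration: it uses java.util.logging and a plain float array instead of the logging facade and COSArray that the quoted PDFBox code uses, and `DashPatternSketch` and `sanitize` are hypothetical names.

```java
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for the quoted SetLineDashPattern logic.
class DashPatternSketch {

    private static final Logger LOG = Logger.getLogger(DashPatternSketch.class.getName());

    // Returns the dash lengths to use. An all-zero pattern is replaced by an
    // empty pattern (a solid line), and the event is logged as a warning
    // rather than as an error.
    static float[] sanitize(float[] dashArray) {
        boolean allZero = true;
        for (float length : dashArray) {
            if (length != 0) {
                allZero = false;
                break;
            }
        }
        if (dashArray.length > 0 && allZero) {
            LOG.warning("dash lengths all zero, ignored");
            return new float[0];
        }
        return dashArray;
    }
}
```

The rendering behavior is unchanged compared with the quoted code; only the severity seen by log-based alerting differs.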
[jira] [Commented] (PDFBOX-3353) Create appearance streams for annotations
[ https://issues.apache.org/jira/browse/PDFBOX-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390513#comment-15390513 ] Daniel Persson commented on PDFBOX-3353: Hi John Just had to comment on your last comment. The reasoning for not making a class inheritable is a solid one at first glance but might have consequences. When you make a class private / protected you lock down that class, and those who need a quick fix could realize this and try to work around it. In the worst case someone might have to have a dummy subsystem to change one value that the author won't change for some reason. So using a third party library can be annoying for many reasons. A good API is extendable and open. If you get complaints when you bugfix, that seems more like a community problem than a code problem. It's hard to measure the tone of text when English isn't your native language, but I hope you read my message as a reflection on your comment and not criticism. This community has made a great tool that I'm happy to use and contribute to. Best regards Daniel > Create appearance streams for annotations > - > > Key: PDFBOX-3353 > URL: https://issues.apache.org/jira/browse/PDFBOX-3353 > Project: PDFBox > Issue Type: Task > Components: PDModel, Rendering >Affects Versions: 1.8.12, 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Tilman Hausherr > Labels: Annotations > Attachments: SquareAnnotations.pdf, showAnnotation.java > > > Create appearance streams for annotations when missing. > I'll start by replacing current code for Ink and Link annotations. > Good example PDFs: > http://www.pdfill.com/example/pdf_commenting_new.pdf > https://github.com/mozilla/pdf.js/issues/6810
[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.
[ https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375639#comment-15375639 ] Daniel Persson commented on PDFBOX-3395: Ran some of my test cases and the errors are gone. Now I only have warnings for missing Unicode mappings which are unrelated to this issue. > Throwing exception when PDF has unused empty fonts embedded. > > > Key: PDFBOX-3395 > URL: https://issues.apache.org/jira/browse/PDFBOX-3395 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.1, 2.0.2, 2.0.3 >Reporter: Daniel Persson > Fix For: 2.0.3, 2.1.0 > > > I was trying to follow up on the issues in our system and found that some PDF > files threw ERRORs. These PDFs are produced by a publishing system and that > system seems to add fonts when you change to them and add them even though > they are never used. Or only space is used. Then they add this font with an > empty glyf table. This results in that errors are thrown on files that are > fine. > Line 310 in TTFParser removes empty glyf tables. > // skip tables with zero length > if (table.getLength() == 0) > { > return null; > } > return table; > Line 215 of TTFParser throws exception when glyf table is missing. > if (font.getGlyph() == null) > { > throw new IOException("glyf is mandatory"); > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.
[ https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375379#comment-15375379 ] Daniel Persson commented on PDFBOX-3395: True, if you read the glyph table you'll get an empty one. The problem is the line in the parser that generally skips tables of length 0. Seems a bit odd. If the font has defined a table, then the empty one should be a valid table, right?
[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.
[ https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374390#comment-15374390 ] Daniel Persson commented on PDFBOX-3395: Then again, the specification doesn't say that a glyph table requires any glyphs. So why should an empty one generate a warning? A missing table, yes, that is an error.
[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.
[ https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374385#comment-15374385 ] Daniel Persson commented on PDFBOX-3395: Correct, that's why I logged this as a minor wish issue. Our logging framework alerts us on errors and this isn't one so waking up to a false positive isn't preferable.
[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.
[ https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372664#comment-15372664 ] Daniel Persson commented on PDFBOX-3395: Thanks for the heads up, it's not terribly important that it won't be indexed. After all it's just an ads page. And I guess you might want to use it for a test case later.
[jira] [Updated] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.
[ https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Persson updated PDFBOX-3395: --- Attachment: (was: commercal.pdf)
[jira] [Created] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.
Daniel Persson created PDFBOX-3395: -- Summary: Throwing exception when PDF has unused empty fonts embedded. Key: PDFBOX-3395 URL: https://issues.apache.org/jira/browse/PDFBOX-3395 Project: PDFBox Issue Type: Wish Reporter: Daniel Persson Priority: Minor

I was trying to follow up on the issues in our system and found that some PDF files threw ERRORs. These PDFs are produced by a publishing system, and that system seems to add fonts as soon as you switch to them, even if they are never used or only the space character is used. It then adds such a font with an empty glyf table. As a result, errors are thrown for files that are fine.

Line 310 in TTFParser removes empty glyf tables.

// skip tables with zero length
if (table.getLength() == 0)
{
    return null;
}
return table;

Line 215 of TTFParser throws an exception when the glyf table is missing.

if (font.getGlyph() == null)
{
    throw new IOException("glyf is mandatory");
}
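The distinction argued for in PDFBOX-3395 above, that an empty table is not the same as a missing table, can be sketched like this. The class below is a hypothetical model and not FontBox's TTFParser: `TableDirectorySketch`, `declare`, `status` and `isError` are illustrative names, and the table directory is reduced to a map from tag to declared length.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a font's table directory; NOT FontBox's TTFParser.
class TableDirectorySketch {

    enum Status { MISSING, EMPTY, PRESENT }

    // Maps a table tag such as "glyf" to the length declared in the directory.
    private final Map<String, Long> tableLengths = new HashMap<>();

    void declare(String tag, long length) {
        tableLengths.put(tag, length);
    }

    // A declared table of length 0 is reported as EMPTY, not as MISSING.
    Status status(String tag) {
        Long length = tableLengths.get(tag);
        if (length == null) {
            return Status.MISSING;
        }
        return length == 0 ? Status.EMPTY : Status.PRESENT;
    }

    // Only a table that is absent from the directory is treated as an error.
    boolean isError(String tag) {
        return status(tag) == Status.MISSING;
    }
}
```

In this model a font that declares a zero-length glyf table passes the mandatory-table check, while a font with no glyf entry at all still fails it.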
[jira] [Commented] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height
[ https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983261#comment-14983261 ] Daniel Persson commented on PDFBOX-3075: Thanks for quick responses. I'll look into the issues on Monday on company time. Been a really great chat. > Changed to the getHeight function for fonts so it will return a more accurate > height > > > Key: PDFBOX-3075 > URL: https://issues.apache.org/jira/browse/PDFBOX-3075 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 2.0.0 >Reporter: Daniel Persson >Priority: Minor > Labels: github-import > Fix For: 2.0.0 > > Attachments: get_height.patch > > > The getHeight in the fonts gave back approximated heights and in some cases > only height the first time the function was called. Tried to clean up the > functions and return a more accurate height for each glyph. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height
[ https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983199#comment-14983199 ] Daniel Persson commented on PDFBOX-3075: John: I've read your reply at http://mail-archives.apache.org/mod_mbox/pdfbox-users/201510.mbox/%3cbb3c23ee-0c8c-4b5f-a806-eb8d9373a...@jahewson.com%3E As you say there, you need to rethink the font height so it works with PDFTextStripper. My changes made it through the test cases, so I think the stripper can't be that dependent on the actual text height. It uses the font's bounding box height, not font.getHeight(int code), which gives you a specific glyph height. Furthermore, not all font types have glyphs defined. That could be wrong behavior, but in those cases you can only approximate the height. My patch gave me a unified font height in the 1000 em system so I could make accurate calculations of the position and height of glyphs. I've been running many tests on these functions, and I would like to contribute back because the help I've gotten from PDFBOX is great. When it comes to the width advance, it's pretty accurate as long as I make small changes for vertical text and text that is written from right to left. But we've solved those too. The API documentation only states: "Description copied from interface: PDFontLike. Returns the height of the given character, in glyph space. This can be expensive to calculate. Results are only approximate." That is not very descriptive. So what do you recommend that I do going forward? I would like to build my solution on PDFBOX, and I have time allotted by my company to contribute code back to PDFBOX when our work requires changes in the PDFBOX engine. This can only be done if we go in the same direction. Should all fonts have glyphs?
> Changed to the getHeight function for fonts so it will return a more accurate height
>
> Key: PDFBOX-3075
> URL: https://issues.apache.org/jira/browse/PDFBOX-3075
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Daniel Persson
> Priority: Minor
> Labels: github-import
> Fix For: 2.0.0
> Attachments: get_height.patch
>
> The getHeight function in the fonts returned approximated heights and, in some cases,
> returned a height only the first time it was called. Tried to clean up the
> functions and return a more accurate height for each glyph.
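The "1000 em system" mentioned in the comment above can be illustrated with a small conversion. This is only a sketch under stated assumptions, not code from get_height.patch: the class and method names are hypothetical, and it assumes the usual glyph space of 1000 units per em. Type 3 fonts define their own font matrix, so the fixed factor would not apply to them.

```java
final class GlyphHeight {
    /** Assumed glyph-space resolution; Type 3 fonts use their own font matrix instead. */
    static final double UNITS_PER_EM = 1000.0;

    /**
     * Converts a glyph height from glyph space to text space.
     *
     * @param glyphHeight height in glyph-space units (hypothetical input, e.g. from font metrics)
     * @param fontSize    font size in points
     */
    static double toTextSpace(double glyphHeight, double fontSize) {
        return glyphHeight / UNITS_PER_EM * fontSize;
    }

    public static void main(String[] args) {
        // A glyph that is 716 units tall, set at 12 pt, is roughly 8.59 pt tall.
        System.out.println(toTextSpace(716, 12));
    }
}
```

With one scale for every font, heights from different fonts can be compared and combined with positions directly, which is the property the comment describes.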
[jira] [Commented] (PDFBOX-3073) Change to use media box for page size instead of cropbox.
[ https://issues.apache.org/jira/browse/PDFBOX-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983152#comment-14983152 ]

Daniel Persson commented on PDFBOX-3073:

Yes, but if you use all the information from the PDF with the local coordinates and your function in PDFTextStreamEngine.java, then all data is in the wrong place whenever the media and crop boxes differ in size. I've run about 500 examples and get the wrong placement of text every time. But if I change this to the media box and then recalculate the data to the crop box after it has been extracted, I get the correct positions.

> Change to use media box for page size instead of cropbox.
>
> Key: PDFBOX-3073
> URL: https://issues.apache.org/jira/browse/PDFBOX-3073
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Daniel Persson
> Priority: Minor
> Labels: github-import
> Fix For: 2.0.0
> Attachments: mediabox_for_content.patch
>
> For PDF documents where the media box is larger or smaller than the crop box, the
> content gets squeezed or stretched.
> For PDF content, the media box should be used as the page size.
> More information about this at
> http://www.prepressure.com/pdf/basics/page-boxes
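The "recalculate the data to the crop box" step from the comment above could look roughly like this. It is a minimal sketch, not the code in mediabox_for_content.patch: the class, the Box type and the method names are hypothetical, it does not use the PDFBox API, and it assumes positions are given in PDF user space with the origin at the lower left.

```java
final class CropBoxTranslate {
    /** Rectangle in PDF user space: origin at the lower left, y grows upward. */
    static final class Box {
        final double llx, lly, urx, ury;

        Box(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }
    }

    /** Re-expresses a user-space point relative to the crop box's lower-left corner. */
    static double[] toCropBox(double x, double y, Box crop) {
        return new double[] { x - crop.llx, y - crop.lly };
    }

    /** Reports whether the point lies inside the crop box, i.e. in the visible page area. */
    static boolean insideCropBox(double x, double y, Box crop) {
        return x >= crop.llx && x <= crop.urx && y >= crop.lly && y <= crop.ury;
    }

    public static void main(String[] args) {
        // Half an inch (36 pt) cropped off every side of a US Letter page (612 x 792 pt).
        Box crop = new Box(36, 36, 576, 756);
        double[] p = toCropBox(100, 700, crop);
        System.out.println(p[0] + ", " + p[1] + " visible=" + insideCropBox(100, 700, crop));
    }
}
```

The translation is a plain offset, with no scaling, which is consistent with the report that content should not be squeezed or stretched when the two boxes differ in size.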
[jira] [Commented] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height
[ https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983139#comment-14983139 ]

Daniel Persson commented on PDFBOX-3075:

I've implemented a bounding box function in PDFTextStreamEngine.java that gives you an accurate box without requiring you to check the text direction before using it. Is this function also not a valid contribution? It uses the getHeight function.

> Changed to the getHeight function for fonts so it will return a more accurate height
>
> Key: PDFBOX-3075
> URL: https://issues.apache.org/jira/browse/PDFBOX-3075
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Daniel Persson
> Priority: Minor
> Labels: github-import
> Fix For: 2.0.0
> Attachments: get_height.patch
>
> The getHeight function in the fonts returned approximated heights and, in some cases,
> returned a height only the first time it was called. Tried to clean up the
> functions and return a more accurate height for each glyph.
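One way a bounding box can be made independent of the text direction is to transform the glyph rectangle and take its axis-aligned bounds. The sketch below only illustrates that idea with java.awt.geom; it is not the function from the patch, the class and method names are hypothetical, and the transform stands in for whatever text-to-page matrix the caller has.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

final class GlyphBounds {
    /**
     * Returns the axis-aligned bounds of a glyph box after applying the given transform.
     * The box starts at the glyph origin; width and height are in text space.
     */
    static Rectangle2D bounds(AffineTransform textToPage, double width, double height) {
        Rectangle2D glyphBox = new Rectangle2D.Double(0, 0, width, height);
        return textToPage.createTransformedShape(glyphBox).getBounds2D();
    }

    public static void main(String[] args) {
        // Text rotated by 90 degrees and placed at (100, 200): the box comes out tall and narrow.
        AffineTransform t = AffineTransform.getTranslateInstance(100, 200);
        t.quadrantRotate(1);
        System.out.println(bounds(t, 10, 2));
    }
}
```

Because the rotation is folded into the transform, the caller gets a usable rectangle for horizontal, vertical and rotated text without branching on the direction.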
[jira] [Commented] (PDFBOX-3074) Mark transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983130#comment-14983130 ]

Daniel Persson commented on PDFBOX-3074:

Thanks for the input. I haven't found any good information about this embedded content that should not be shown, though. Transparency groups aren't stacked, and only the data inside a marked content section is actually part of a group. Or have I misunderstood this concept?

> Mark transparency groups
>
> Key: PDFBOX-3074
> URL: https://issues.apache.org/jira/browse/PDFBOX-3074
> Project: PDFBox
> Issue Type: New Feature
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Daniel Persson
> Priority: Minor
> Labels: github-import
> Fix For: 2.0.0
> Attachments: mark_transparency_groups.patch
>
> We try to read text from PDF files, but some of the files include extra data
> that is never shown. These segments are usually grouped in transparency
> groups, so for us this function to flag marked content as a transparency
> group is quite useful.
> If there is already a way to do this, please tell me, or if there is a better way to
> remove text that isn't presented or drawn when the PDF is viewed, then I'm all
> ears.
[jira] [Updated] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height
[ https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-3075:

Attachment: get_height.patch

Patch for this issue.

> Changed to the getHeight function for fonts so it will return a more accurate height
>
> Key: PDFBOX-3075
> URL: https://issues.apache.org/jira/browse/PDFBOX-3075
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Daniel Persson
> Priority: Minor
> Labels: github-import
> Fix For: 2.0.0
> Attachments: get_height.patch
>
> The getHeight function in the fonts returned approximated heights and, in some cases,
> returned a height only the first time it was called. Tried to clean up the
> functions and return a more accurate height for each glyph.
[jira] [Updated] (PDFBOX-3074) Mark transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-3074:

Attachment: mark_transparency_groups.patch

Patch for this issue.

> Mark transparency groups
>
> Key: PDFBOX-3074
> URL: https://issues.apache.org/jira/browse/PDFBOX-3074
> Project: PDFBox
> Issue Type: New Feature
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Daniel Persson
> Priority: Minor
> Labels: github-import
> Fix For: 2.0.0
> Attachments: mark_transparency_groups.patch
>
> We try to read text from PDF files, but some of the files include extra data
> that is never shown. These segments are usually grouped in transparency
> groups, so for us this function to flag marked content as a transparency
> group is quite useful.
> If there is already a way to do this, please tell me, or if there is a better way to
> remove text that isn't presented or drawn when the PDF is viewed, then I'm all
> ears.
[jira] [Updated] (PDFBOX-3073) Change to use media box for page size instead of cropbox.
[ https://issues.apache.org/jira/browse/PDFBOX-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Persson updated PDFBOX-3073:

Attachment: mediabox_for_content.patch

Patch for this issue.

> Change to use media box for page size instead of cropbox.
>
> Key: PDFBOX-3073
> URL: https://issues.apache.org/jira/browse/PDFBOX-3073
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Daniel Persson
> Priority: Minor
> Labels: github-import
> Fix For: 2.0.0
> Attachments: mediabox_for_content.patch
>
> For PDF documents where the media box is larger or smaller than the crop box, the
> content gets squeezed or stretched.
> For PDF content, the media box should be used as the page size.
> More information about this at
> http://www.prepressure.com/pdf/basics/page-boxes
[jira] [Created] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height
Daniel Persson created PDFBOX-3075:

Summary: Changed to the getHeight function for fonts so it will return a more accurate height
Key: PDFBOX-3075
URL: https://issues.apache.org/jira/browse/PDFBOX-3075
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.0
Reporter: Daniel Persson
Priority: Minor
Fix For: 2.0.0

The getHeight function in the fonts returned approximated heights and, in some cases, returned a height only the first time it was called. Tried to clean up the functions and return a more accurate height for each glyph.
[jira] [Created] (PDFBOX-3073) Change to use media box for page size instead of cropbox.
Daniel Persson created PDFBOX-3073:

Summary: Change to use media box for page size instead of cropbox.
Key: PDFBOX-3073
URL: https://issues.apache.org/jira/browse/PDFBOX-3073
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.0
Reporter: Daniel Persson
Priority: Minor
Fix For: 2.0.0

For PDF documents where the media box is larger or smaller than the crop box, the content gets squeezed or stretched.
For PDF content, the media box should be used as the page size.
More information about this at http://www.prepressure.com/pdf/basics/page-boxes
[jira] [Created] (PDFBOX-3074) Mark transparency groups
Daniel Persson created PDFBOX-3074:

Summary: Mark transparency groups
Key: PDFBOX-3074
URL: https://issues.apache.org/jira/browse/PDFBOX-3074
Project: PDFBox
Issue Type: New Feature
Components: Text extraction
Affects Versions: 2.0.0
Reporter: Daniel Persson
Priority: Minor
Fix For: 2.0.0

We try to read text from PDF files, but some of the files include extra data that is never shown. These segments are usually grouped in transparency groups, so for us this function to flag marked content as a transparency group is quite useful.
If there is already a way to do this, please tell me, or if there is a better way to remove text that isn't presented or drawn when the PDF is viewed, then I'm all ears.