Problem with processTextPosition
Hi all, I was trying to manually feed TextPosition objects to the processTextPosition method of the PDFTextStripper class. I created a subclass of PDFTextStripper and overrode the processStream method. In processStream I manually created two TextPosition objects for the letters "W" and "H". At the end I passed them to processTextPosition: processTextPosition(textPosition1); processTextPosition(textPosition2); Then I tested it using: PDFTextStripper ocrStripper = new PDFOCRTextStripper(); PDDocument document = PDDocument.load("some pdf file"); String data = ocrStripper.getText(document); System.out.println(data); The output was: H W Then I changed the order in which the TextPosition objects are passed [1]: processTextPosition(textPosition2); processTextPosition(textPosition1); The output was: WH -- As far as I understand, processTextPosition works with the text position metadata, such as the x and y coordinates of the input text, so it should not depend on the order of the input sequence. In this case, however, processTextPosition seems to follow the order of the input: if I input W first, it prints W first without considering its actual position. Is this the normal behaviour, or am I missing something here? [1] https://gist.github.com/DImuthuUpe/5dcfa9758f017794c649 -- Regards W.Dimuthu Upeksha Undergraduate Department of Computer Science And Engineering University of Moratuwa, Sri Lanka
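For comparison: as far as I know, PDFTextStripper only reorders the collected characters by their coordinates when setSortByPosition(true) has been called; by default it keeps the order in which the TextPositions arrive, which would match the behaviour described. The self-contained sketch below does not use PDFBox at all. It mimics the difference with a hypothetical Glyph class standing in for TextPosition and a simple top-to-bottom, left-to-right comparator; the y-grows-downward convention is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortByPositionSketch {

    // Hypothetical stand-in for TextPosition: a piece of text plus its coordinates.
    // Assumption: y grows downward (top of the page = small y).
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Concatenates glyphs in the order they were supplied (no position sorting).
    static String inInputOrder(List<Glyph> glyphs) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            sb.append(g.text);
        }
        return sb.toString();
    }

    // Sorts a copy top-to-bottom, then left-to-right, and concatenates it.
    static String inReadingOrder(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y)
                .thenComparingDouble(g -> g.x));
        return inInputOrder(sorted);
    }

    public static void main(String[] args) {
        // "H" is fed first even though "W" sits to its left on the same line.
        List<Glyph> fed = Arrays.asList(new Glyph("H", 110f, 50f), new Glyph("W", 100f, 50f));
        System.out.println(inInputOrder(fed));   // HW
        System.out.println(inReadingOrder(fed)); // WH
    }
}
```

A strict y comparison is a simplification: glyphs on the same visual line often differ slightly in y, so a real comparator needs a tolerance. PDFBox ships its own TextPositionComparator for this purpose.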
[jira] [Commented] (PDFBOX-1756) ClassCastException CosString cannot be cast to COSName
[ https://issues.apache.org/jira/browse/PDFBOX-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000649#comment-14000649 ] Tilman Hausherr commented on PDFBOX-1756: - It doesn't happen with the non sequential parser, but it happens when saving that file: {code} Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:519) at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:449) at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1099) at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:555) at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1364) at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1238) at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1220) at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:455) at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:189) {code} > ClassCastException CosString cannot be cast to COSName > -- > > Key: PDFBOX-1756 > URL: https://issues.apache.org/jira/browse/PDFBOX-1756 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 1.8.2 > Environment: Ubuntu Linux & Windows 7 (both JDK6) >Reporter: William Palmer >Priority: Minor > Attachments: testPDF_twoAuthors.pdf > > > Opening and saving a PDF causes this exception in 1.8.2: > Exception in thread "main" java.lang.ClassCastException: > org.apache.pdfbox.cos.COSString cannot be cast to > org.apache.pdfbox.cos.COSName > at > org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:507) > at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:435) > at > org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1122) > at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:552) > at 
org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1501) > at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1324) > at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305) > The PDF is here: > http://digitalcorpora.org/corp/nps/files/govdocs1/008/008677.pdf > Code to reproduce the exception: > PDFParser parser = new PDFParser(new FileInputStream(new File("008677.pdf"))); > parser.parse(); > File temp = File.createTempFile("temp-", ".pdf"); > parser.getPDDocument().save(temp); > parser.getDocument().close(); -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999265#comment-13999265 ] Tilman Hausherr commented on PDFBOX-2079: - The bug has occurred since rev 1585781, which fixed PDFBOX-2016, another bug with lengths. I suspect that, because of the sequential parsing, the correct length wasn't available when reading the PDF, so we were reading up to "endstream" (although the length is available further down!). That length read was wrong because of what you mentioned at the beginning. I will need to find out why the sequential parser reads CR LF, whether this is correct, and whether it can be changed. Anyway, it shows once again that you shouldn't use load(). There's a useNonSequentialParser config option in TIKA. > Extra new line characters extracted in 1.8.5 for embedded files leading to > ZipFile exception in Java 1.6 > > > Key: PDFBOX-2079 > URL: https://issues.apache.org/jira/browse/PDFBOX-2079 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 1.8.5, 1.8.6, 2.0.0 >Reporter: Tim Allison >Assignee: Tilman Hausherr >Priority: Minor > Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf > > > For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from > an embedded zip (well, docx) file. PDFBox 1.8.5 extracts 17662 bytes -- > "\r\n" at the end of the stream. This leads to a ZipException for ZipFile(s) > in Java 1.6, but not Java 1.7. -- This message was sent by Atlassian JIRA (v6.2#6252)
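The PDF specification (ISO 32000-1, section 7.3.8.1) says the end-of-line marker between the stream data and the endstream keyword is not part of the data and is not counted in /Length. A parser that has to locate endstream by scanning, instead of trusting /Length, therefore has to drop that marker; otherwise a CR LF adds two bytes, which is the 17660 vs 17662 difference reported in this issue. A minimal, self-contained sketch of that trimming step (the class and method names are hypothetical; this is not PDFBox's parser code):

```java
import java.util.Arrays;

public class StreamEolSketch {

    // Returns the stream data with at most one trailing EOL marker
    // (CR LF, LF or CR) removed, as found just before "endstream".
    static byte[] stripTrailingEol(byte[] raw) {
        int end = raw.length;
        if (end >= 2 && raw[end - 2] == '\r' && raw[end - 1] == '\n') {
            end -= 2;
        } else if (end >= 1 && (raw[end - 1] == '\n' || raw[end - 1] == '\r')) {
            end -= 1;
        }
        return Arrays.copyOf(raw, end);
    }

    public static void main(String[] args) {
        byte[] data = new byte[17660];                      // size extracted by 1.8.4
        byte[] raw = Arrays.copyOf(data, data.length + 2);  // data followed by CR LF
        raw[raw.length - 2] = '\r';
        raw[raw.length - 1] = '\n';
        System.out.println(raw.length + " -> " + stripTrailingEol(raw).length); // 17662 -> 17660
    }
}
```

The heuristic is lossy for binary data that legitimately ends in CR or LF, which is one reason a trustworthy /Length, as the non-sequential parser can resolve it, is preferable.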
[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000320#comment-14000320 ] Juraj Lonc commented on PDFBOX-2081: I have also tried to replace {code} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {code} by {code} Rectangle2D rc0=getGraphicsState().getCurrentClippingPath().getBounds2D(); Rectangle2D rc1=new Rectangle2D.Double(rc0.getMinX(), rc0.getMinY(), rc0.getWidth()+1000, rc0.getHeight()); graphics.setClip(rc1); {code} so I made clipping area wider. This "helped" too - lines were rendered. > Lines that exceeds clipping area are not drawn > -- > > Key: PDFBOX-2081 > URL: https://issues.apache.org/jira/browse/PDFBOX-2081 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.0 >Reporter: Juraj Lonc > Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png, > rendered_(with_null_clipping).png > > > PDF contains shapes that are partly on the paper and partly outside (shape > overflows paper borders). > Those shapes are not rendered to image. > It is caused by clipping area. > When I replace line in PDFDrawer.strokePath() > {noformat} > graphics.setClip(getGraphicsState().getCurrentClippingPath()); > {noformat} > to > {noformat} > graphics.setClip(null); > {noformat} > then everything is rendered correctly. > Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
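The workaround from the comment can be written as a small helper for experimenting. This is a hypothetical sketch using only java.awt.geom (the class and method names are invented). Like the original, it only adds width on the right, so shapes overflowing the left, top or bottom edge would still be affected, and it is a diagnostic aid rather than a fix, since widening the clip also lets through content the PDF meant to hide.

```java
import java.awt.geom.Line2D;
import java.awt.geom.Rectangle2D;

public class ClipWideningSketch {

    // Mirrors the workaround from the comment: same origin and height,
    // width increased by `extra` (the comment used 1000).
    static Rectangle2D widenRight(Rectangle2D clipBounds, double extra) {
        return new Rectangle2D.Double(clipBounds.getMinX(), clipBounds.getMinY(),
                clipBounds.getWidth() + extra, clipBounds.getHeight());
    }

    public static void main(String[] args) {
        // Example values: a page-sized clip and a line that runs off its right edge.
        Rectangle2D clip = new Rectangle2D.Double(0, 0, 595, 842);
        Line2D line = new Line2D.Double(500, 400, 700, 400);
        Rectangle2D wider = widenRight(clip, 1000);
        System.out.println(clip.contains(line.getP2()));  // false: end point lies outside
        System.out.println(wider.contains(line.getP2())); // true after widening
    }
}
```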
[jira] [Comment Edited] (PDFBOX-2078) DPI always 96
[ https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715 ] proba edited comment on PDFBOX-2078 at 5/15/14 1:55 PM: Using ImageIOUtil fixed the DPI issue, thank you. Now I've run into a colour-changing problem in the barcode PDF-to-image conversion, but that's a different story. If you happen to know the answer, that would be lovely. The barcode colours in the picture get inverted (black becomes white and white becomes black), which I saw was reported before on these forums. Is there an easy known solution to this? was (Author: proba): Using ImageIOUtil fixed the DPI issue, thank you. Now I figured out a font changing problem for myself in barcode pdf to image transformation, but thats a different story. If you happen to know the answer that would be lovely. The barcode colours on the picture get inverted (black goes to white and white goes to black) which i saw was reported before on these forums. Is there an easy known solution to this? > DPI always 96 > - > > Key: PDFBOX-2078 > URL: https://issues.apache.org/jira/browse/PDFBOX-2078 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.5 >Reporter: proba > > I'm trying to convert a 1 page pdf report to an image using convertToImage. > My used command goes as follows: > BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); > No matter how much i change the resolution (300 in the example), the DPI > stays the same, even though the quality and the dimensions of the picture > change. > Adding a comparison between a 96 resolution picture and what should be a 300 > resolution picture (notice the DPI) > http://i58.tinypic.com/9sv339.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
Juraj Lonc created PDFBOX-2081: -- Summary: Lines that exceeds clipping area are not drawn Key: PDFBOX-2081 URL: https://issues.apache.org/jira/browse/PDFBOX-2081 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: Obyčajné zásielky.pdf, rendered.png The PDF contains shapes that are partly on the paper and partly outside it (the shapes overflow the paper borders). Those shapes are not rendered to the image. This is caused by the clipping area. When I replace the line in PDFDrawer.strokePath() {noformat} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {noformat} with {noformat} graphics.setClip(null); {noformat} then everything is rendered correctly. Possibly a bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2078) DPI always 96
[ https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715 ] proba commented on PDFBOX-2078: --- I'm writing them out with ImageIO.write. To be precise: ImageIO.write(bi, "jpg", new File("d:\\pdfimageold"+count+".jpg")); I tried other formats as well, naturally. > DPI always 96 > - > > Key: PDFBOX-2078 > URL: https://issues.apache.org/jira/browse/PDFBOX-2078 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.5 >Reporter: proba > > I'm trying to convert a 1 page pdf report to an image using convertToImage. > My used command goes as follows: > BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); > No matter how much i change the resolution (300 in the example), the DPI > stays the same, even though the quality and the dimensions of the picture > change. > Adding a comparison between a 96 resolution picture and what should be a 300 > resolution picture (notice the DPI) > http://i58.tinypic.com/9sv339.png -- This message was sent by Atlassian JIRA (v6.2#6252)
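The pixel size and the DPI value shown by an image viewer are independent. PDF user space defaults to 72 units per inch, and convertToImage(imageType, resolution) scales the page by resolution/72, so a higher resolution yields more pixels. The DPI value, however, is metadata stored in the image file, and plain ImageIO.write does not set a resolution by default, so viewers fall back to their own default (often 96). ImageIOUtil.writeImage(..., dpi), which resolved this issue, writes that metadata. The arithmetic as a small sketch (the page size is an example value, and the class is hypothetical):

```java
public class DpiSketch {

    // PDF user space defaults to 72 units per inch, so a page that is
    // `points` wide needs points / 72 * dpi pixels at the given resolution.
    static int pixelsFor(float points, int dpi) {
        return Math.round(points / 72f * dpi);
    }

    public static void main(String[] args) {
        float width = 612f;  // example: US Letter width in points (8.5 in)
        float height = 792f; // example: US Letter height in points (11 in)
        System.out.println(pixelsFor(width, 96) + "x" + pixelsFor(height, 96));   // 816x1056
        System.out.println(pixelsFor(width, 300) + "x" + pixelsFor(height, 300)); // 2550x3300
    }
}
```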
[jira] [Commented] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998936#comment-13998936 ] Tilman Hausherr commented on PDFBOX-2079: - One piece of good news: it does not happen with loadNonSeq(), only with load(). > Extra new line characters extracted in 1.8.5 for embedded files leading to > ZipFile exception in Java 1.6 > > > Key: PDFBOX-2079 > URL: https://issues.apache.org/jira/browse/PDFBOX-2079 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 1.8.5, 1.8.6, 2.0.0 >Reporter: Tim Allison >Assignee: Tilman Hausherr >Priority: Minor > Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf > > > For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from > an embedded zip (well, docx) file. PDFBox 1.8.5 extracts 17662 bytes -- > "\r\n" at the end of the stream. This leads to a ZipException for ZipFile(s) > in Java 1.6, but not Java 1.7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (PDFBOX-1994) PDDocument.load(filename.pdf) hangs for pdf files having size
[ https://issues.apache.org/jira/browse/PDFBOX-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler closed PDFBOX-1994. -- Resolution: Not a Problem Assignee: Andreas Lehmkühler Set to closed, as the issue seems to be caused by the environment used and not by PDFBox. > PDDocument.load(filename.pdf) hangs for pdf files having size > - > > Key: PDFBOX-1994 > URL: https://issues.apache.org/jira/browse/PDFBOX-1994 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.4 >Reporter: brijesh >Assignee: Andreas Lehmkühler > > The below code i am using for loading my pdf. but my pdf file is not a zero > sized files and having full permission and it is not a corrupt file also. but > i ddint get any error after code. it just hangs. > it is working in local, but not working in server . > (created ,jar files and then exe, then the .exe will excuted in the server) > java using 1,4 > PDDocument pdf=PDDocument.load("d:\\filename.pdf"); > pdf.print(); > please provide me why the same code is not working in server. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1463) Unreadable fonts on UNIX
[ https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998605#comment-13998605 ] Andreas Lehmkühler edited comment on PDFBOX-1463 at 5/15/14 9:45 AM: - I ran into this problem recently as well. I am experiencing this issue on a Solaris machine as well as on an Ubuntu box. I am using Java 1.6 on both machines and it only happens with certain Arial Fonts e.g.: JFIGPU+Arial-BoldMT KLSYIK+ArialMT Normal Arial works just fine though and it appears to be rendered correctly. I am using PDFBox 2.0.0 and I am trying to create a PDF for testing purposes because the original PDF is again confidential. Before using PDFBox 2.0.0 this PDF caused a JVM crash just as described in PDFBOX-1426 was (Author: francesca.herpertz): I ran into this problem recently as well. I am experiencing this issue on a Solaris machine as well as on an Ubuntu box. I am using Java 1.6 on both machines and it only happens with certain Arial Fonts e.g.: JFIGPU+Arial-BoldMT KLSYIK+ArialMT Normal Arial works just fine though and it appears to be rendered correctly. I am using PDFBox 2.0.0 and I am trying to create a PDF for testing purposes because the original PDF is again confidential. Before using PDFBox 2.0.0 this PDF caused a JVM crash just as described in this jira ticket - PDFBox-1426. > Unreadable fonts on UNIX > > > Key: PDFBOX-1463 > URL: https://issues.apache.org/jira/browse/PDFBOX-1463 > Project: PDFBox > Issue Type: Bug > Components: Rendering > Environment: UNIX >Reporter: Sindhu N Kashyap > Attachments: screenshot-1.jpg > > > I'm converting PDFs to tif. The conversion is fine when run in Windows. When > i run the same code in UNIX ,its converting with a font that is unreadable. I > put some font ttf files in the classes path but that has not made any > difference. Please help. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array
Tilman Hausherr created PDFBOX-2070: --- Summary: Filter.decode() modifies PDF if there is a filter array Key: PDFBOX-2070 URL: https://issues.apache.org/jira/browse/PDFBOX-2070 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Tilman Hausherr Attachments: after.pdf, before.pdf If there are several filters (filter array) in an image, PDFBox inserts an empty DecodeParms object, instead of either inserting an empty COSArray or (better) doing nothing. Saving such a PDF makes it impossible to display in Acrobat Reader. Test code: {code} PDDocument d = PDDocument.load("before.pdf"); new PDFRenderer(d).renderImage(0); d.save("after.pdf"); {code} The rendering is important because without it, the filtered objects aren't decoded. -- This message was sent by Atlassian JIRA (v6.2#6252)
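The "(better) doing nothing" option amounts to looking up the parameters for filter number i read-only. Per the PDF specification, DecodeParms is either a single dictionary (one filter) or an array with one entry per filter, where a null entry means default parameters, and the entry may be omitted entirely. The self-contained sketch below uses a Map<String, Object> as a stand-in for COSDictionary and a List for COSArray; the class and method names are hypothetical and this is not PDFBox's Filter code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DecodeParmsSketch {

    // Returns the decode parameters for filter number `index` without modifying
    // `streamDict`. A Map stands in for COSDictionary and a List for COSArray.
    static Map<String, Object> decodeParmsFor(Map<String, Object> streamDict, int index) {
        Object parms = streamDict.get("DecodeParms");
        if (parms instanceof List) {
            List<?> perFilter = (List<?>) parms;
            Object entry = index < perFilter.size() ? perFilter.get(index) : null;
            if (entry instanceof Map) {
                return asDict(entry);
            }
        } else if (parms instanceof Map && index == 0) {
            // A single dictionary only makes sense for a single filter.
            return asDict(parms);
        }
        // Missing or null entry: default parameters, and nothing is written back.
        return Collections.emptyMap();
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asDict(Object o) {
        return (Map<String, Object>) o;
    }

    public static void main(String[] args) {
        Map<String, Object> dict = new HashMap<>();
        dict.put("Filter", Arrays.asList("FlateDecode", "DCTDecode"));
        Map<String, Object> params = decodeParmsFor(dict, 1);
        System.out.println(params.isEmpty());                // true
        System.out.println(dict.containsKey("DecodeParms")); // false: dictionary left untouched
    }
}
```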
[jira] [Updated] (PDFBOX-1756) ClassCastException CosString cannot be cast to COSName
[ https://issues.apache.org/jira/browse/PDFBOX-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated PDFBOX-1756: Attachment: testPDF_twoAuthors.pdf Shareable test document from TIKA-1252. Same issue. > ClassCastException CosString cannot be cast to COSName > -- > > Key: PDFBOX-1756 > URL: https://issues.apache.org/jira/browse/PDFBOX-1756 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 1.8.2 > Environment: Ubuntu Linux & Windows 7 (both JDK6) >Reporter: William Palmer >Priority: Minor > Attachments: testPDF_twoAuthors.pdf > > > Opening and saving a PDF causes this exception in 1.8.2: > Exception in thread "main" java.lang.ClassCastException: > org.apache.pdfbox.cos.COSString cannot be cast to > org.apache.pdfbox.cos.COSName > at > org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:507) > at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:435) > at > org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1122) > at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:552) > at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1501) > at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1324) > at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305) > The PDF is here: > http://digitalcorpora.org/corp/nps/files/govdocs1/008/008677.pdf > Code to reproduce the exception: > PDFParser parser = new PDFParser(new FileInputStream(new File("008677.pdf"))); > parser.parse(); > File temp = File.createTempFile("temp-", ".pdf"); > parser.getPDDocument().save(temp); > parser.getDocument().close(); -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2081: --- Attachment: rendered_(missing_lines).png The previously uploaded file was not the one I meant to upload. Now I have attached the image that was actually rendered. > Lines that exceeds clipping area are not drawn > -- > > Key: PDFBOX-2081 > URL: https://issues.apache.org/jira/browse/PDFBOX-2081 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.0 >Reporter: Juraj Lonc > Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png > > > PDF contains shapes that are partly on the paper and partly outside (shape > overflows paper borders). > Those shapes are not rendered to image. > It is caused by clipping area. > When I replace line in PDFDrawer.strokePath() > {noformat} > graphics.setClip(getGraphicsState().getCurrentClippingPath()); > {noformat} > to > {noformat} > graphics.setClip(null); > {noformat} > then everything is rendered correctly. > Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion
proba created PDFBOX-2080: - Summary: Barcode getting color inverted in pdf to image conversion Key: PDFBOX-2080 URL: https://issues.apache.org/jira/browse/PDFBOX-2080 Project: PDFBox Issue Type: Bug Reporter: proba Attachments: FPR0T9.pdf, slika2_3.jpg While converting a 1-page PDF to an image (both attached), the page itself converts properly; however, the barcode colours are inverted. The code used to do the conversion currently looks like this:
{code}
public static void convertPDFToJPG(String src) {
    try {
        // load the PDF file into a document object
        PDDocument doc = PDDocument.load(new FileInputStream(src));
        try {
            // get all pages of the document as a list
            List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
            // iterator to access each page in the list
            Iterator<PDPage> i = pages.iterator();
            int count = 1; // used to give each image file a unique name
            // convert every page of the PDF document to its own image file
            System.out.println("Please wait...");
            while (i.hasNext()) {
                PDPage page = i.next();
                BufferedImage bi = page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
                FileOutputStream fos = new FileOutputStream(new File("d:\\slika2_3_" + count + ".jpg"));
                try {
                    //ImageIO.write(bi, "jpg", new File("d:\\pdfimageold.jpg"));
                    boolean foundWriter = ImageIOUtil.writeImage(bi, "jpg", fos, 300);
                } finally {
                    fos.close();
                }
                count++;
            }
        } finally {
            doc.close();
        }
        System.out.println("Conversion complete");
    } catch (IOException ie) {
        ie.printStackTrace();
    }
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated PDFBOX-2079: Attachment: PDFBOX-2079-TEST_CASE.patch embedded_zip.pdf test file (from TIKA-1124) and test case attached > Extra new line characters extracted in 1.8.5 for embedded files leading to > ZipFile exception in Java 1.6 > > > Key: PDFBOX-2079 > URL: https://issues.apache.org/jira/browse/PDFBOX-2079 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 1.8.5, 1.8.6, 2.0.0 >Reporter: Tim Allison >Priority: Minor > Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf > > > For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from > an embedded zip (well, docx) file. PDFBox 1.8.5 extracts 17662 bytes -- > "\r\n" at the end of the stream. This leads to a ZipException for ZipFile(s) > in Java 1.6, but not Java 1.7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2081: --- Attachment: rendered_(with_null_clipping).png > Lines that exceeds clipping area are not drawn > -- > > Key: PDFBOX-2081 > URL: https://issues.apache.org/jira/browse/PDFBOX-2081 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.0 >Reporter: Juraj Lonc > Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png, > rendered_(with_null_clipping).png > > > PDF contains shapes that are partly on the paper and partly outside (shape > overflows paper borders). > Those shapes are not rendered to image. > It is caused by clipping area. > When I replace line in PDFDrawer.strokePath() > {noformat} > graphics.setClip(getGraphicsState().getCurrentClippingPath()); > {noformat} > to > {noformat} > graphics.setClip(null); > {noformat} > then everything is rendered correctly. > Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1463) Unreadable fonts on UNIX
[ https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998641#comment-13998641 ] Andreas Lehmkühler commented on PDFBOX-1463: [~Francesca.Herpertz]: This ticket seems to be about Type 1 font issues, whereas PDFBOX-1426 is about TrueType font issues. I'm going to close this one, as the original poster couldn't provide any additional information to help us solve this issue. Please create a new ticket and provide as many details about the issue as possible (issue description, stack trace, version info etc.). A sample PDF would be a definite plus. > Unreadable fonts on UNIX > > > Key: PDFBOX-1463 > URL: https://issues.apache.org/jira/browse/PDFBOX-1463 > Project: PDFBox > Issue Type: Bug > Components: Rendering > Environment: UNIX >Reporter: Sindhu N Kashyap > Attachments: screenshot-1.jpg > > > I'm converting PDFs to tif. The conversion is fine when run in Windows. When > i run the same code in UNIX ,its converting with a font that is unreadable. I > put some font ttf files in the classes path but that has not made any > difference. Please help. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array
[ https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998884#comment-13998884 ] Tilman Hausherr commented on PDFBOX-2070: - Did some DRY refactoring in rev 1594969 by moving three searches for an ImageReader into a separate method. Btw, I have no idea why JPXFilter.readJPX() was static, so I removed that too. > Filter.decode() modifies PDF if there is a filter array > --- > > Key: PDFBOX-2070 > URL: https://issues.apache.org/jira/browse/PDFBOX-2070 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Tilman Hausherr > Fix For: 2.0.0 > > Attachments: after.pdf, before.pdf > > > If there are several filters (filter array) in an image, PDFBox is inserting > an empty DecodeParms object here > {code} > params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index)); > {code} > instead of either inserting an empty COSArray, or (better) do nothing. Saving > such a PDF results in it not being displayable in the Acrobat Reader. > Test code: > {code} > PDDocument d = PDDocument.load("before.pdf"); > new PDFRenderer(d).renderImage(0); > d.save("after.pdf"); > {code} > The rendering is important because without it, the filtered objects aren't > decoded. -- This message was sent by Atlassian JIRA (v6.2#6252)
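Consolidating the reader search in one place could look roughly like the self-contained sketch below. The class and method names are hypothetical, and it uses only the standard javax.imageio API (ImageIO.getImageReadersByFormatName). Note that the JDK itself ships no JPEG 2000 or JBIG2 readers; those formats need additional ImageIO plugins on the classpath.

```java
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

public class ImageReaderSketch {

    // Single place that asks ImageIO for a reader, instead of repeating the
    // lookup in every filter. Throws if no reader is registered for the format.
    static ImageReader findImageReader(String formatName) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        if (!readers.hasNext()) {
            throw new IOException("No ImageIO reader available for format '" + formatName + "'");
        }
        return readers.next();
    }

    public static void main(String[] args) throws IOException {
        ImageReader jpeg = findImageReader("jpeg"); // the JDK ships a JPEG reader
        System.out.println(jpeg.getFormatName());
        jpeg.dispose();
        try {
            findImageReader("no-such-format");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```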
[jira] [Assigned] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned PDFBOX-2079: --- Assignee: Tilman Hausherr > Extra new line characters extracted in 1.8.5 for embedded files leading to > ZipFile exception in Java 1.6 > > > Key: PDFBOX-2079 > URL: https://issues.apache.org/jira/browse/PDFBOX-2079 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 1.8.5, 1.8.6, 2.0.0 >Reporter: Tim Allison >Assignee: Tilman Hausherr >Priority: Minor > Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf > > > For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from > an embedded zip (well, docx) file. PDFBox 1.8.5 extracts 17662 bytes -- > "\r\n" at the end of the stream. This leads to a ZipException for ZipFile(s) > in Java 1.6, but not Java 1.7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (PDFBOX-2078) DPI always 96
[ https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-2078. --- Resolution: Fixed Assignee: Tilman Hausherr I'm closing this one as it wasn't a problem. Please open a new issue about the other problem, and don't forget to attach the PDF and the image. > DPI always 96 > - > > Key: PDFBOX-2078 > URL: https://issues.apache.org/jira/browse/PDFBOX-2078 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.5 >Reporter: proba >Assignee: Tilman Hausherr > > I'm trying to convert a 1 page pdf report to an image using convertToImage. > My used command goes as follows: > BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); > No matter how much i change the resolution (300 in the example), the DPI > stays the same, even though the quality and the dimensions of the picture > change. > Adding a comparison between a 96 resolution picture and what should be a 300 > resolution picture (notice the DPI) > http://i58.tinypic.com/9sv339.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1895) Modifying a damaged PDF damages it further
[ https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993984#comment-13993984 ] Pat Hickey commented on PDFBOX-1895: The input is a file that Adobe Reader will display (reference above). This code copies the input to the output *without* decrypting. Should I expect that Adobe Reader will display the file now? It does not. Everything is completely garbled. And it *still* complains about missing fonts. I'm beginning to suspect a conflation of font and decryption issues. The trick will be how to debug this w/o writing another parser. :( {code} public static void main( String[] args ) throws Exception { PDDocument document = PDDocument.load( args[ 0 ] ); document.save( args[ 1 ] ); document.close(); System.exit( 0 ); } {code} > Modifying a damaged PDF damages it further > -- > > Key: PDFBOX-1895 > URL: https://issues.apache.org/jira/browse/PDFBOX-1895 > Project: PDFBox > Issue Type: Bug > Components: Writing >Affects Versions: 1.8.3, 1.8.4 >Reporter: Pat Hickey > > When re-writing a document with font descriptions, Adobe Reader is unable to > display the fonts in the document. Reader can display the fonts in the > original document. The difference is that in the original document, the font > descriptions are in lower object numbers than the font references; in the > output document, the font descriptions are in higher object numbers than the > font references. Is there a quick way to re-order them? > Update: the PDF file in question is actually corrupt, but somehow modifying > it with PDFBox causes it to no longer be readable with Adobe Reader. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-958) convertToImage mangles images which were in the PDF
[ https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-958: -- Attachment: (was: Wrycan® Lorem Ipsum Test.pdf) > convertToImage mangles images which were in the PDF > --- > > Key: PDFBOX-958 > URL: https://issues.apache.org/jira/browse/PDFBOX-958 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.2.1, 1.4.0, 1.5.0 > Environment: RHEL5 and WinXP, java version "1.6.0_23" >Reporter: Eric Schwarzenbach >Assignee: Andreas Lehmkühler >Priority: Critical > Fix For: 1.6.0 > > Attachments: Image of Page 13.jpeg, Image of Page 13.png, > PDFBOX958-WrycanLoremIpsumTest.pdf > > > Of the PDFs we've tried running through PDFBox and generating page images, a > number of them (coming from disparate sources and method of creation) seem to > produce images where an image that was embedded in the page of the PDF shows > somewhat mangled. It seems to be divided by horizontal stripes, where some > stripes look normal, others seem to have some kind of "smearing" effect going > on. See attached images and original PDF (image is of page 13). > I marked this as critical as we are trying to use PDFBox in a project where > page images are crucial, and inability to produce reasonable looking page > images is pretty much a deal breaker. > The code we use to extract the images looks more or less like the following: > BufferedImage image = > page.convertToImage(); > > SmartDeferredFileOutputStream outStream > = new SmartDeferredFileOutputStream(); > String[] writerFormatNames = > ImageIO.getWriterFormatNames(); > ImageIO.write(image, "jpeg", outStream); > outStream.close(); > We've also tried specifying "png". In both "jpg" and "png" cases we get an > image file that is indeed the correct format, and both images look exactly > the same. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array
[ https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2070: Description: If there are several filters (filter array) in an image, PDFBox is inserting an empty DecodeParms object here {code} params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index)); {code} instead of either inserting an empty COSArray, or (better) do nothing. Saving such a PDF results in it not being displayable in the Acrobat Reader. Test code: {code} PDDocument d = PDDocument.load("before.pdf"); new PDFRenderer(d).renderImage(0); d.save("after.pdf"); {code} The rendering is important because without it, the filtered objects aren't decoded. was: If there are several filters (filter array) in an image, PDFBox is inserting an empty DecodeParms object, instead of either inserting an empty COSAarray, or (better) do nothing. Saving such a PDF results in it not being displayable in the Acrobat Reader. Test code: {code} PDDocument d = PDDocument.load("before.pdf"); new PDFRenderer(d).renderImage(0); d.save("after.pdf"); {code} The rendering is important because without it, the filtered objects aren't decoded. > Filter.decode() modifies PDF if there is a filter array > --- > > Key: PDFBOX-2070 > URL: https://issues.apache.org/jira/browse/PDFBOX-2070 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Tilman Hausherr > Attachments: after.pdf, before.pdf > > > If there are several filters (filter array) in an image, PDFBox is inserting > an empty DecodeParms object here > {code} > params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index)); > {code} > instead of either inserting an empty COSArray, or (better) do nothing. Saving > such a PDF results in it not being displayable in the Acrobat Reader. 
> Test code: > {code} > PDDocument d = PDDocument.load("before.pdf"); > new PDFRenderer(d).renderImage(0); > d.save("after.pdf"); > {code} > The rendering is important because without it, the filtered objects aren't > decoded. -- This message was sent by Atlassian JIRA (v6.2#6252)
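The fix direction described in this issue is to look up the parameters for each filter without writing an empty `DecodeParms` entry back into the stream dictionary. A minimal sketch of such a read-only lookup follows. It is a hypothetical helper that models the dictionary as a `Map` and PDF arrays as `List`s, not PDFBox's actual `getDecodeParams` code, and treating a lone dictionary as belonging to the first filter is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DecodeParmsLookup {

    // Returns the decode parameters for the filter at `index` without writing
    // anything back into the stream dictionary. Missing or null entries yield
    // an empty, unmodifiable map instead of a newly inserted object.
    @SuppressWarnings("unchecked")
    static Map<String, Object> decodeParamsFor(Map<String, Object> streamDict, int index) {
        Object parms = streamDict.get("DecodeParms");
        if (parms instanceof List) {
            List<Object> list = (List<Object>) parms;
            if (index < list.size() && list.get(index) instanceof Map) {
                return (Map<String, Object>) list.get(index);
            }
        } else if (parms instanceof Map && index == 0) {
            // Assumption: a single dictionary belongs to the first filter
            return (Map<String, Object>) parms;
        }
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        Map<String, Object> dict = new HashMap<>();
        dict.put("Filter", Arrays.asList("FlateDecode", "DCTDecode"));
        for (int i = 0; i < 2; i++) {
            System.out.println("filter " + i + " params: " + decodeParamsFor(dict, i));
        }
        // The dictionary is unchanged: no DecodeParms entry was inserted
        System.out.println(dict.containsKey("DecodeParms")); // false
    }
}
```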
[jira] [Commented] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998925#comment-13998925 ] Tilman Hausherr commented on PDFBOX-2079: - I can confirm the wrong length :-( and will investigate this. > Extra new line characters extracted in 1.8.5 for embedded files leading to > ZipFile exception in Java 1.6 > > > Key: PDFBOX-2079 > URL: https://issues.apache.org/jira/browse/PDFBOX-2079 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 1.8.5, 1.8.6, 2.0.0 >Reporter: Tim Allison >Assignee: Tilman Hausherr >Priority: Minor > Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf > > > For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from > an embedded zip (well, docx) file. PDFBox 1.8.5 extracts 17662 bytes -- > "\r\n" at the end of the stream. This leads to a ZipException for ZipFile(s) > in Java 1.6, but not Java 1.7. -- This message was sent by Atlassian JIRA (v6.2#6252)
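The PDF specification says the end-of-line marker that precedes `endstream` is not included in the stream's `/Length`. That rule would account for the two extra bytes (`"\r\n"`) reported here. The sketch below applies the rule to raw bytes. It uses a hypothetical `streamData` helper rather than PDFBox's parser code. Stripping a single trailing EOL when `/Length` is unavailable is an assumption, and it is ambiguous if the payload itself ends in a newline.

```java
import java.util.Arrays;

public class StreamDataExtractor {

    // Hypothetical helper: `raw` holds everything between the EOL after
    // "stream" and the "endstream" keyword. If /Length is known, only that
    // many bytes belong to the stream; otherwise strip one trailing EOL
    // (CRLF, LF or CR), which the spec treats as a separator.
    static byte[] streamData(byte[] raw, Integer declaredLength) {
        if (declaredLength != null && declaredLength <= raw.length) {
            return Arrays.copyOf(raw, declaredLength);
        }
        int end = raw.length;
        if (end >= 2 && raw[end - 2] == '\r' && raw[end - 1] == '\n') {
            end -= 2;
        } else if (end >= 1 && (raw[end - 1] == '\n' || raw[end - 1] == '\r')) {
            end -= 1;
        }
        return Arrays.copyOf(raw, end);
    }

    public static void main(String[] args) {
        byte[] payload = new byte[17660];           // stand-in for the embedded .docx
        byte[] raw = Arrays.copyOf(payload, 17662); // two extra bytes...
        raw[17660] = '\r';                          // ...holding the trailing "\r\n"
        raw[17661] = '\n';
        System.out.println(streamData(raw, 17660).length); // 17660
        System.out.println(streamData(raw, null).length);  // 17660
    }
}
```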
[jira] [Comment Edited] (PDFBOX-1463) Unreadable fonts on UNIX
[ https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998605#comment-13998605 ] Francesca Nina Herpertz edited comment on PDFBOX-1463 at 5/15/14 9:21 AM: -- I ran into this problem recently as well. I am experiencing this issue on a Solaris machine as well as on an Ubuntu box. I am using Java 1.6 on both machines and it only happens with certain Arial Fonts e.g.: JFIGPU+Arial-BoldMT KLSYIK+ArialMT Normal Arial works just fine though and it appears to be rendered correctly. I am using PDFBox 2.0.0 and I am trying to create a PDF for testing purposes because the original PDF is again confidential. Before using PDFBox 2.0.0 this PDF caused a JVM crash just as described in this jira ticket - PDFBox-1426. was (Author: francesca.herpertz): I ran into this problem recently as well. I am experiencing this issue on a Solaris machine as well as on an Ubuntu box. I am using Java 1.6 on both machines and it only happens with certain Arial Fonts e.g.: JFIGPU+Arial-BoldMT KLSYIK+ArialMT Normal Arial works just fine though and it appears to be rendered correctly. > Unreadable fonts on UNIX > > > Key: PDFBOX-1463 > URL: https://issues.apache.org/jira/browse/PDFBOX-1463 > Project: PDFBox > Issue Type: Bug > Components: Rendering > Environment: UNIX >Reporter: Sindhu N Kashyap > Attachments: screenshot-1.jpg > > > I'm converting PDFs to tif. The conversion is fine when run in Windows. When > i run the same code in UNIX ,its converting with a font that is unreadable. I > put some font ttf files in the classes path but that has not made any > difference. Please help. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion
[ https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1463#comment-1463 ] Andreas Lehmkühler commented on PDFBOX-2080: The barcode is inverted in 1.8.4, 1.8.5 and the 1.8-branch. It looks good in the current trunk (build 294), but the background of the page isn't white. > Barcode getting color inverted in pdf to image conversion > - > > Key: PDFBOX-2080 > URL: https://issues.apache.org/jira/browse/PDFBOX-2080 > Project: PDFBox > Issue Type: Bug >Reporter: proba > Attachments: FPR0T9.pdf, slika2_3.jpg > > > While converting a 1 page pdf to an image (both attached below), the image > converts properly, however the barcodes colours invert. > The code used to do the conversion looks like this right now: > public static void convertPDFToJPG(String src){ > try{ > //load pdf file in the document object > PDDocument doc=PDDocument.load(new FileInputStream(src)); > //Get all pages from document and store them in a list > List pages=doc.getDocumentCatalog().getAllPages(); > //create iterator object so it is easy to access each page > from the list > Iterator i= pages.iterator(); > int count=1; //count variable used to separate each image > file > //Convert every page of the pdf document to a unique image > file > System.out.println("Please wait..."); > while(i.hasNext()){ > PDPage page=i.next(); > BufferedImage bi=page.convertToImage( > BufferedImage.TYPE_INT_RGB, 300); > FileOutputStream fos = new FileOutputStream(new > File("d:\\slika2_3.jpg")); > //ImageIO.write(bi, "jpg", new > File("d:\\pdfimageold.jpg")); > boolean foundWriter = ImageIOUtil.writeImage(bi, > "jpg", fos, 300); > count++; > > } > System.out.println("Conversion complete"); > }catch(IOException ie){ie.printStackTrace();} > } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000280#comment-14000280 ] Juraj Lonc commented on PDFBOX-2081: I know that line completely disables clipping and I know it is not a solution ;) I have used it just for description of the problem. > Lines that exceeds clipping area are not drawn > -- > > Key: PDFBOX-2081 > URL: https://issues.apache.org/jira/browse/PDFBOX-2081 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.0 >Reporter: Juraj Lonc > Attachments: Obyčajné zásielky.pdf, rendered.png > > > PDF contains shapes that are partly on the paper and partly outside (shape > overflows paper borders). > Those shapes are not rendered to image. > It is caused by clipping area. > When I replace line in PDFDrawer.strokePath() > {noformat} > graphics.setClip(getGraphicsState().getCurrentClippingPath()); > {noformat} > to > {noformat} > graphics.setClip(null); > {noformat} > then everything is rendered correctly. > Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
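In the PDF specification, a `W`/`W*` clipping operator does not change the clipping path at the point where it appears. It takes effect only after the path-painting operator that ends the current path (`S`, `f`, `B`, `n` and so on). The sketch below models that ordering with a hypothetical `ClipTracker`, which stands in for the graphics state and represents paths as `java.awt.geom.Area` objects. It is not PDFBox's `PDFDrawer` logic, and it does not claim to be the cause of this bug.

```java
import java.awt.Rectangle;
import java.awt.geom.Area;

public class ClipTracker {
    private final Area clip;      // current clipping path in the graphics state
    private Area currentPath;     // path under construction
    private boolean clipPending;  // set by W / W*, consumed by the painting operator

    ClipTracker(Rectangle page) {
        this.clip = new Area(page); // initial clip: the entire page
    }

    void buildPath(Area path) { currentPath = path; }

    // W or W*: only remember that the path should become part of the clip.
    void markClip() { clipPending = true; }

    // Any path-painting operator (S, s, f, F, f*, B, B*, b, b*, n).
    // Painting uses the clip in force *before* this path; the intersection
    // with the new path is applied only afterwards.
    Area paint() {
        Area clipUsedForPainting = new Area(clip);
        if (clipPending && currentPath != null) {
            clip.intersect(currentPath);
        }
        clipPending = false;
        currentPath = null;
        return clipUsedForPainting;
    }

    Area clip() { return new Area(clip); }

    public static void main(String[] args) {
        ClipTracker gs = new ClipTracker(new Rectangle(0, 0, 612, 792));
        gs.buildPath(new Area(new Rectangle(100, 100, 50, 50)));
        gs.markClip();
        Area usedForStroke = gs.paint(); // e.g. "W S": the stroke is not clipped by its own path
        System.out.println(usedForStroke.getBounds()); // full page
        System.out.println(gs.clip().getBounds());     // the 50x50 square
    }
}
```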
[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2081: --- Attachment: (was: rendered.png) > Lines that exceeds clipping area are not drawn > -- > > Key: PDFBOX-2081 > URL: https://issues.apache.org/jira/browse/PDFBOX-2081 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.0 >Reporter: Juraj Lonc > Attachments: Obyčajné zásielky.pdf > > > PDF contains shapes that are partly on the paper and partly outside (shape > overflows paper borders). > Those shapes are not rendered to image. > It is caused by clipping area. > When I replace line in PDFDrawer.strokePath() > {noformat} > graphics.setClip(getGraphicsState().getCurrentClippingPath()); > {noformat} > to > {noformat} > graphics.setClip(null); > {noformat} > then everything is rendered correctly. > Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-958) convertToImage mangles images which were in the PDF
[ https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-958: -- Attachment: PDFBOX958-WrycanLoremIpsumTest.pdf > convertToImage mangles images which were in the PDF > --- > > Key: PDFBOX-958 > URL: https://issues.apache.org/jira/browse/PDFBOX-958 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.2.1, 1.4.0, 1.5.0 > Environment: RHEL5 and WinXP, java version "1.6.0_23" >Reporter: Eric Schwarzenbach >Assignee: Andreas Lehmkühler >Priority: Critical > Fix For: 1.6.0 > > Attachments: Image of Page 13.jpeg, Image of Page 13.png, > PDFBOX958-WrycanLoremIpsumTest.pdf > > > Of the PDFs we've tried running through PDFBox and generating page images, a > number of them (coming from disparate sources and method of creation) seem to > produce images where an image that was embedded in the page of the PDF shows > somewhat mangled. It seems to be divided by horizontal stripes, where some > stripes look normal, others seem to have some kind of "smearing" effect going > on. See attached images and original PDF (image is of page 13). > I marked this as critical as we are trying to use PDFBox in a project where > page images are crucial, and inability to produce reasonable looking page > images is pretty much a deal breaker. > The code we use to extract the images looks more or less like the following: > BufferedImage image = > page.convertToImage(); > > SmartDeferredFileOutputStream outStream > = new SmartDeferredFileOutputStream(); > String[] writerFormatNames = > ImageIO.getWriterFormatNames(); > ImageIO.write(image, "jpeg", outStream); > outStream.close() > We've also tried specifying "png". In both "jpg" and "png" cases we get an > image file that is indeed the correct format, and both images look exactly > the same. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2074) 4-bytes CMap entry causes exception
[ https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999694#comment-13999694 ] Juraj Lonc commented on PDFBOX-2074: I am curious whether Adobe Reader ignores such entries (entries are invalid) or processes them (entries are valid). > 4-bytes CMap entry causes exception > --- > > Key: PDFBOX-2074 > URL: https://issues.apache.org/jira/browse/PDFBOX-2074 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Juraj Lonc > Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf > > > I have a PDF that has a CMap entry consisting of 4 bytes. It is just one entry > with that size; other entries have 2 bytes. > Adobe Reader has no problems with that, but PDFBox throws an exception. > I think this exception should not be thrown. The entry should be skipped or > truncated to 2 bytes, and a warning written to the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
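The reporter proposes skipping or truncating the oversized entry and logging a warning instead of throwing. The sketch below implements the skip-and-warn variant. It assumes a hypothetical `parseCodes` helper that receives the hex source codes of a `bfchar`-style block along with the code length implied by the codespace range. It is not PDFBox's `CMapParser`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

public class LenientCMapCodes {
    private static final Logger LOG = Logger.getLogger(LenientCMapCodes.class.getName());

    // Converts hex source codes (e.g. "0041") to ints. Entries whose byte
    // length differs from `expectedBytes` are skipped with a warning rather
    // than aborting the whole CMap.
    static List<Integer> parseCodes(List<String> hexCodes, int expectedBytes) {
        List<Integer> codes = new ArrayList<>();
        for (String hex : hexCodes) {
            int bytes = hex.length() / 2;
            if (hex.length() % 2 != 0 || bytes != expectedBytes) {
                LOG.warning("Skipping CMap entry <" + hex + ">: " + bytes
                        + " bytes, expected " + expectedBytes);
                continue;
            }
            codes.add(Integer.parseInt(hex, 16));
        }
        return codes;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("0041", "0042", "00430044", "0045");
        System.out.println(parseCodes(entries, 2)); // [65, 66, 69]
    }
}
```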
[jira] [Updated] (PDFBOX-2069) PDF's with Tc before Tm are getting incorrect spacing in PDFTextArea
[ https://issues.apache.org/jira/browse/PDFBOX-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Hirsh updated PDFBOX-2069: --- Attachment: PDFBox-2609-patch.zip Patch that addresses this problem > PDF's with Tc before Tm are getting incorrect spacing in PDFTextArea > > > Key: PDFBOX-2069 > URL: https://issues.apache.org/jira/browse/PDFBOX-2069 > Project: PDFBox > Issue Type: Bug > Components: Utilities >Affects Versions: 1.8.5 > Environment: Windows >Reporter: Joel Hirsh > Labels: pdfbox > Fix For: 2.0.0 > > Attachments: PDFBOX-2609.pdf, PDFBox-2609-patch.zip > > Original Estimate: 1h > Remaining Estimate: 1h > > Attached PDF is getting incorrect spacing using example program > ExtractTextByArea.java as follows: > Text in the area:java.awt.Rectangle[x=10,y=500,width=600,height=200] > Transaction Activity > Date D e s c r i p t i o n Deposits W i t h d r a w a l s > 0 4 / 0 8 B E G I N N I N G BALANCE > 04 / 0 8 W I THDRAWAL - ATM 3 1 1 7 3 0 0 . 0 0 - > 62 M I L L H I L L ROAD WOODSTOCK N Y > 04 / 1 0 W I THDRAWAL - ACH 2 0 0 . 0 0 - > HUMAN RIGHTS WAT-B I L L PAYMT > 04 / 12 C K # 1 2 7 3 11 0 . 0 0 - > 0 4 / 1 5 W I THDRAWAL - ACH 2 0 2 . 5 7 - > NEW SOUTH INSURA -B I LL PAYMT > 04 / 1 5 W I THDRAWAL - ACH 3 6 . 2 6 - > WASTE CONNECTION-BILL PAYMT > 04 / 1 7 W I THDRAWAL - ACH 71 2 . 0 0 - > N PYMT T > 04 / 1 8 W I THDRAWAL - ACH 2958 9 . 0 0 3 > N PYMT T > 04 / 1 9 W I THDRAWAL - ACH 76 8 . 1 2 - > I believe this is because PDF streams with Tc before Tm are having the matrix > applied to the Tc, which is contrary to my experience with graphics pipelines. > Most PDF streams seem to have Tc after Tm, and thus do not hit this > situation. > I have attached a patch to two files that corrects the problem for this file, > and also works correctly on my test suite of about 40 files from other > sources.
> The result for the attached file now becomes: > Text in the area:java.awt.Rectangle[x=10,y=500,width=600,height=200] > Transaction Activity > Date Description Deposits Withdrawals > 04/08 BEGINNING BALANCE > 04/08 WITHDRAWAL-ATM 3 117 300.00- > 62 MILL HILL ROAD WOODSTOCK NY > 04/10 WITHDRAWAL-ACH 200.00- > HUMAN RIGHTS WAT-BILL PAYMT > 04/12 CK# 1273 110.00- > 04/15 WITHDRAWAL-ACH 202.57- > NEW SOUTH INSURA-BILL PAYMT > 04/15 WITHDRAWAL-ACH 36.26- > WASTE CONNECTION-BILL PAYMT > 04/17 WITHDRAWAL-ACH 712.00- > N PYMT T > 04/18 WITHDRAWAL-ACH 29589.00 3 > N PYMT T > 04/19 WITHDRAWAL-ACH 768.12- -- This message was sent by Atlassian JIRA (v6.2#6252)
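The reporter is right that character spacing should not depend on whether `Tc` comes before or after `Tm`. In the PDF specification, `Tc` is a text-state parameter expressed in unscaled text space units. The glyph advance `tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th` is transformed by the text matrix only when the glyph is shown. The hypothetical class below illustrates that rule for the unrotated case. It is not the patch attached to this issue.

```java
public class CharSpacingDemo {
    // Text state kept as plain scalars (no rotation or skew assumed)
    private double tc = 0;   // Tc: character spacing, unscaled text space units
    private double tfs = 1;  // Tfs: font size
    private double th = 1;   // Th: horizontal scaling (Tz / 100)
    private double tmA = 1;  // Tm 'a': horizontal scale of the text matrix
    private double tmE = 0;  // Tm 'e': x translation of the text matrix

    // "Tc" only stores the scalar; it is NOT multiplied by Tm here.
    void setCharSpacing(double tc) { this.tc = tc; }

    // Subset of "a b c d e f Tm" with b = c = 0; d and f are ignored.
    void setTextMatrix(double a, double e) { this.tmA = a; this.tmE = e; }

    // Shows one glyph of width w0 (glyph units / 1000) and returns its user-space x.
    // tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th with Tj = 0 and Tw = 0
    // (Tw applies only to the single-byte character code 32).
    double showGlyph(double w0) {
        double x = tmE;
        double tx = (w0 * tfs + tc) * th; // displacement in text space
        tmE += tx * tmA;                  // Tm' = [1 0 0 1 tx 0] x Tm
        return x;
    }

    public static void main(String[] args) {
        CharSpacingDemo tcFirst = new CharSpacingDemo();
        tcFirst.setCharSpacing(0.5);
        tcFirst.setTextMatrix(10, 100);

        CharSpacingDemo tmFirst = new CharSpacingDemo();
        tmFirst.setTextMatrix(10, 100);
        tmFirst.setCharSpacing(0.5);

        // Both orders produce identical glyph positions
        for (double w0 : new double[] {0.5, 0.6, 0.7}) {
            System.out.println(tcFirst.showGlyph(w0) + " " + tmFirst.showGlyph(w0));
        }
    }
}
```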
[jira] [Commented] (PDFBOX-2057) Importing BufferedImage into PDPixelMap is broken in 1.8.5
[ https://issues.apache.org/jira/browse/PDFBOX-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993667#comment-13993667 ] Tilman Hausherr commented on PDFBOX-2057: - I added code to handle bitmask transparency properly. Please let me know whether you can get rid of your workaround. I also added a test to prevent this from breaking in the future. I also removed double assignments from CCITTFactory and added a test to check that they are really there. This was done in rev 1593569, which also added the modifications of PDFBOX-2068. Next, I will check whether the problem occurs for JPEG and in 1.8. > Importing BufferedImage into PDPixelMap is broken in 1.8.5 > -- > > Key: PDFBOX-2057 > URL: https://issues.apache.org/jira/browse/PDFBOX-2057 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 1.8.5, 1.8.6 > Environment: windows vista / jdk 1.7.0_45 >Reporter: Michaël Michaud >Assignee: Tilman Hausherr > Labels: regression > Fix For: 1.8.6, 2.0.0 > > Attachments: CS-Convocation entretien signed.pdf, CS-Convocation > entretien-IText.pdf, CS-Convocation entretien-PDFBox-with-workarround.pdf, > CS-Convocation entretien-PDFBox.pdf, ImageFilterOp.java, > differentBufferedImages.pdf, renderTransparentImage.zip > > > Try to import a BufferedImage in a PDDocument with PDPixelMap > BufferedImage with TYPE_4BYTE_ABGR works fine with PDFBox 1.8.4 (though, the > pdf file contains instruction /ColorSpace /DeviceGray) > BufferedImage with TYPE_4BYTE_ABGR produces an unreadable PDF with PDFBox > 1.8.5 (though, the pdf file contains instruction /ColorSpace /DeviceRGB).
> Code used to demonstrate the problem is as follows (image has also been > colored with some Graphics instructions to demonstrate that 1.8.4 is working) > : > {code} > try { > PDDocument doc = new PDDocument(); > PDPage page = new PDPage(); > doc.addPage(page); > BufferedImage awtImage = new BufferedImage(100,100, > BufferedImage.TYPE_4BYTE_ABGR); > PDPixelMap ximage = new PDPixelMap(doc, awtImage); > PDPageContentStream contentStream = new PDPageContentStream(doc, > page); > contentStream.drawXObject(ximage, 200, 200, 100, 100); > contentStream.close(); > doc.save("C:\\Temp\\PDF\\test185_4babgr.pdf"); > } catch(COSVisitorException|IOException e) { > e.printStackTrace(); > } > {code} > I also tried with a BufferedImage with TYPE_INT_ARGB but it throws an > exception with PDFBox 1.8.4 and 1.8.5 : > {code} > Exception in thread "main" java.lang.IllegalArgumentException: Raster > IntegerInterleavedRaster: width = 100 height = 100 #Bands = 1 xOff = 0 yOff = > 0 dataOffset[0] 0 is incompatible with ColorModel ColorModel: #pixelBits = 8 > numComponents = 1 color space = java.awt.color.ICC_ColorSpace@1dc80063 > transparency = 1 has alpha = false isAlphaPre = false > at java.awt.image.BufferedImage.(BufferedImage.java:630) > at > org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.createImageStream(PDPixelMap.java:107) > {code} > My main purpose was to use a BufferedImage with a CMYK ColorSpace, but > PDPixelMap seems to accept 1 component and 3 component ColorSpace only. -- This message was sent by Atlassian JIRA (v6.2#6252)
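PDF image XObjects carry transparency separately, typically as a soft mask (`/SMask`) image, because `/DeviceRGB` samples have no alpha component. An ARGB `BufferedImage` therefore has to be split into colour samples and alpha samples. The pure-JDK sketch below shows that split, assuming a hypothetical `splitAlpha` helper. It only prepares the raw sample bytes and is not the code committed for this issue.

```java
import java.awt.image.BufferedImage;

public class AlphaSplitter {

    // 8-bit RGB samples (3 bytes per pixel) and 8-bit alpha samples
    // (1 byte per pixel) in row-major order: the layout a /DeviceRGB image
    // plus a /DeviceGray soft mask would use.
    static final class Split {
        final byte[] rgb;
        final byte[] alpha;
        Split(byte[] rgb, byte[] alpha) { this.rgb = rgb; this.alpha = alpha; }
    }

    static Split splitAlpha(BufferedImage image) {
        int w = image.getWidth(), h = image.getHeight();
        byte[] rgb = new byte[w * h * 3];
        byte[] alpha = new byte[w * h];
        int i = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++, i++) {
                // getRGB returns non-premultiplied sRGB ARGB for any image type
                int argb = image.getRGB(x, y);
                alpha[i] = (byte) (argb >>> 24);
                rgb[3 * i] = (byte) (argb >> 16);
                rgb[3 * i + 1] = (byte) (argb >> 8);
                rgb[3 * i + 2] = (byte) argb;
            }
        }
        return new Split(rgb, alpha);
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_4BYTE_ABGR);
        img.setRGB(0, 0, 0xFFFF0000); // opaque red
        img.setRGB(1, 0, 0x000000FF); // fully transparent blue
        Split s = splitAlpha(img);
        System.out.println((s.alpha[0] & 0xFF) + " " + (s.alpha[1] & 0xFF)); // 255 0
    }
}
```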
[jira] [Comment Edited] (PDFBOX-1756) ClassCastException CosString cannot be cast to COSName
[ https://issues.apache.org/jira/browse/PDFBOX-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998856#comment-13998856 ] Tim Allison edited comment on PDFBOX-1756 at 5/15/14 4:00 PM: -- Shareable test document from TIKA-1252. Same issue. ClassCastException also now happens on initial loading/parsing. This is caught and logged, and upon a quick review, it looks like text is being successfully extracted. {noformat} WARN [main] (COSDocument.java:302) - java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName at org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:294) at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:627) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1224) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1189) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:118) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) {noformat} was (Author: talli...@mitre.org): Shareable test document from TIKA-1252. Same issue.
> ClassCastException CosString cannot be cast to COSName > -- > > Key: PDFBOX-1756 > URL: https://issues.apache.org/jira/browse/PDFBOX-1756 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 1.8.2 > Environment: Ubuntu Linux & Windows 7 (both JDK6) >Reporter: William Palmer >Priority: Minor > Attachments: testPDF_twoAuthors.pdf > > > Opening and saving a PDF causes this exception in 1.8.2: > Exception in thread "main" java.lang.ClassCastException: > org.apache.pdfbox.cos.COSString cannot be cast to > org.apache.pdfbox.cos.COSName > at > org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:507) > at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:435) > at > org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1122) > at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:552) > at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1501) > at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1324) > at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305) > The PDF is here: > http://digitalcorpora.org/corp/nps/files/govdocs1/008/008677.pdf > Code to reproduce the exception: > PDFParser parser = new PDFParser(new FileInputStream(new File("008677.pdf"))); > parser.parse(); > File temp = File.createTempFile("temp-", ".pdf"); > parser.getPDDocument().save(temp); > parser.getDocument().close(); -- This message was sent by Atlassian JIRA (v6.2#6252)
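Both stack traces come from code that casts a dictionary value straight to `COSName` while the file stores a string there. In the parsing trace, this is presumably the `/Type` lookup in `getObjectsByType` called from `dereferenceObjectStreams`. The sketch below shows the defensive alternative: check the type with `instanceof`, and skip the object instead of throwing. It uses hypothetical stand-in classes `Name` and `PdfString` rather than PDFBox's `COSName` and `COSString`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TypeLookup {

    // Hypothetical stand-ins for COSName and COSString
    static final class Name {
        final String value;
        Name(String value) { this.value = value; }
    }

    static final class PdfString {
        final String value;
        PdfString(String value) { this.value = value; }
    }

    // Returns all dictionaries whose /Type is the given name. A /Type that is
    // not a name (e.g. a string in a malformed file) is skipped instead of
    // triggering a ClassCastException.
    static List<Map<String, Object>> objectsByType(List<Map<String, Object>> objects, String type) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> dict : objects) {
            Object t = dict.get("Type");
            if (t instanceof Name && ((Name) t).value.equals(type)) {
                result.add(dict);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> good = new HashMap<>();
        good.put("Type", new Name("ObjStm"));
        Map<String, Object> malformed = new HashMap<>();
        malformed.put("Type", new PdfString("ObjStm")); // string where a name is expected
        List<Map<String, Object>> objects = new ArrayList<>();
        objects.add(good);
        objects.add(malformed);
        System.out.println(objectsByType(objects, "ObjStm").size()); // 1
    }
}
```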
Re: PDF file characters x and y coordinates
I process about 2000 PDF files daily and I have never had an issue with the coordinates. One piece of advice, though: write your own TextPositionComparator. ~Alin On Fri, May 16, 2014 at 8:39 AM, Simer P wrote: > I just needed to confirm this with you guys. > > Can the X and Y coordinates returned in the > processTextPosition(TextPosition text) ever be incorrect? > > Because it doesn't really matter in what order the text is extracted ... if > the x and y coordinates are accurate then I can rearrange the characters > based on the application's requirements. > > So can the X and Y coordinates ever be wrong? > > Cheers >
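A custom ordering along the lines Alin suggests could look roughly like the sketch below. It makes three assumptions:

- A hypothetical `Glyph` class holds the x/y values read from each `TextPosition`, for example via `getXDirAdj()`/`getYDirAdj()`.
- Y grows downwards.
- The 2-unit line tolerance is arbitrary and should be tuned per document.

The sketch groups glyphs into lines first instead of using a tolerance-based comparator. A tolerance-based comparator is not transitive, and `List.sort` can reject it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReadingOrder {

    // Hypothetical holder for the coordinates of one TextPosition.
    static final class Glyph {
        final String text;
        final float x, y; // y grows downwards, as with getYDirAdj()
        Glyph(String text, float x, float y) { this.text = text; this.x = x; this.y = y; }
    }

    // Orders glyphs top-to-bottom by line, left-to-right within a line.
    // Glyphs within `lineTolerance` of the first glyph of a line share that line.
    static String readingOrder(List<Glyph> glyphs, float lineTolerance) {
        List<Glyph> byY = new ArrayList<>(glyphs);
        byY.sort(Comparator.comparingDouble(g -> g.y));
        StringBuilder out = new StringBuilder();
        int start = 0;
        while (start < byY.size()) {
            float lineY = byY.get(start).y;
            int end = start;
            while (end < byY.size() && byY.get(end).y - lineY <= lineTolerance) {
                end++;
            }
            List<Glyph> line = new ArrayList<>(byY.subList(start, end));
            line.sort(Comparator.comparingDouble(g -> g.x));
            for (Glyph g : line) {
                out.append(g.text);
            }
            out.append('\n');
            start = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("H", 20f, 100.5f)); // delivered first, but to the right
        glyphs.add(new Glyph("W", 10f, 100f));
        System.out.print(readingOrder(glyphs, 2f)); // WH
    }
}
```

Because ordering happens after all positions are collected, the order in which `processTextPosition` receives the glyphs no longer affects the output.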
PDF file characters x and y coordinates
I just needed to confirm this with you guys. Can the X and Y coordinates returned in the processTextPosition(TextPosition text) ever be incorrect? Because it doesn't really matter in what order the text is extracted ... if the x and y coordinates are accurate then I can rearrange the characters based on the application's requirements. So can the X and Y coordinates ever be wrong? Cheers
[jira] [Commented] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion
[ https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000183#comment-14000183 ] Tilman Hausherr commented on PDFBOX-2080: - My bet is on PDFBOX-1950. That was fixed in 2.0 only. > Barcode getting color inverted in pdf to image conversion > - > > Key: PDFBOX-2080 > URL: https://issues.apache.org/jira/browse/PDFBOX-2080 > Project: PDFBox > Issue Type: Bug >Reporter: proba > Attachments: FPR0T9.pdf, slika2_3.jpg > > > While converting a 1 page pdf to an image (both attached below), the image > converts properly, however the barcodes colours invert. > The code used to do the conversion looks like this right now: > public static void convertPDFToJPG(String src){ > try{ > //load pdf file in the document object > PDDocument doc=PDDocument.load(new FileInputStream(src)); > //Get all pages from document and store them in a list > List pages=doc.getDocumentCatalog().getAllPages(); > //create iterator object so it is easy to access each page > from the list > Iterator i= pages.iterator(); > int count=1; //count variable used to separate each image > file > //Convert every page of the pdf document to a unique image > file > System.out.println("Please wait..."); > while(i.hasNext()){ > PDPage page=i.next(); > BufferedImage bi=page.convertToImage( > BufferedImage.TYPE_INT_RGB, 300); > FileOutputStream fos = new FileOutputStream(new > File("d:\\slika2_3.jpg")); > //ImageIO.write(bi, "jpg", new > File("d:\\pdfimageold.jpg")); > boolean foundWriter = ImageIOUtil.writeImage(bi, > "jpg", fos, 300); > count++; > > } > System.out.println("Conversion complete"); > }catch(IOException ie){ie.printStackTrace();} > } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2082) signing corrupts PDF when signature exactly fits allocated space
Koloom created PDFBOX-2082: -- Summary: signing corrupts PDF when signature exactly fits allocated space Key: PDFBOX-2082 URL: https://issues.apache.org/jira/browse/PDFBOX-2082 Project: PDFBox Issue Type: Bug Components: Writing Reporter: Koloom Priority: Critical The current check does not take "<>" into account, so if you are (un)lucky, the signature overwrites ">" and corrupts the PDF. Fix for 1.8: diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java index 3165589..755e849 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java @@ -779,12 +779,14 @@ public class COSWriter implements ICOSVisitor, Closeable SignatureInterface signatureInterface = doc.getSignatureInterface(); byte[] sign = signatureInterface.sign(new ByteArrayInputStream(pdfContent)); String signature = new COSString(sign).getHexString(); +++signaturePosition[0]; // move past "<" +--signaturePosition[1]; // move in front of ">" int leftSignaturerange = signaturePosition[1]-signaturePosition[0]-signature.length(); if(leftSignaturerange<0) { throw new IOException("Can't write signature, not enough space"); } -getStandardOutput().setPos(signaturePosition[0]+1); +getStandardOutput().setPos(signaturePosition[0]); getStandardOutput().write(signature.getBytes()); } } Another thing is that pdfbox now allocates (2 * preferedSize + 2) for a signature. It quite confused me to see 16k+4 bytes allocated when I called setPreferedSignatureSize(4k) - it should have allocated 8k (each signature byte takes 2 bytes in the pdf). 
Fix for 1.8: diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java index 358364a..23dd3ab 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java @@ -309,7 +309,7 @@ public class PDDocument implements Pageable, Closeable int preferedSignatureSize = options.getPreferedSignatureSize(); if (preferedSignatureSize > 0) { -sigObject.setContents(new byte[preferedSignatureSize * 2 + 2]); +sigObject.setContents(new byte[preferedSignatureSize]); } else { -- This message was sent by Atlassian JIRA (v6.2#6252)
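The bounds arithmetic in the first patch can be checked in isolation. In the sketch below, `start` is the offset of `<` and `end` is the offset just past `>`, mirroring how the patch adjusts `signaturePosition[0]` and `[1]`; the offsets and class name are illustrative, not PDFBox code. The sketch also shows the sizing point from the second patch: each signature byte becomes two hex digits, plus the two angle brackets.

```java
import java.nio.charset.StandardCharsets;

public class SignaturePlaceholder {

    // Writes `hexSignature` into the "<...>" placeholder of `pdf`.
    // `start` = offset of '<', `end` = offset just past '>'.
    static void writeSignature(byte[] pdf, int start, int end, String hexSignature) {
        int contentStart = start + 1; // move past "<"
        int contentEnd = end - 1;     // offset of ">"
        if (hexSignature.length() > contentEnd - contentStart) {
            throw new IllegalStateException("Can't write signature, not enough space");
        }
        byte[] sig = hexSignature.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(sig, 0, pdf, contentStart, sig.length);
    }

    // Placeholder size in the file: two hex digits per byte plus "<" and ">".
    static int placeholderChars(int signatureBytes) {
        return 2 * signatureBytes + 2;
    }

    public static void main(String[] args) {
        byte[] pdf = "/Contents <0000> >>".getBytes(StandardCharsets.US_ASCII);
        int start = 10, end = 16; // '<' at offset 10, '>' at offset 15
        writeSignature(pdf, start, end, "ABCD"); // exact fit: '>' survives
        System.out.println(new String(pdf, StandardCharsets.US_ASCII)); // /Contents <ABCD> >>
        try {
            // The unpatched check (16 - 10 - 6 = 0) would accept this and overwrite '>'
            writeSignature(pdf, start, end, "ABCDEF");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(placeholderChars(4096)); // 8194
    }
}
```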
[jira] [Issue Comment Deleted] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion
[ https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2080: Comment: was deleted (was: My bet is on PDFBOX-1950. That was fixed in 2.0 only.) > Barcode getting color inverted in pdf to image conversion > - > > Key: PDFBOX-2080 > URL: https://issues.apache.org/jira/browse/PDFBOX-2080 > Project: PDFBox > Issue Type: Bug >Reporter: proba > Attachments: FPR0T9.pdf, slika2_3.jpg > > > While converting a 1 page pdf to an image (both attached below), the image > converts properly, however the barcodes colours invert. > The code used to do the conversion looks like this right now: > public static void convertPDFToJPG(String src){ > try{ > //load pdf file in the document object > PDDocument doc=PDDocument.load(new FileInputStream(src)); > //Get all pages from document and store them in a list > List pages=doc.getDocumentCatalog().getAllPages(); > //create iterator object so it is easy to access each page > from the list > Iterator i= pages.iterator(); > int count=1; //count variable used to separate each image > file > //Convert every page of the pdf document to a unique image > file > System.out.println("Please wait..."); > while(i.hasNext()){ > PDPage page=i.next(); > BufferedImage bi=page.convertToImage( > BufferedImage.TYPE_INT_RGB, 300); > FileOutputStream fos = new FileOutputStream(new > File("d:\\slika2_3.jpg")); > //ImageIO.write(bi, "jpg", new > File("d:\\pdfimageold.jpg")); > boolean foundWriter = ImageIOUtil.writeImage(bi, > "jpg", fos, 300); > count++; > > } > System.out.println("Conversion complete"); > }catch(IOException ie){ie.printStackTrace();} > } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2078) DPI always 96
[ https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1495#comment-1495 ] Tilman Hausherr commented on PDFBOX-2078: - The dpi isn't part of the BufferedImage; it is converted into a zoom factor for rendering. So you have to pass it as a parameter again when saving: it is metadata, and its use is not properly supported by ImageIO (look at the source code of ImageIOUtil :-) ) > DPI always 96 > - > > Key: PDFBOX-2078 > URL: https://issues.apache.org/jira/browse/PDFBOX-2078 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.5 >Reporter: proba >Assignee: Tilman Hausherr > > I'm trying to convert a 1 page pdf report to an image using convertToImage. > My used command goes as follows: > BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); > No matter how much I change the resolution (300 in the example), the DPI > stays the same, even though the quality and the dimensions of the picture > change. > Adding a comparison between a 96 resolution picture and what should be a 300 > resolution picture (notice the DPI) > http://i58.tinypic.com/9sv339.png -- This message was sent by Atlassian JIRA (v6.2#6252)
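To make the "zoom factor" concrete: PDF user space has 72 units per inch by default, so rendering at N dpi scales the page by N / 72. That scale changes the pixel dimensions; the DPI value stored in the image file is separate metadata that the writer must set, as `ImageIOUtil.writeImage(bi, "jpg", fos, 300)` does in the PDFBOX-2080 code. The arithmetic is sketched below with a hypothetical helper that ignores page rotation and `/UserUnit`.

```java
public class RenderScale {

    // Default PDF user space: 72 units per inch, so dpi / 72 is the zoom factor.
    static float scale(int dpi) {
        return dpi / 72f;
    }

    // Pixel length of a page edge given its size in PDF points.
    static int pixels(float points, int dpi) {
        return Math.round(points * scale(dpi));
    }

    public static void main(String[] args) {
        // US Letter: 612 x 792 points
        System.out.println(pixels(612, 300) + " x " + pixels(792, 300)); // 2550 x 3300
        System.out.println(pixels(612, 96) + " x " + pixels(792, 96));   // 816 x 1056
    }
}
```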
[jira] [Commented] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999128#comment-13999128 ] Tim Allison commented on PDFBOX-2079: - Good to know. Thank you for confirming and taking a look so quickly! > Extra new line characters extracted in 1.8.5 for embedded files leading to > ZipFile exception in Java 1.6 > > > Key: PDFBOX-2079 > URL: https://issues.apache.org/jira/browse/PDFBOX-2079 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 1.8.5, 1.8.6, 2.0.0 >Reporter: Tim Allison >Assignee: Tilman Hausherr >Priority: Minor > Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf > > > For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from > an embedded zip (well, docx) file. PDFBox 1.8.5 extracts 17662 bytes -- > "\r\n" at the end of the stream. This leads to a ZipException for ZipFile(s) > in Java 1.6, but not Java 1.7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1463) Unreadable fonts on UNIX
[ https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998605#comment-13998605 ] Francesca Nina Herpertz commented on PDFBOX-1463: - I ran into this problem recently as well. I am experiencing this issue on a Solaris machine as well as on an Ubuntu box. I am using Java 1.6 on both machines and it only happens with certain Arial Fonts e.g.: JFIGPU+Arial-BoldMT KLSYIK+ArialMT Normal Arial works just fine though and it appears to be rendered correctly. > Unreadable fonts on UNIX > > > Key: PDFBOX-1463 > URL: https://issues.apache.org/jira/browse/PDFBOX-1463 > Project: PDFBox > Issue Type: Bug > Components: Rendering > Environment: UNIX >Reporter: Sindhu N Kashyap > Attachments: screenshot-1.jpg > > > I'm converting PDFs to tif. The conversion is fine when run in Windows. When > i run the same code in UNIX ,its converting with a font that is unreadable. I > put some font ttf files in the classes path but that has not made any > difference. Please help. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-2078) DPI always 96
[ https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715 ] proba edited comment on PDFBOX-2078 at 5/15/14 1:55 PM: Using ImageIOUtil fixed the DPI issue, thank you. Now I have run into a colour-changing problem in barcode PDF-to-image conversion, but that's a different story. If you happen to know the answer, though, that would be lovely. The barcode colours in the picture get inverted (black becomes white and white becomes black), which I saw had been reported before on these forums. Is there an easy known solution to this? was (Author: proba): Using ImageIOUtil fixed the DPI issue, thank you. Now I have run into a colour-changing problem in barcode PDF-to-image conversion, but that's a different story. If you happen to know the answer, that would be lovely. The barcode colours in the picture get inverted (black becomes white and white becomes black), which I saw had been reported before on these forums. Is there an easy known solution to this? > DPI always 96 > - > > Key: PDFBOX-2078 > URL: https://issues.apache.org/jira/browse/PDFBOX-2078 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.5 >Reporter: proba > > I'm trying to convert a 1 page pdf report to an image using convertToImage. > My used command goes as follows: > BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); > No matter how much I change the resolution (300 in the example), the DPI > stays the same, even though the quality and the dimensions of the picture > change. > Adding a comparison between a 96 resolution picture and what should be a 300 > resolution picture (notice the DPI) > http://i58.tinypic.com/9sv339.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array
[ https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-2070. - Resolution: Fixed Assignee: Tilman Hausherr I'm not happy that three classes (CCITT filter: sets DeviceGray if not set; JBIG2 filter: sets DeviceGray if not set; JPX filter: sets BPC, Decode, width, height, colorspace) alter the PDF (oops, that was my idea a few months ago), but I don't have a better idea. Correcting this will possibly require major changes, so I'm setting this to resolved for now, since the original bug is fixed. > Filter.decode() modifies PDF if there is a filter array > --- > > Key: PDFBOX-2070 > URL: https://issues.apache.org/jira/browse/PDFBOX-2070 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Tilman Hausherr >Assignee: Tilman Hausherr > Fix For: 2.0.0 > > Attachments: after.pdf, before.pdf > > > If there are several filters (filter array) in an image, PDFBox is inserting > an empty DecodeParms object here > {code} > params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index)); > {code} > instead of either inserting an empty COSArray or (better) doing nothing. Saving > such a PDF results in it not being displayable in Acrobat Reader. > Test code: > {code} > PDDocument d = PDDocument.load("before.pdf"); > new PDFRenderer(d).renderImage(0); > d.save("after.pdf"); > {code} > The rendering is important because without it, the filtered objects aren't > decoded.
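The PDF specification says that when /Filter is an array, /DecodeParms (if present) must also be an array, with one entry per filter and null for a filter that has no parameters. A single empty dictionary in that position is therefore not what a reader expects. One way to avoid changing the document while decoding is a read-only lookup that falls back to default parameters and writes nothing back. The sketch below uses java.util collections as stand-ins for PDFBox's COSDictionary and COSArray. The class and method names are hypothetical. Treating a lone dictionary as the first filter's parameters is an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

class DecodeParmsLookup {
    // decodeParms models the /DecodeParms value: null (absent), a Map (one dictionary)
    // or a List (one entry per filter; an entry may be null). Nothing is ever modified.
    @SuppressWarnings("unchecked")
    static Map<String, Object> paramsFor(Object decodeParms, int filterIndex) {
        if (decodeParms instanceof List) {
            List<Object> entries = (List<Object>) decodeParms;
            if (filterIndex < entries.size() && entries.get(filterIndex) instanceof Map) {
                return Collections.unmodifiableMap((Map<String, Object>) entries.get(filterIndex));
            }
        } else if (decodeParms instanceof Map && filterIndex == 0) {
            // Assumption: a lone dictionary belongs to the first filter.
            return Collections.unmodifiableMap((Map<String, Object>) decodeParms);
        }
        // Default parameters; the stream dictionary is left untouched.
        return Collections.emptyMap();
    }
}
```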
[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1392#comment-1392 ] Tilman Hausherr commented on PDFBOX-2081: - Getting better output by setting no clipping region (which is what you do) is too good to be true, although I was able to get improved rendering for two test files, the ones from PDFBOX-677 and PDFBOX-1288. On the other hand, the tiger test file now doesn't clip something (the chin) where it should have been clipped. Looking at the spec, I found this odd passage: {quote} The initial clipping path includes the entire page. A clipping path operator (W or W*, shown in Table 4.11) may appear after the last path construction operator and before the path-painting operator that terminates a path object. Although the clipping path operator appears before the painting operator, it does not alter the clipping path at the point where it appears. Rather, it modifies the effect of the succeeding painting operator. After the path has been painted, the clipping path in the graphics state is set to the intersection of the current clipping path and the newly constructed path. {quote} Looking at the code, the clipping path is set in EndPath(), which is called by the "n" operator. My understanding of this odd spec text is that the clipping path must be set after a painting operator, so it should also be set after any of the fill and stroke operators. I don't know if that is the cause of the problem; more analysis of PDFs needs to be done. > Lines that exceeds clipping area are not drawn > -- > > Key: PDFBOX-2081 > URL: https://issues.apache.org/jira/browse/PDFBOX-2081 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.0 >Reporter: Juraj Lonc > Attachments: Obyčajné zásielky.pdf, rendered.png > > > PDF contains shapes that are partly on the paper and partly outside (shape > overflows paper borders). > Those shapes are not rendered to image. 
> It is caused by the clipping area. > When I replace the line in PDFDrawer.strokePath() > {noformat} > graphics.setClip(getGraphicsState().getCurrentClippingPath()); > {noformat} > with > {noformat} > graphics.setClip(null); > {noformat} > then everything is rendered correctly. > Possibly a bug in Java?
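The spec passage quoted in the comment describes a deferred update. W or W* only marks the current path as a pending clip. The next path-painting operator (including n) still paints with the old clip, and only after that is the clip intersected with the path. The toy model below uses java.awt.geom.Area to show that ordering. It is not PDFBox's rendering code, and the class and method names are hypothetical. The W versus W* winding rule is simply taken from the Shape passed in.

```java
import java.awt.Shape;
import java.awt.geom.Area;
import java.util.ArrayList;
import java.util.List;

class DeferredClip {
    private Area clip;            // clipping path in the graphics state
    private boolean clipPending;  // set by W / W*, consumed by the next painting operator
    private Shape currentPath;    // path under construction
    final List<Shape> paintedClips = new ArrayList<>(); // clip in effect for each paint

    DeferredClip(Shape page) {
        clip = new Area(page);    // the initial clipping path is the entire page
    }

    void setPath(Shape path) {
        currentPath = path;
    }

    // W or W*: only records the request; the clip does not change yet.
    void markClip() {
        clipPending = true;
    }

    // Any path-painting operator (S, f, B, n, ...): paint with the old clip first,
    // then intersect the clip with the newly constructed path.
    void paint() {
        paintedClips.add(clip);
        if (clipPending && currentPath != null) {
            Area next = new Area(clip);
            next.intersect(new Area(currentPath));
            clip = next;          // replace, so the recorded old clip stays unchanged
        }
        clipPending = false;
        currentPath = null;
    }

    Area getClip() {
        return clip;
    }
}
```

Under this model, moving the clip update out of EndPath() into a step shared by every painting operator is what the spec text asks for. Whether that explains the missing lines in this report is still open, as the comment says.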
[jira] [Comment Edited] (PDFBOX-2078) DPI always 96
[ https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715 ] proba edited comment on PDFBOX-2078 at 5/15/14 1:09 PM: Using ImageIOUtil fixed the DPI issue, thank you. Now I have run into a font-changing problem in barcode PDF-to-image conversion, but that's a different story. If you happen to know the answer, that would be lovely. The barcode colours in the picture get inverted (black goes to white and white goes to black), which I saw was reported before on these forums. Is there an easy known solution to this? was (Author: proba): Using ImageIOUtil fixed the DPI issue, thank you. Now I have run into a font-changing problem in barcode PDF-to-image conversion, but that's a different story. > DPI always 96 > - > > Key: PDFBOX-2078 > URL: https://issues.apache.org/jira/browse/PDFBOX-2078 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.5 >Reporter: proba > > I'm trying to convert a 1-page PDF report to an image using convertToImage. > The command I use is: > BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); > No matter how much I change the resolution (300 in the example), the DPI > stays the same, even though the quality and the dimensions of the picture > change. > Adding a comparison between a 96 resolution picture and what should be a 300 > resolution picture (notice the DPI) > http://i58.tinypic.com/9sv339.png
[jira] [Comment Edited] (PDFBOX-2078) DPI always 96
[ https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998715#comment-13998715 ] proba edited comment on PDFBOX-2078 at 5/15/14 12:53 PM: - Using ImageIOUtil fixed the DPI issue, thank you. Now I have run into a font-changing problem in barcode PDF-to-image conversion, but that's a different story. was (Author: proba): Writing them out with ImageIO.write. To be precise: ImageIO.write(bi, "jpg", new File("d:\\pdfimageold"+count+".jpg")); I tried other types as well, naturally. > DPI always 96 > - > > Key: PDFBOX-2078 > URL: https://issues.apache.org/jira/browse/PDFBOX-2078 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.5 >Reporter: proba > > I'm trying to convert a 1-page PDF report to an image using convertToImage. > The command I use is: > BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); > No matter how much I change the resolution (300 in the example), the DPI > stays the same, even though the quality and the dimensions of the picture > change. > Adding a comparison between a 96 resolution picture and what should be a 300 > resolution picture (notice the DPI) > http://i58.tinypic.com/9sv339.png
[jira] [Closed] (PDFBOX-1463) Unreadable fonts on UNIX
[ https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler closed PDFBOX-1463. -- Resolution: Cannot Reproduce Assignee: Andreas Lehmkühler Set to closed as we didn't get any additional input to solve the issue. > Unreadable fonts on UNIX > > > Key: PDFBOX-1463 > URL: https://issues.apache.org/jira/browse/PDFBOX-1463 > Project: PDFBox > Issue Type: Bug > Components: Rendering > Environment: UNIX >Reporter: Sindhu N Kashyap >Assignee: Andreas Lehmkühler > Attachments: screenshot-1.jpg > > > I'm converting PDFs to TIF. The conversion is fine when run on Windows. When > I run the same code on UNIX, it converts with a font that is unreadable. I > put some font TTF files in the class path, but that has not made any > difference. Please help.
[jira] [Updated] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion
[ https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] proba updated PDFBOX-2080: -- Attachment: FPR0T9.pdf > Barcode getting color inverted in pdf to image conversion > - > > Key: PDFBOX-2080 > URL: https://issues.apache.org/jira/browse/PDFBOX-2080 > Project: PDFBox > Issue Type: Bug >Reporter: proba > Attachments: FPR0T9.pdf, slika2_3.jpg > > > While converting a 1-page PDF to an image (both attached below), the image > converts properly; however, the barcode colours are inverted. > The code used to do the conversion currently looks like this: > public static void convertPDFToJPG(String src){ > try{ > //load pdf file in the document object > PDDocument doc=PDDocument.load(new FileInputStream(src)); > //Get all pages from document and store them in a list > List<PDPage> pages=doc.getDocumentCatalog().getAllPages(); > //create iterator object so it is easy to access each page > from the list > Iterator<PDPage> i= pages.iterator(); > int count=1; //count variable used to separate each image > file > //Convert every page of the pdf document to a unique image > file > System.out.println("Please wait..."); > while(i.hasNext()){ > PDPage page=i.next(); > BufferedImage bi=page.convertToImage( > BufferedImage.TYPE_INT_RGB, 300); > FileOutputStream fos = new FileOutputStream(new > File("d:\\slika2_3.jpg")); > //ImageIO.write(bi, "jpg", new > File("d:\\pdfimageold.jpg")); > boolean foundWriter = ImageIOUtil.writeImage(bi, > "jpg", fos, 300); > count++; > > } > System.out.println("Conversion complete"); > }catch(IOException ie){ie.printStackTrace();} > }
[jira] [Updated] (PDFBOX-2082) signing corrupts PDF when signature exactly fits allocated space
[ https://issues.apache.org/jira/browse/PDFBOX-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Štěpán Schejbal updated PDFBOX-2082: Description: The current check does not take "<>" into account, so if you are (un)lucky, the signature overwrites ">" and corrupts the PDF. Fix for 1.8:
{code}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
index 3165589..755e849 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdfwriter/COSWriter.java
@@ -779,12 +779,14 @@ public class COSWriter implements ICOSVisitor, Closeable
 SignatureInterface signatureInterface = doc.getSignatureInterface();
 byte[] sign = signatureInterface.sign(new ByteArrayInputStream(pdfContent));
 String signature = new COSString(sign).getHexString();
+++signaturePosition[0]; // move past "<"
+--signaturePosition[1]; // move in front of ">"
 int leftSignaturerange = signaturePosition[1]-signaturePosition[0]-signature.length();
 if(leftSignaturerange<0)
 {
 throw new IOException("Can't write signature, not enough space");
 }
-getStandardOutput().setPos(signaturePosition[0]+1);
+getStandardOutput().setPos(signaturePosition[0]);
 getStandardOutput().write(signature.getBytes());
 }
 }
{code}
Another thing is that PDFBox currently allocates (2 * preferedSize + 2) for a signature. It confused me to see 16k+4 bytes allocated when I called setPreferedSignatureSize(4k); it should have allocated 8k (each signature byte takes 2 bytes in the PDF).
Fix for 1.8:
{code}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
index 358364a..23dd3ab 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java
@@ -309,7 +309,7 @@ public class PDDocument implements Pageable, Closeable
 int preferedSignatureSize = options.getPreferedSignatureSize();
 if (preferedSignatureSize > 0)
 {
-sigObject.setContents(new byte[preferedSignatureSize * 2 + 2]);
+sigObject.setContents(new byte[preferedSignatureSize]);
 }
 else
 {
{code}
> signing corrupts PDF when signature exactly fits allocated space > ---
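The off-by-one in the first patch is easier to see in isolation. The sketch below assumes, as the patch comments suggest, that signaturePosition[0] is the offset of "<" and signaturePosition[1] is the offset just past ">". It writes the hex-encoded signature strictly between the two delimiters and refuses anything that does not fit. `hexChars` restates the reporter's size arithmetic: every content byte becomes two hex digits. The class and method names are hypothetical; this is not PDFBox code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class SignaturePlaceholder {
    // Uppercase hex, two digits per byte, as in a PDF hex string.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // start = offset of '<', end = offset just past '>' (assumed convention).
    static void writeSignature(byte[] pdf, int start, int end, byte[] signature) throws IOException {
        String hex = toHex(signature);
        int first = start + 1; // move past "<"
        int limit = end - 1;   // move in front of ">"
        if (limit - first - hex.length() < 0) {
            throw new IOException("Can't write signature, not enough space");
        }
        byte[] hexBytes = hex.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(hexBytes, 0, pdf, first, hexBytes.length);
    }

    // Hex digits written for a /Contents placeholder of the given byte length.
    static int hexChars(int contentBytes) {
        return 2 * contentBytes;
    }
}
```

Under the same offset assumption, the pre-patch check would also accept a three-byte signature for a `<0000>` placeholder, because it counted both delimiters as usable space. The last hex digits would then land on ">". The placeholder arithmetic gives `hexChars(4096 * 2 + 2)` = 16388 (the "16k+4") versus `hexChars(4096)` = 8192.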
[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2081: --- Attachment: rendered.png Obyčajné zásielky.pdf > Lines that exceeds clipping area are not drawn > -- > > Key: PDFBOX-2081 > URL: https://issues.apache.org/jira/browse/PDFBOX-2081 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.0 >Reporter: Juraj Lonc > Attachments: Obyčajné zásielky.pdf, rendered.png > > > PDF contains shapes that are partly on the paper and partly outside (shape > overflows paper borders). > Those shapes are not rendered to image. > It is caused by the clipping area. > When I replace the line in PDFDrawer.strokePath() > {noformat} > graphics.setClip(getGraphicsState().getCurrentClippingPath()); > {noformat} > with > {noformat} > graphics.setClip(null); > {noformat} > then everything is rendered correctly. > Possibly a bug in Java?
[jira] [Created] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6
Tim Allison created PDFBOX-2079: --- Summary: Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6 Key: PDFBOX-2079 URL: https://issues.apache.org/jira/browse/PDFBOX-2079 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 1.8.5 Reporter: Tim Allison Priority: Minor Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from an embedded zip (well, docx) file. PDFBox 1.8.5 extracts 17662 bytes -- "\r\n" at the end of the stream. This leads to a ZipException for ZipFile(s) in Java 1.6, but not Java 1.7.
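The two extra bytes are exactly one CRLF. The PDF specification places an end-of-line marker between the stream data and the endstream keyword and excludes that marker from /Length, so a reader that copies exactly /Length bytes never sees it. The sketch below shows that preferred approach. It also shows a fallback for the case where the end of the data had to be found by scanning for endstream: remove the single EOL marker in front of the keyword. The names are hypothetical and this is not PDFBox's parser. The fallback is only a heuristic, because binary data can legitimately end in 0x0A or 0x0D, which is why /Length should win whenever it is trustworthy.

```java
import java.util.Arrays;

class StreamData {
    // Preferred: copy exactly /Length bytes, starting at the first byte after the EOL
    // that follows the "stream" keyword.
    static byte[] byLength(byte[] file, int dataStart, int length) {
        return Arrays.copyOfRange(file, dataStart, dataStart + length);
    }

    // Fallback when the data ends at a scanned "endstream": drop one EOL marker
    // (CRLF, LF or CR) that precedes the keyword, since it is not part of the data.
    static byte[] trimEolBeforeEndstream(byte[] data) {
        int end = data.length;
        if (end >= 2 && data[end - 2] == '\r' && data[end - 1] == '\n') {
            end -= 2;
        } else if (end >= 1 && (data[end - 1] == '\n' || data[end - 1] == '\r')) {
            end -= 1;
        }
        return Arrays.copyOf(data, end);
    }
}
```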
[jira] [Commented] (PDFBOX-2078) DPI always 96
[ https://issues.apache.org/jira/browse/PDFBOX-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999684#comment-13999684 ] proba commented on PDFBOX-2078: --- I did, and thank you for the fast answers. Might I suggest changing (or slightly rewording) the description of the resolution parameter in the convertToImage documentation? Parameters: resolution - the resolution in dpi (dots per inch) It's possible I'm wrong and misreading the description, but as pointed out in the original post, the DPI doesn't actually change when changing the resolution. > DPI always 96 > - > > Key: PDFBOX-2078 > URL: https://issues.apache.org/jira/browse/PDFBOX-2078 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.5 >Reporter: proba >Assignee: Tilman Hausherr > > I'm trying to convert a 1-page PDF report to an image using convertToImage. > The command I use is: > BufferedImage bi=page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); > No matter how much I change the resolution (300 in the example), the DPI > stays the same, even though the quality and the dimensions of the picture > change. > Adding a comparison between a 96 resolution picture and what should be a 300 > resolution picture (notice the DPI) > http://i58.tinypic.com/9sv339.png
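The resolution parameter does act on the pixel raster. A PDF page is measured in points (1/72 inch in default user space), so rendering at a higher resolution produces more pixels for the same page. What the parameter cannot do is set the DPI field in the file that ImageIO.write later produces, which is consistent with the dimensions changing while the reported DPI stayed at 96. The helper below is a hypothetical illustration of that arithmetic. Rounding to the nearest pixel is an assumption; a renderer may truncate instead.

```java
class PageRaster {
    // Default PDF user space unit: 1 point = 1/72 inch.
    static final double POINTS_PER_INCH = 72.0;

    // Pixels needed to cover `points` of page at `dpi`, rounded to the nearest pixel.
    static int pixels(double points, int dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }
}
```

For a US Letter page (612 x 792 pt) this gives 816 x 1056 pixels at 96 dpi and 2550 x 3300 pixels at 300 dpi. Only a header written by something like ImageIOUtil tells a viewer which of those densities was intended.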
[jira] [Commented] (PDFBOX-1463) Unreadable fonts on UNIX
[ https://issues.apache.org/jira/browse/PDFBOX-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998720#comment-13998720 ] Francesca Nina Herpertz commented on PDFBOX-1463: - I think it was resolved together with PDFBOX-1426. I cannot reproduce it with PDFBox 2.0.0. After redeploying the application, PDFs with the fonts described in my previous comment could also be rendered. It seems that it was a WebLogic caching issue and an old version of the application was still active. I will not open an additional ticket, as it seems to be resolved in version 2.0.0. > Unreadable fonts on UNIX > > > Key: PDFBOX-1463 > URL: https://issues.apache.org/jira/browse/PDFBOX-1463 > Project: PDFBox > Issue Type: Bug > Components: Rendering > Environment: UNIX >Reporter: Sindhu N Kashyap >Assignee: Andreas Lehmkühler > Attachments: screenshot-1.jpg > > > I'm converting PDFs to TIF. The conversion is fine when run on Windows. When > I run the same code on UNIX, it converts with a font that is unreadable. I > put some font TTF files in the class path, but that has not made any > difference. Please help.
[jira] [Commented] (PDFBOX-2073) PDF files with unusual Japanese font can not be rewrite correctly
[ https://issues.apache.org/jira/browse/PDFBOX-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999069#comment-13999069 ] Tilman Hausherr commented on PDFBOX-2073: - This will probably take several months; we usually have about 4 releases per year (see https://archive.apache.org/dist/pdfbox/). You can get an intermediate version here: https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.6-SNAPSHOT/ > PDF files with unusual Japanese font can not be rewrite correctly > - > > Key: PDFBOX-2073 > URL: https://issues.apache.org/jira/browse/PDFBOX-2073 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.5, 1.8.6, 2.0.0 > Environment: Windows 7 32bit >Reporter: May Yu >Assignee: Tilman Hausherr >Priority: Critical > Labels: encoding > Fix For: 1.8.6, 2.0.0 > > Attachments: font_screenshot1.png, landscape.pdf, pdf_property.png > > > When rotating the attached PDF file, the Japanese characters cannot be displayed in the > output PDF file. > This problem can also occur when merging PDF files. > We suspect that this is caused by the name of the font type. > Environment > - > OS: Windows 7 (32bit) > jvm : 1.6 > pdfbox: 1.8.5 > - > Code to reproduce the problem > - > public static void main(String[] args) { > String filePath = "D:\\test\\landscape.pdf"; > String newPDFFile = "D:\\test\\new_landscape.pdf"; > try { > PDDocument rotatedDocument = PDDocument.load(filePath); > PDDocument document = new PDDocument(); > int pageNumber = document.getNumberOfPages(); > for (int i=0; i<pageNumber; i++) { > PDPage page = > (PDPage)document.getDocumentCatalog().getAllPages().get(i); > page.setRotation(-90); > rotatedDocument.addPage(page); > } > rotatedDocument.save(newPDFFile); > document.close(); > rotatedDocument.close(); > } catch (Exception e) { > e.printStackTrace(); > } > } > -
[jira] [Closed] (PDFBOX-958) convertToImage mangles images which were in the PDF
[ https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler closed PDFBOX-958. - Resolution: Fixed Reopened to replace a missing attachment. > convertToImage mangles images which were in the PDF > --- > > Key: PDFBOX-958 > URL: https://issues.apache.org/jira/browse/PDFBOX-958 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.2.1, 1.4.0, 1.5.0 > Environment: RHEL5 and WinXP, java version "1.6.0_23" >Reporter: Eric Schwarzenbach >Assignee: Andreas Lehmkühler >Priority: Critical > Fix For: 1.6.0 > > Attachments: Image of Page 13.jpeg, Image of Page 13.png, > PDFBOX958-WrycanLoremIpsumTest.pdf > > > Of the PDFs we've tried running through PDFBox and generating page images, a > number of them (coming from disparate sources and method of creation) seem to > produce images where an image that was embedded in the page of the PDF shows > somewhat mangled. It seems to be divided by horizontal stripes, where some > stripes look normal, others seem to have some kind of "smearing" effect going > on. See attached images and original PDF (image is of page 13). > I marked this as critical as we are trying to use PDFBox in a project where > page images are crucial, and inability to produce reasonable looking page > images is pretty much a deal breaker. > The code we use to extract the images looks more or less like the following: > BufferedImage image = > page.convertToImage(); > > SmartDeferredFileOutputStream outStream > = new SmartDeferredFileOutputStream(); > String[] writerFormatNames = > ImageIO.getWriterFormatNames(); > ImageIO.write(image, "jpeg", outStream); > outStream.close(); > We've also tried specifying "png". In both "jpg" and "png" cases we get an > image file that is indeed the correct format, and both images look exactly > the same.
[jira] [Updated] (PDFBOX-2080) Barcode getting color inverted in pdf to image conversion
[ https://issues.apache.org/jira/browse/PDFBOX-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] proba updated PDFBOX-2080: -- Attachment: slika2_3.jpg > Barcode getting color inverted in pdf to image conversion > - > > Key: PDFBOX-2080 > URL: https://issues.apache.org/jira/browse/PDFBOX-2080 > Project: PDFBox > Issue Type: Bug >Reporter: proba > Attachments: FPR0T9.pdf, slika2_3.jpg > > > While converting a 1-page PDF to an image (both attached below), the image > converts properly; however, the barcode colours are inverted. > The code used to do the conversion currently looks like this: > public static void convertPDFToJPG(String src){ > try{ > //load pdf file in the document object > PDDocument doc=PDDocument.load(new FileInputStream(src)); > //Get all pages from document and store them in a list > List<PDPage> pages=doc.getDocumentCatalog().getAllPages(); > //create iterator object so it is easy to access each page > from the list > Iterator<PDPage> i= pages.iterator(); > int count=1; //count variable used to separate each image > file > //Convert every page of the pdf document to a unique image > file > System.out.println("Please wait..."); > while(i.hasNext()){ > PDPage page=i.next(); > BufferedImage bi=page.convertToImage( > BufferedImage.TYPE_INT_RGB, 300); > FileOutputStream fos = new FileOutputStream(new > File("d:\\slika2_3.jpg")); > //ImageIO.write(bi, "jpg", new > File("d:\\pdfimageold.jpg")); > boolean foundWriter = ImageIOUtil.writeImage(bi, > "jpg", fos, 300); > count++; > > } > System.out.println("Conversion complete"); > }catch(IOException ie){ie.printStackTrace();} > }
[jira] [Comment Edited] (PDFBOX-1895) Modifying a damaged PDF damages it further
[ https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993402#comment-13993402 ] Pat Hickey edited comment on PDFBOX-1895 at 5/9/14 5:01 AM: I finally found the missing object. It is the encryption object; I have pasted its content below. The /U entry is only 16 bytes long; doesn't the spec say it should be 32?
{noformat}
270 0 obj
<<
/CF << /StdCF << /AuthEvent /DocOpen /CFM /V2 /Length 16 >> >>
/EncryptMetadata false
/Filter /Standard
/Length 128
/O <1C05A048615171E5D46A21726E33D63AB2FFD258E5D9745CC19FAFD8CBC8B086>
/P -3900
/R 4
/StmF /StdCF
/StrF /StdCF
/U <568E89D6FDE15C453FCD04E69160C5BD>
/V 4
>>
endobj
{noformat}
> Modifying a damaged PDF damages it further > -- > > Key: PDFBOX-1895 > URL: https://issues.apache.org/jira/browse/PDFBOX-1895 > Project: PDFBox > Issue Type: Bug > Components: Writing >Affects Versions: 1.8.3, 1.8.4 >Reporter: Pat Hickey > > When re-writing a document with font descriptions, Adobe Reader is unable to > display the fonts in the document. Reader can display the fonts in the > original document. The difference is that in the original document, the font > descriptions are in lower object numbers than the font references; in the > output document, the font descriptions are in higher object numbers than the > font references. Is there a quick way to re-order them? 
> Update: the PDF file in question is actually corrupt, but somehow modifying > it with PDFBox causes it to no longer be readable with Adobe Reader.
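For the standard security handler at revisions 2 to 4, the PDF specification defines /O and /U as 32-byte strings, so the reporter's reading looks right. The /O value in the pasted object encodes 32 bytes and the /U value only 16. The sketch below counts the bytes a hex string encodes, ignoring whitespace and padding an odd final digit with 0 as the spec prescribes. The helper names are hypothetical, and this is plain Java rather than PDFBox's parser.

```java
class EncryptDictCheck {
    // Number of bytes encoded by a PDF hex string such as "<1C05...>".
    static int hexStringByteLength(String hexString) {
        String s = hexString.trim();
        if (s.length() < 2 || s.charAt(0) != '<' || s.charAt(s.length() - 1) != '>') {
            throw new IllegalArgumentException("not a hex string: " + hexString);
        }
        int digits = 0;
        for (int i = 1; i < s.length() - 1; i++) {
            char c = s.charAt(i);
            if (Character.digit(c, 16) >= 0) {
                digits++;
            } else if (!Character.isWhitespace(c)) {
                throw new IllegalArgumentException("invalid hex digit: " + c);
            }
        }
        return (digits + 1) / 2; // an odd final digit counts as a full byte
    }

    // Standard security handler, revisions 2-4: /O and /U are 32-byte strings.
    static boolean hasExpectedLength(String hexString, int revision) {
        if (revision < 2 || revision > 4) {
            throw new IllegalArgumentException("only revisions 2-4 are covered by this sketch");
        }
        return hexStringByteLength(hexString) == 32;
    }
}
```

For revisions 3 and 4, the specification's user-password check compares only the first 16 bytes of /U. That might explain why some viewers tolerate the short entry, but it is a possible explanation, not a confirmed diagnosis of this file.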
[jira] [Commented] (PDFBOX-958) convertToImage mangles images which were in the PDF
[ https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998738#comment-13998738 ] Andreas Lehmkühler commented on PDFBOX-958: --- Hi Tilman, I've sent the PDF via private message. BR Andreas Lehmkühler > convertToImage mangles images which were in the PDF > --- > > Key: PDFBOX-958 > URL: https://issues.apache.org/jira/browse/PDFBOX-958 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.2.1, 1.4.0, 1.5.0 > Environment: RHEL5 and WinXP, java version "1.6.0_23" >Reporter: Eric Schwarzenbach >Assignee: Andreas Lehmkühler >Priority: Critical > Fix For: 1.6.0 > > Attachments: Image of Page 13.jpeg, Image of Page 13.png, Wrycan® > Lorem Ipsum Test.pdf > > > Of the PDFs we've tried running through PDFBox and generating page images, a > number of them (coming from disparate sources and method of creation) seem to > produce images where an image that was embedded in the page of the PDF shows > somewhat mangled. It seems to be divided by horizontal stripes, where some > stripes look normal, others seem to have some kind of "smearing" effect going > on. See attached images and original PDF (image is of page 13). > I marked this as critical as we are trying to use PDFBox in a project where > page images are crucial, and inability to produce reasonable looking page > images is pretty much a deal breaker. > The code we use to extract the images looks more or less like the following: > BufferedImage image = > page.convertToImage(); > > SmartDeferredFileOutputStream outStream > = new SmartDeferredFileOutputStream(); > String[] writerFormatNames = > ImageIO.getWriterFormatNames(); > ImageIO.write(image, "jpeg", outStream); > outStream.close(); > We've also tried specifying "png". In both "jpg" and "png" cases we get an > image file that is indeed the correct format, and both images look exactly > the same.
[jira] [Updated] (PDFBOX-2079) Extra new line characters extracted in 1.8.5 for embedded files leading to ZipFile exception in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2079: Affects Version/s: 2.0.0 1.8.6 > Extra new line characters extracted in 1.8.5 for embedded files leading to > ZipFile exception in Java 1.6 > > > Key: PDFBOX-2079 > URL: https://issues.apache.org/jira/browse/PDFBOX-2079 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 1.8.5, 1.8.6, 2.0.0 >Reporter: Tim Allison >Assignee: Tilman Hausherr >Priority: Minor > Attachments: PDFBOX-2079-TEST_CASE.patch, embedded_zip.pdf > > > For the test file I'll attach shortly, PDFBox 1.8.4 extracts 17660 bytes from > an embedded zip (well, docx) file. PDFBox 1.8.5 extracts 17662 bytes -- > "\r\n" at the end of the stream. This leads to a ZipException for ZipFile(s) > in Java 1.6, but not Java 1.7.
[jira] [Reopened] (PDFBOX-958) convertToImage mangles images which were in the PDF
[ https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler reopened PDFBOX-958: --- > convertToImage mangles images which were in the PDF > --- > > Key: PDFBOX-958 > URL: https://issues.apache.org/jira/browse/PDFBOX-958 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.2.1, 1.4.0, 1.5.0 > Environment: RHEL5 and WinXP, java version "1.6.0_23" >Reporter: Eric Schwarzenbach >Assignee: Andreas Lehmkühler >Priority: Critical > Fix For: 1.6.0 > > Attachments: Image of Page 13.jpeg, Image of Page 13.png, > PDFBOX958-WrycanLoremIpsumTest.pdf > > > Of the PDFs we've tried running through PDFBox and generating page images, a > number of them (coming from disparate sources and method of creation) seem to > produce images where an image that was embedded in the page of the PDF shows > somewhat mangled. It seems to be divided by horizontal stripes, where some > stripes look normal, others seem to have some kind of "smearing" effect going > on. See attached images and original PDF (image is of page 13). > I marked this as critical as we are trying to use PDFBox in a project where > page images are crucial, and inability to produce reasonable looking page > images is pretty much a deal breaker. > The code we use to extract the images looks more or less like the following: > BufferedImage image = > page.convertToImage(); > > SmartDeferredFileOutputStream outStream > = new SmartDeferredFileOutputStream(); > String[] writerFormatNames = > ImageIO.getWriterFormatNames(); > ImageIO.write(image, "jpeg", outStream); > outStream.close(); > We've also tried specifying "png". In both "jpg" and "png" cases we get an > image file that is indeed the correct format, and both images look exactly > the same.