[jira] [Commented] (PDFBOX-2094) Add PrintRequestAttributeSet parameter to silentPrint()
[ https://issues.apache.org/jira/browse/PDFBOX-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013334#comment-14013334 ]

senthuran commented on PDFBOX-2094:
-----------------------------------

Thanks, John and Tilman.

I'm using the PDFBox snapshots for my implementation. With snapshot version pdfbox-app-2.0.0-20140509.193750-276.jar I used the AWT print() method to print PDF files. After snapshot version pdfbox-app-2.0.0-20140509.193750-277.jar, the PDPageable class was removed from PDFBox and a new class, PDFPrinter, was added. PDFPrinter implements the getPageable() and silentPrint() methods.

With the PDFBox silentPrint() method I can print a PDF file, but I cannot set the page range to print (e.g. first page to third page) or the tray the printer should take paper from (e.g. the TOP tray). The AWT print() method, however, lets the user pass a PrintRequestAttributeSet, so users can set printer (hardware) and PDF-file-related attributes through it. It would be very helpful if PDFBox also allowed users to pass a PrintRequestAttributeSet to silentPrint().

Add PrintRequestAttributeSet parameter to silentPrint()
-------------------------------------------------------

Key: PDFBOX-2094
URL: https://issues.apache.org/jira/browse/PDFBOX-2094
Project: PDFBox
Issue Type: Improvement
Components: PDModel
Affects Versions: 2.0.0
Reporter: senthuran
Assignee: John Hewson
Priority: Minor

The current implementation does not allow us to set printer or paper attributes. Could you please change silentPrint() to accept a PrintRequestAttributeSet as a parameter?
Affected versions: from pdfbox-app-2.0.0-20140506.050443-277jar to pdfbox-app-2.0.0-20140506.050443-301jar.

--
This message was sent by Atlassian JIRA (v6.2#6252)
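For readers unfamiliar with the AWT route the comment refers to, here is a minimal sketch of how a page range and an input tray can be requested through a PrintRequestAttributeSet. It uses only standard javax.print and java.awt.print classes. The class name PrintAttributesSketch and its helper methods are made up for illustration, and the Pageable is assumed to come from elsewhere (for example from PDFPrinter.getPageable(), as described in the comment). It is not a proposal for how silentPrint() should be changed.

```java
import java.awt.print.Pageable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.standard.MediaTray;
import javax.print.attribute.standard.PageRanges;

/** Hypothetical helper for illustration only; not part of PDFBox. */
class PrintAttributesSketch {

    /** Builds the attributes mentioned in the comment: pages 1 to 3, paper from the top tray. */
    static PrintRequestAttributeSet buildAttributes() {
        PrintRequestAttributeSet attrs = new HashPrintRequestAttributeSet();
        attrs.add(new PageRanges(1, 3)); // 1-based, inclusive page range
        attrs.add(MediaTray.TOP);        // a printer may ignore trays it does not have
        return attrs;
    }

    /** Prints without showing a dialog, passing the attributes to the AWT print job. */
    static void printSilently(Pageable pageable, PrintRequestAttributeSet attrs)
            throws PrinterException {
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPageable(pageable);
        job.print(attrs); // standard PrinterJob overload that accepts attributes
    }

    public static void main(String[] args) {
        // Only builds and lists the attributes; nothing is sent to a printer here.
        for (javax.print.attribute.Attribute a : buildAttributes().toArray()) {
            System.out.println(a.getName() + " = " + a);
        }
    }
}
```

Whether a given printer honours the tray or page-range request depends on its driver, so the attributes should be treated as a request rather than a guarantee.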
Re: Enhancements to PDFBox
On 29.05.2014 at 18:51, John Hewson j...@jahewson.com wrote:

>> # splitting files (e.g. remove no longer needed resources)
>
> Each page has its own Resources dictionary, so it shouldn't be too difficult. One thing to watch out for is the page tree, which allows pages to inherit resources from each other; this is handled as PDPageNode but it's kind of messy.

Thanks for the hint. Splitting and merging are somewhat similar, as splitting is typically done by creating a new document and importing the needed pages into the newly created document. Using the current code this might lead to duplicate resources.

>> # merging files (e.g. avoid duplicating resources)
>
> Sounds like the files are pretty similar, is this actually an overlay? Or are you wanting to insert entire pages?

It's merging individual files together, inserting entire pages. Although the files are created individually, they share some common elements like company logos or fonts.

> I imagine you probably want to implement both these features at the COS level rather than the PD level, as it's pretty low-level processing.

It will involve a lot of COS processing. I haven't decided yet if it will sit on top of COS or PD. Typically we do encourage people to use PD, so I tend to start from there and dig down internally as needed. WDYT?

> -- John
>
> On 29 May 2014, at 00:39, Maruan Sahyoun sahy...@fileaffairs.de wrote:
>
>> Hi,
>>
>> for a current project I need to work on enhancing PDFBox for
>> # splitting files (e.g. remove no longer needed resources)
>> # merging files (e.g. avoid duplicating resources)
>> # page handling (adding/removing individual pages with resource handling)
>> # enhancements to forms handling (pre fill XFA forms - partially done, enhancing AP generation)
>>
>> Is someone else working on something similar?
>>
>> BR
>> Maruan
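One possible way to avoid duplicated resources when merging, sketched here purely as an illustration of the idea discussed above: identify each imported resource stream (a logo image, a font program) by a hash of its bytes and reuse the already-imported object when the same bytes show up again. ResourcePool and its methods are hypothetical names, the integer ids merely stand in for object references, and nothing here touches the PDFBox COS or PD classes. A real implementation would also have to compare the resource dictionaries, not only the stream bytes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of PDFBox: shares identical resources by content hash. */
class ResourcePool {
    private final Map<String, Integer> idsByDigest = new HashMap<>();
    private int nextId = 1;

    /** Returns the id of an identical, already imported resource, or registers a new one. */
    int intern(byte[] resourceBytes) {
        String digest = sha256Hex(resourceBytes);
        Integer existing = idsByDigest.get(digest);
        if (existing != null) {
            return existing; // same bytes seen before: reuse instead of duplicating
        }
        int id = nextId++;
        idsByDigest.put(digest, id);
        return id;
    }

    /** Number of distinct resources stored so far. */
    int size() {
        return idsByDigest.size();
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation
            throw new IllegalStateException(e);
        }
    }
}
```

Interning the same logo bytes from two source files then yields one id, so the merged document would need to store that resource only once.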
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013390#comment-14013390 ]

Tilman Hausherr commented on PDFBOX-1915:
-----------------------------------------

I just tried, I can't assign it to you; apparently this is only possible for committers. But all committers know that it is yours :-)

Were any of the test files in a different direction than the others? I ask because one of your drawings showed clockwise and counterclockwise.

Could you please edit your earlier comments to join the lines that are broken?

Also, which strategy was the successful one? I assume it is the last one (i.e. the one in the first comment).

I'd say that was a very successful week.

Implement shading with Coons and tensor-product patch meshes
------------------------------------------------------------

Key: PDFBOX-1915
URL: https://issues.apache.org/jira/browse/PDFBOX-1915
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Labels: graphical, gsoc2014, java, math, shading
Attachments: CONICAL.pdf, GWG060_Shading_x1a.pdf, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, coons2-function.pdf, coons2-function.ps, eci_altona-test-suite-v2_technical_H.pdf, lamp_cairo.pdf, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, updateshading6ContourTest.rar

Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been implemented. I have done type 1, 4 and 5, but I don't know the math for type 6 and 7. My math days are decades away.

Knowledge prerequisites:
- Java, although you don't have to be a Java ace, just feel comfortable
- math: you should know what cubic Bézier curves, degenerate Bézier curves, bilinear interpolation, tensor products, affine transform matrices and Bernstein polynomials are, or be able to learn it
- Maven (basic)
- svn (basic)
- an IDE like NetBeans, Eclipse or IntelliJ (basic)
- ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics.

A first look at PDFBox: try the command-line utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615; these have the shading types that are already implemented.

Some simple source code to convert to images:

    String filename = "blah.pdf";
    PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
    List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
    int page = 0;
    for (PDPage pdPage : pdPages)
    {
        ++page;
        // render each page at 300 dpi and write it as blah.pdf1.png, blah.pdf2.png, ...
        BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300);
        ImageIO.write(bim, "png", new File(filename + page + ".png"));
    }
    document.close();

You are not starting from scratch. The implementation of type 4 and 5 shows you how to read parameters from the PDF and set the graphics. You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general.

The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html

The tricky parts are:
- decide whether a point (x,y) is inside or outside a patch
- decide the color of a point within the patch

To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the repository.
https://pdfbox.apache.org/downloads.html#scm

If you want to see the existing code in the debugger with a Gouraud shading, try this file: http://asymptote.sourceforge.net/gallery/Gouraud.pdf

Testing: I have attached several example PDFs. To see which one has which shading, open them with an editor like Notepad++ and search for "/ShadingType" (without the quotes). If your images render like the example PDFs, then you were successful.

Optional: Review and optimize the complete shading package for speed; implement cubic spline interpolation for type 0 (sampled) functions (that one is really low-low priority; see details by looking up "cubic spline interpolation" in the PDF spec, which says that it is disregarded in printing, and I don't have a test PDF).

Mentor: Tilman Hausherr (European timezone, languages: German, English, French)
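The manual search for /ShadingType described in the issue text can also be scripted. The following is a minimal sketch, assuming the shading dictionaries are stored uncompressed in the file (if they sit inside a compressed object stream, neither a text editor nor this scan will find them). ShadingTypeScanSketch and findShadingTypes are hypothetical names; only JDK classes are used, not PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration: lists the /ShadingType values visible in raw PDF bytes. */
class ShadingTypeScanSketch {
    // Shading types are single digits (1 to 7), optionally preceded by whitespace
    private static final Pattern SHADING_TYPE = Pattern.compile("/ShadingType\\s*(\\d)");

    static Set<Integer> findShadingTypes(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Set<Integer> types = new TreeSet<>();
        Matcher m = SHADING_TYPE.matcher(text);
        while (m.find()) {
            types.add(Integer.parseInt(m.group(1)));
        }
        return types;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ShadingTypeScanSketch some-file.pdf
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(findShadingTypes(bytes));
    }
}
```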
[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013390#comment-14013390 ]

Tilman Hausherr edited comment on PDFBOX-1915 at 5/30/14 7:45 AM:
------------------------------------------------------------------

I just tried, I can't assign it to you; apparently this is only possible to committers. But all committers know that it is yours :-)

Were any of the test files in a different direction than the others? I ask because one early hand drawing showed clockwise and counterclockwise.

Could you please edit your earlier comments to join the lines that are broken?

Also, which strategy was the successful one? I assume it is the last one (i.e. the one in the first comment).

I'd say that was a very successful week!

was (Author: tilman):
I just tried, I can't assign it to you; apparently this is only possible for committers. But all committers know that it is yours :-)

Were any of the test files in a different direction than the others? I ask because one of your drawings showed clockwise and counterclockwise.

Could you please edit your earlier comments to join the lines that are broken?

Also, which strategy was the successful one? I assume it is the last one (i.e. the one in the first comment).

I'd say that was a very successful week.
[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008956#comment-14008956 ]

Tilman Hausherr edited comment on PDFBOX-1915 at 5/30/14 8:24 AM:
------------------------------------------------------------------

Two examples of a coons patch in PostScript. I found the first one on the usenet (messageid [20021112083949.7bf2c92e.rs...@sympatico.ca|https://groups.google.com/forum/#!original/comp.lang.postscript/DXygltnXHi4/8kVTr13W0xEJ] posted by Robert Swan in 2002), the second one is modified from the first. When converting to PDF with ghostscript, set version 1.5 in the options, to avoid it being converted into an image.

was (Author: tilman):
Two examples of a coons patch in postscript. I found the first one on the usenet, the second one is modified from the first. When converting to PDF, set version 1.5 in the options, to avoid it being converted into an image.
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013425#comment-14013425 ]

Shaola Ren commented on PDFBOX-1915:
------------------------------------

As for the assignment, it's fine. I was just curious about this item and thought there was no harm in asking, so I did :)

With the method I use now, I don't need to consider the clockwise or counterclockwise direction: the grid is generated automatically, so all situations and priority rules are followed directly. I deleted almost all the code I wrote before May 28 in the CoonsPatch and CubicBezierCurve classes and rewrote it as this new version, adding another class, CoonsTriangle. There is obviously some redundant code left, which I will clean up last. Although the previous version is hardly used in the current code, it helped me a lot in understanding the whole problem.

Yes, the last strategy works: first divide a patch into small four-sided patches, then divide each small patch into two triangles, then create a triangle list as in shading type 5, though with some differences from what you coded for shading type 5. I will write a detailed document about this method later.

As for the broken lines you mentioned, I looked at them. They are in my first comment in this thread; they are not broken lines, just lines with arrows, each arrow followed by a whole paragraph, and no content is missing.

Yes, I am happy with this progress.
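The subdivision strategy Shaola describes in the comment above (patch, then small four-sided cells, then two triangles per cell) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code attached to the issue: the class and method names are hypothetical, the surface is computed with the usual bilinearly blended Coons formula S = Sc + Sd - Sb, the grid is uniform in (u, v), and colors are left out (they could be interpolated bilinearly from the four corner colors at the same (u, v) values).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not the PDFBox implementation. */
class CoonsSubdivisionSketch {

    /** Evaluates a cubic Bezier curve {p0, p1, p2, p3} (each point is {x, y}) at parameter t. */
    static double[] bezier(double[][] p, double t) {
        double s = 1 - t;
        // cubic Bernstein polynomials
        double b0 = s * s * s, b1 = 3 * s * s * t, b2 = 3 * s * t * t, b3 = t * t * t;
        return new double[] {
            b0 * p[0][0] + b1 * p[1][0] + b2 * p[2][0] + b3 * p[3][0],
            b0 * p[0][1] + b1 * p[1][1] + b2 * p[2][1] + b3 * p[3][1]
        };
    }

    /**
     * Coons surface point for boundary curves c1(u) at v = 0, c2(u) at v = 1,
     * d1(v) at u = 0 and d2(v) at u = 1; the curves must share their corner points.
     */
    static double[] surface(double[][] c1, double[][] c2, double[][] d1, double[][] d2,
                            double u, double v) {
        double[] pc1 = bezier(c1, u), pc2 = bezier(c2, u);
        double[] pd1 = bezier(d1, v), pd2 = bezier(d2, v);
        double[] out = new double[2];
        for (int k = 0; k < 2; k++) {
            double sc = (1 - v) * pc1[k] + v * pc2[k];           // blend of the two c curves
            double sd = (1 - u) * pd1[k] + u * pd2[k];           // blend of the two d curves
            double sb = (1 - v) * ((1 - u) * c1[0][k] + u * c1[3][k])
                      + v * ((1 - u) * c2[0][k] + u * c2[3][k]); // bilinear blend of the corners
            out[k] = sc + sd - sb;
        }
        return out;
    }

    /** Splits the patch into n x n cells and each cell into two triangles of three {x, y} points. */
    static List<double[][]> triangulate(double[][] c1, double[][] c2, double[][] d1,
                                        double[][] d2, int n) {
        double[][][] grid = new double[n + 1][n + 1][];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= n; j++) {
                grid[i][j] = surface(c1, c2, d1, d2, (double) i / n, (double) j / n);
            }
        }
        List<double[][]> triangles = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                triangles.add(new double[][] {grid[i][j], grid[i + 1][j], grid[i][j + 1]});
                triangles.add(new double[][] {grid[i + 1][j], grid[i + 1][j + 1], grid[i][j + 1]});
            }
        }
        return triangles;
    }
}
```

The resulting triangle list could then be handed to the same kind of per-triangle rasterization that Gouraud-style shadings use, which would also address the inside/outside and color questions per triangle rather than per patch.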
[jira] [Assigned] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler reassigned PDFBOX-1915:
-----------------------------------------

Assignee: Shaola Ren

I've added Shaola to the contributors group and assigning shouldn't be a problem anymore
[jira] [Updated] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1915:
-----------------------------------

Fix Version/s: 2.0.0
[jira] [Updated] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1915:
    Affects Version/s: 1.8.6, 1.8.5

Implement shading with Coons and tensor-product patch meshes

    Key: PDFBOX-1915
    URL: https://issues.apache.org/jira/browse/PDFBOX-1915
    Project: PDFBox
    Issue Type: Improvement
    Components: Rendering
    Affects Versions: 1.8.5, 1.8.6, 2.0.0
    Reporter: Tilman Hausherr
    Assignee: Shaola Ren
    Labels: graphical, gsoc2014, java, math, shading
    Fix For: 2.0.0
    Attachments: CONICAL.pdf, GWG060_Shading_x1a.pdf, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, coons2-function.pdf, coons2-function.ps, eci_altona-test-suite-v2_technical_H.pdf, lamp_cairo.pdf, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, updateshading6ContourTest.rar

Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (tensor-product patch meshes) haven't been implemented. I have done types 1, 4 and 5, but I don't know the math for types 6 and 7. My math days are decades away.

Knowledge prerequisites:
- java, although you don't have to be a java ace, just feel comfortable
- math: you should know what cubic Bézier curves, degenerate Bézier curves, bilinear interpolation, tensor-product, affine transform matrix and Bernstein polynomials are, or be able to learn them
- maven (basic)
- svn (basic)
- an IDE like Netbeans or Eclipse or IntelliJ (basic)
- ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics.

A first look at PDFBOX: try the command utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615; these have the shading types that are already implemented.

Some simple source code to convert to images:

    String filename = "blah.pdf";
    PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
    List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
    int page = 0;
    for (PDPage pdPage : pdPages)
    {
        ++page;
        // render each page as a 300 dpi black-and-white image
        BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300);
        ImageIO.write(bim, "png", new File(filename + page + ".png"));
    }
    document.close();

You are not starting from scratch. The implementation of types 4 and 5 shows you how to read parameters from the PDF and set the graphics. You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general. The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html

The tricky parts are:
- decide whether a point (x,y) is inside or outside a patch
- decide the color of a point within the patch

To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the repository. https://pdfbox.apache.org/downloads.html#scm

If you want to see the existing code in the debugger with a Gouraud shading, try this file: http://asymptote.sourceforge.net/gallery/Gouraud.pdf

Testing: I have attached several example PDFs. To see which one has which shading, open them with an editor like NOTEPAD++, and search for "/ShadingType" (without the quotes). If your images are rendering like the example PDFs, then you were successful.

Optional: Review and optimize the complete shading package for speed; implement cubic spline interpolation for type 0 (sampled) functions (that one is really low-low priority; see details by looking up cubic spline interpolation in the PDF spec, which says that it is disregarded in printing, and I don't have a test PDF).

Mentor: Tilman Hausherr (European timezone, languages: german, english, french)

-- This message was sent by Atlassian JIRA (v6.2#6252)
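As a small illustration of two of the math prerequisites listed in the issue (cubic Bézier curves via Bernstein polynomials, and bilinear interpolation), here is a minimal Java sketch. It is not taken from PDFBox and is not the patch-mesh algorithm itself; the class and method names (CubicBezierSketch, bernstein3, pointAt, bilinear) are made up for this example.

```java
public class CubicBezierSketch {

    /** Degree-3 Bernstein basis polynomials evaluated at t in [0, 1]. */
    static double[] bernstein3(double t) {
        double u = 1.0 - t;
        return new double[] { u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t };
    }

    /**
     * Point on a cubic Bezier curve.
     * p holds the four control points as {x, y} pairs.
     */
    static double[] pointAt(double[][] p, double t) {
        double[] b = bernstein3(t);
        double x = 0;
        double y = 0;
        for (int i = 0; i < 4; i++) {
            x += b[i] * p[i][0];
            y += b[i] * p[i][1];
        }
        return new double[] { x, y };
    }

    /**
     * Bilinear interpolation of one color component between four corner
     * values, with (u, v) in the unit square. c00 is the value at (0,0),
     * c10 at (1,0), c01 at (0,1) and c11 at (1,1).
     */
    static double bilinear(double c00, double c10, double c01, double c11,
                           double u, double v) {
        return (1 - u) * (1 - v) * c00 + u * (1 - v) * c10
             + (1 - u) * v * c01 + u * v * c11;
    }
}
```

A patch boundary is made of such cubic curves, and a corner-to-corner color blend of this kind is one plausible way to think about "the color of a point within the patch"; how the two are combined for types 6 and 7 is defined in the PDF specification.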
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013432#comment-14013432 ]

Shaola Ren commented on PDFBOX-1915:

Thank Andreas Lehmkühler :)
[jira] [Updated] (PDFBOX-2103) JPXFilter fails to decode some Jpeg2000 images
[ https://issues.apache.org/jira/browse/PDFBOX-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Petr Slaby updated PDFBOX-2103:
    Attachment: JPXFilter.java.patch, 01_MTEXT_CS6.pdf

JPXFilter fails to decode some Jpeg2000 images

    Key: PDFBOX-2103
    URL: https://issues.apache.org/jira/browse/PDFBOX-2103
    Project: PDFBox
    Issue Type: Bug
    Components: Rendering
    Affects Versions: 2.0.0
    Reporter: Petr Slaby
    Attachments: 01_MTEXT_CS6.pdf, JPXFilter.java.patch

Most of the images in the attached PDF are missing when rendered via PDFBox (tested in 2.0 head). The reason is a NullPointerException in ImageIO:

    java.lang.NullPointerException
        at com.sun.media.imageioimpl.plugins.jpeg2000.J2KMetadata.replace(J2KMetadata.java:962)
        at com.sun.media.imageioimpl.plugins.jpeg2000.J2KMetadata.addNode(J2KMetadata.java:631)
        at jj2000.j2k.fileformat.reader.FileFormatReader.readFileFormat(FileFormatReader.java:279)
        at com.sun.media.imageioimpl.plugins.jpeg2000.J2KReadState.initializeRead(J2KReadState.java:418)
        at com.sun.media.imageioimpl.plugins.jpeg2000.J2KReadState.init(J2KReadState.java:189)
        at com.sun.media.imageioimpl.plugins.jpeg2000.J2KImageReader.read(J2KImageReader.java:443)
        at javax.imageio.ImageReader.read(Unknown Source)
        at org.apache.pdfbox.filter.JPXFilter.readJPX(JPXFilter.java:84)
        at org.apache.pdfbox.filter.JPXFilter.decode(JPXFilter.java:58)
        ...

To avoid the problem, the ImageIO has to be instructed to skip reading metadata of the image, i.e. use reader.setInput(iis, true, true) instead of reader.setInput(iis) as shown in the attached patch. This is also what ImageIO.read(stream) does - the method that was used before the commit 1570806.

-- This message was sent by Atlassian JIRA (v6.2#6252)
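The workaround described in the report (telling the reader to ignore metadata) can be sketched with the standard javax.imageio API. This is not the attached patch; the class and method names (IgnoreMetadataRead, readIgnoringMetadata) are hypothetical. A standard JDK ships no JPEG 2000 reader, so the format name is a parameter and the sketch can be tried with "png"; the only point it demonstrates is the three-argument ImageReader.setInput(input, seekForwardOnly, ignoreMetadata) call.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;

public class IgnoreMetadataRead {

    /**
     * Decodes the first image in data with an explicitly chosen reader,
     * asking the reader to skip image metadata.
     */
    static BufferedImage readIgnoringMetadata(byte[] data, String formatName)
            throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        if (!readers.hasNext()) {
            throw new IOException("No ImageIO reader available for " + formatName);
        }
        ImageReader reader = readers.next();
        try (ImageInputStream iis =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
            // seekForwardOnly = true, ignoreMetadata = true
            reader.setInput(iis, true, true);
            return reader.read(0);
        } finally {
            reader.dispose();
        }
    }
}
```

With a JPEG 2000 plugin such as the JAI Image I/O tools on the classpath, the same call with a matching format name would presumably exercise the code path from the stack trace above, but that is an assumption and has not been checked here.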
[jira] [Created] (PDFBOX-2103) JPXFilter fails to decode some Jpeg2000 images
Petr Slaby created PDFBOX-2103:

    Summary: JPXFilter fails to decode some Jpeg2000 images
    Key: PDFBOX-2103
    URL: https://issues.apache.org/jira/browse/PDFBOX-2103
    Project: PDFBox
    Issue Type: Bug
    Components: Rendering
    Affects Versions: 2.0.0
    Reporter: Petr Slaby
    Attachments: 01_MTEXT_CS6.pdf, JPXFilter.java.patch
[jira] [Created] (PDFBOX-2104) Implement transparency groups
Petr Slaby created PDFBOX-2104:

    Summary: Implement transparency groups
    Key: PDFBOX-2104
    URL: https://issues.apache.org/jira/browse/PDFBOX-2104
    Project: PDFBox
    Issue Type: Improvement
    Components: Rendering
    Affects Versions: 2.0.0
    Reporter: Petr Slaby

The attached PDF uses transparency groups, blending and soft masks to create the rounded corners and shades behind images. It appears that these features are not implemented in PDFBox. An implementation proposal is attached in TransparencyGroups.patch. The basic idea is to create a buffered image, draw the transparency group content onto it and then use the result to produce the soft mask or draw the image on the original g2d.

Note: I am not the (only) author of the proposed change. It was developed in our company a few years ago in sources based on a 1.7.x version of PDFBox, mostly by a colleague who has since left. Over the years, merging the work done in the PDFBox mainstream into our source base has become impossible due to many refactorings and other far-reaching changes. Now we would like to go the opposite way - where possible - bring the changes and fixes we have done into the PDFBox mainstream and start to use it in our installations.

-- This message was sent by Atlassian JIRA (v6.2#6252)
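The basic idea from the proposal (draw the group into its own buffered image, then draw that image onto the original g2d) can be sketched with plain Java 2D. This is not the attached patch and does not cover soft masks or blend modes; TransparencyGroupSketch, GroupContent and drawGroup are hypothetical names, and a constant group alpha stands in for whatever the real graphics state would supply.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Composite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class TransparencyGroupSketch {

    /** Hypothetical callback that paints the content of one group. */
    interface GroupContent {
        void draw(Graphics2D g);
    }

    /**
     * Renders the group content into an offscreen ARGB buffer first, then
     * composites the buffer as a single object onto the target graphics.
     */
    static void drawGroup(Graphics2D target, int width, int height,
                          float groupAlpha, GroupContent content) {
        BufferedImage buffer = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = buffer.createGraphics();
        try {
            content.draw(g);   // objects inside the group are combined here
        } finally {
            g.dispose();
        }
        Composite saved = target.getComposite();
        target.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, groupAlpha));
        target.drawImage(buffer, 0, 0, null);   // the group is blended as a whole
        target.setComposite(saved);
    }
}
```

The reason for the intermediate buffer is that overlapping objects inside the group are combined with each other before any transparency is applied to the group as a whole, so the overlap does not show through as it would if each object were drawn semi-transparently on its own.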
[jira] [Updated] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Petr Slaby updated PDFBOX-2104:
    Attachment: 01_MTEXT_CS6.pdf, TransparencyGroups.patch
[jira] [Comment Edited] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013495#comment-14013495 ]

Andreas Lehmkühler edited comment on PDFBOX-2104 at 5/30/14 10:41 AM:

Great, that's a most welcome change of course! So, first of all, thanks for that! But there is one issue we have to solve first. Your patch is a substantial change for our codebase and requires
- that all authors sign an iCLA and the company a CCLA
- or that the company which donates the code signs a software grant
IMHO in your case it's the latter. http://www.apache.org/licenses/ provides the details. If there are any questions, please address them to the mailing list dev@pdfbox
[jira] [Commented] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013597#comment-14013597 ]

Petr Slaby commented on PDFBOX-2104:

I do not think a software grant would be applicable, I just want to contribute a few patches and improvements to pdfbox. I have forwarded the request to sign the CCLA to our decision makers and legal owners of my works.
[jira] [Commented] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013628#comment-14013628 ]

Andreas Lehmkühler commented on PDFBOX-2104:

Maybe that's the better choice, especially if we are talking about more contributions in the future. But you as an individual have to sign an iCLA as well.
[jira] [Commented] (PDFBOX-2101) Surprising memory consumption when extracting images
[ https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013797#comment-14013797 ]

Andreas Lehmkühler commented on PDFBOX-2101:

I've added a clear() method to PDFont and PDXObject to delete cached resources if necessary in revisions 1598627 (trunk) and 1598633 (1.8 branch). Those methods are called when clearing PDResources. PDFont.clear is still empty but I'm going to fill in some stuff soon.

Surprising memory consumption when extracting images

    Key: PDFBOX-2101
    URL: https://issues.apache.org/jira/browse/PDFBOX-2101
    Project: PDFBox
    Issue Type: Bug
    Components: Utilities
    Affects Versions: 1.8.5
    Environment: Windows 7
        java version "1.7.0_55"
        Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
        Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
    Reporter: Tim Allison
    Assignee: Andreas Lehmkühler
    Priority: Minor
    Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg, PDFBOX-2101-714-poor.jpg, java.hprof.zip

ExtractImages seems to fail to release memory resources on some files in both PDFBox 1.8.5 and trunk. On this 4MB file [http://digitalcorpora.org/corp/nps/files/govdocs1/239/239665.pdf], if extracting every image on every page (and there are many, many duplicate images), there is an OOM with -Xmx1g. If there is no Xmx and there is 2.5g available, ExtractImages will work. With some experimentation, the triggers seem to be JPEG images that have masks. I'm not sure, though, whether the issue is with PDFBox or Java.

Commandlines:
1.8.5: java -Xmx1g -cp pdfbox-app-1.8.5.jar org.apache.pdfbox.ExtractImages 239665.pdf
2.0_SNAPSHOT: java -Xmx1g -cp pdfbox-app-2.0.0-SNAPSHOT.jar org.apache.pdfbox.tools.ExtractImages -addkey 239665.pdf

Results:
1.8.5: 906 files before OOM
{noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:514)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:217)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:363)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:254)
    at org.apache.pdfbox.ExtractImages.processResources(ExtractImages.java:202)
    at org.apache.pdfbox.ExtractImages.extractImages(ExtractImages.java:160)
    at org.apache.pdfbox.ExtractImages.main(ExtractImages.java:65)
{noformat}
2.0_SNAPSHOT: 428 files before OOM
{noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:70)
    at org.apache.pdfbox.io.IOUtils.toByteArray(IOUtils.java:52)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:171)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:154)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:171)
    at org.apache.pdfbox.tools.ExtractImages.write2file(ExtractImages.java:231)
    at org.apache.pdfbox.tools.ExtractImages.processResources(ExtractImages.java:206)
    at org.apache.pdfbox.tools.ExtractImages.extractImages(ExtractImages.java:164)
    at org.apache.pdfbox.tools.ExtractImages.main(ExtractImages.java:69)
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
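The clear() approach mentioned in the comment on this issue can be illustrated with a small, self-contained sketch. Everything in it is hypothetical (ClearableCacheSketch, CachedResource, ResourceSet and the toy "decode" step) and it is not the code from the revisions mentioned; it only shows a container that asks each resource to drop its cached, possibly large, decoded data when the container is cleared.

```java
import java.util.ArrayList;
import java.util.List;

public class ClearableCacheSketch {

    /** Hypothetical resource that lazily caches an expensive decoded form. */
    static class CachedResource {
        private final byte[] encoded;
        private int[] decoded;   // cached result, may be large

        CachedResource(byte[] encoded) {
            this.encoded = encoded;
        }

        /** Decodes on first use and keeps the result until clear() is called. */
        int[] getDecoded() {
            if (decoded == null) {
                decoded = new int[encoded.length];
                for (int i = 0; i < encoded.length; i++) {
                    decoded[i] = encoded[i] & 0xFF;   // stand-in for real decoding
                }
            }
            return decoded;
        }

        boolean isCached() {
            return decoded != null;
        }

        /** Drops the cached data so it can be garbage collected. */
        void clear() {
            decoded = null;
        }
    }

    /** Hypothetical container that clears all of its resources at once. */
    static class ResourceSet {
        private final List<CachedResource> resources = new ArrayList<>();

        void add(CachedResource r) {
            resources.add(r);
        }

        void clear() {
            for (CachedResource r : resources) {
                r.clear();
            }
        }
    }
}
```

In a page-by-page extraction loop, calling such a clear() once a page has been processed would let the decoded data of that page be collected before the next page is decoded; whether that alone resolves the OOM reported here is not established by this sketch.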
[jira] [Resolved] (PDFBOX-2103) JPXFilter fails to decode some Jpeg2000 images
[ https://issues.apache.org/jira/browse/PDFBOX-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-2103.
    Resolution: Fixed
    Fix Version/s: 2.0.0
    Assignee: Andreas Lehmkühler

I've added the patch as proposed in revision 1598642. Thanks for the contribution!
[jira] [Commented] (PDFBOX-2101) Surprising memory consumption when extracting images
[ https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013904#comment-14013904 ]

Andreas Lehmkühler commented on PDFBOX-2101:

I've implemented clear() for some of the classes inherited from PDFont in revisions 1598655 (trunk) and 1598657 (1.8 branch). This should lead to a smaller memory footprint as some objects could be released earlier.
[jira] [Commented] (PDFBOX-2103) JPXFilter fails to decode some Jpeg2000 images
[ https://issues.apache.org/jira/browse/PDFBOX-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013907#comment-14013907 ]

Tilman Hausherr commented on PDFBOX-2103:

Just for the record, the described NPE didn't happen for me. Maybe it depends on what JAI version is used. Anyway, it means we have yet another interesting test PDF :-)
[jira] [Commented] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013961#comment-14013961 ]

John Hewson commented on PDFBOX-2104:

I'm not sure about your ColorSpaceDeviceGray class, we used to use subclasses of AWT color spaces like this but removed them due to poor performance. There shouldn't be any need for color conversion in 2.0 as everything is RGB internally, perhaps you can remove this along with the CIE-XYZ handling?
[jira] [Comment Edited] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013961#comment-14013961 ]

John Hewson edited comment on PDFBOX-2104 at 5/30/14 5:15 PM:

I'm not sure about your ColorSpaceDeviceGray class, we used to use subclasses of AWT color spaces like this but removed them due to poor performance. There shouldn't be any need for color conversion in 2.0 as everything is RGB internally (which wasn't the case with 1.7), perhaps you can remove this along with the CIE-XYZ handling?

was (Author: jahewson):
I'm not sure about your ColorSpaceDeviceGray class, we used to use subclasses of AWT color spaces like this but removed them due to poor performance. There shouldn't be any need for color conversion in 2.0 as everything is RGB internally, perhaps you can remove this along with the CIE-XYZ handling?

Implement transparency groups

Key: PDFBOX-2104
URL: https://issues.apache.org/jira/browse/PDFBOX-2104
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.patch

The attached PDF uses transparency groups, blending and soft masks to create the rounded corners and shades behind images. It appears that these features are not implemented in PDFBox. An implementation proposal is attached in TransparencyGroups.patch. The basic idea is to create a buffered image, draw the transparency group content onto it and then use the result to produce the soft mask or draw the image on the original g2d.

Note: I am not the (only) author of the proposed change. It was developed in our company a few years ago in sources based on a 1.7.x version of PDFBox, mostly by someone who has since left. Over the years, merging the work done in the PDFBox mainline into our source base has become impossible due to the many refactorings and other far-reaching changes. Now we would like to go the opposite way: where possible, bring the changes and fixes we have made into the PDFBox mainline and start to use it in our installations.
Re: Enhancements to PDFBox
> It will involve a lot of COS processing. I haven’t decided yet if it will sit on top of COS or PD. Typically we do encourage people to use PD so I tend to start from there and dig down internally as needed. WDYT?

Starting with PD and using COS where needed sounds reasonable. Ultimately you don’t need a high-level API to do the manipulations which you’re interested in, so COS should suffice, but PD might be quicker to get started with.

-- John

On 29 May 2014, at 23:25, Maruan Sahyoun sahy...@fileaffairs.de wrote:

> On 29.05.2014 at 18:51, John Hewson j...@jahewson.com wrote:
>
>>> # splitting files (e.g. remove no longer needed resources)
>>
>> Each page has its own Resources dictionary, so it shouldn't be too difficult. One thing to watch out for is the page tree which allows pages to inherit resources from each other, this is handled as PDPageNode but it's kind of messy.
>
> thanks for the hint. Splitting and merging is somewhat similar as splitting is typically done by creating a new document and importing the needed pages into the newly created document. Using the current code this might lead to duplicate resources.
>
>>> # merging files (e.g. avoid duplicating resources)
>>
>> Sounds like the files are pretty similar, is this actually an overlay? Or are you wanting to insert entire pages?
>
> it’s merging individual files together inserting entire pages. Although the files are created individually they share some common elements like company logos or fonts.
>
>> I imagine you probably want to implement both these features at the COS level rather than the PD level, as it's pretty low-level processing.
>
> It will involve a lot of COS processing. I haven’t decided yet if it will sit on top of COS or PD. Typically we do encourage people to use PD so I tend to start from there and dig down internally as needed. WDYT?
>
>> -- John
>>
>> On 29 May 2014, at 00:39, Maruan Sahyoun sahy...@fileaffairs.de wrote:
>>
>>> Hi,
>>>
>>> for a current project I need to work on enhancing PDFBox for
>>>
>>> # splitting files (e.g. remove no longer needed resources)
>>> # merging files (e.g. avoid duplicating resources)
>>> # page handling (adding/removing individual pages with resource handling)
>>> # enhancements to forms handling (pre fill XFA forms - partially done, enhancing AP generation)
>>>
>>> Is someone else working on something similar?
>>>
>>> BR
>>> Maruan
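As a rough illustration of the "avoid duplicating resources" goal discussed in this thread, shared elements such as a company logo could be recognised by a digest of their bytes, so that the second and later copies map to the object already imported. This is a library-free sketch under assumptions: ResourceDedupSketch, register and uniqueCount are hypothetical names, nothing here is PDFBox API, and a real implementation would work on COS streams and their dictionaries rather than on raw byte arrays.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not PDFBox code.
public class ResourceDedupSketch {

    // Maps a content digest to the id of the resource that was imported first.
    private final Map<String, Integer> idByDigest = new LinkedHashMap<>();
    private int nextId = 1;

    // Returns the id to reference for this resource; identical content yields the same id.
    int register(byte[] resourceBytes) {
        String key = digest(resourceBytes);
        Integer existing = idByDigest.get(key);
        if (existing != null) {
            return existing; // already imported, reuse instead of copying again
        }
        int id = nextId++;
        idByDigest.put(key, id);
        return id;
    }

    int uniqueCount() {
        return idByDigest.size();
    }

    private static String digest(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Hashing the content is one possible design choice; comparing object identity is cheaper but only finds duplicates within a single source file, which does not help when merging files that were created individually.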
[jira] [Updated] (PDFBOX-2102) Characters swallowed on COSString.getString()
[ https://issues.apache.org/jira/browse/PDFBOX-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2102:

Fix Version/s: 2.0.0, 1.8.6

Characters swallowed on COSString.getString()

Key: PDFBOX-2102
URL: https://issues.apache.org/jira/browse/PDFBOX-2102
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Jeremias Maerki
Assignee: Jeremias Maerki
Fix For: 1.8.6, 2.0.0

PDFBOX-1437 seems to have introduced a regression that causes characters like \n to be swallowed when COSString.getString() is called. PDFDocEncoding doesn't handle all valid characters.

{code}
String testStr = "Line1\nLine2\nLine3\n";
COSString lineFeedString = new COSString(testStr);
assertEquals(testStr, lineFeedString.getString());

// Same as previous but this time as a dictionary value
lineFeedString = new COSString(true);
for (int i = 0; i < testStr.length(); i++)
{
    lineFeedString.append(testStr.charAt(i));
}
assertEquals(testStr, lineFeedString.getString()); // currently fails
{code}

Direct link to the change causing the regression: http://svn.apache.org/viewvc?view=revision&revision=1406628
[jira] [Commented] (PDFBOX-2101) Surprising memory consumption when extracting images
[ https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014083#comment-14014083 ]

Tilman Hausherr commented on PDFBOX-2101:

Sorry, but there's a rendering problem with the 2nd page of PDFBOX-2103:

{code}
Start rendering page 2
30.05.2014 20:39:20.854 WARN [main] org.apache.pdfbox.util.PDFStreamEngine:557 - java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.pdfbox.cos.COSArray.getObject(COSArray.java:188)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.init(PDType0Font.java:63)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:72)
    at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:209)
    at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:615)
    at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:53)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:544)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:264)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:223)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:164)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:214)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:147)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:96)
    at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:414)
    at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:208)
30.05.2014 20:39:20.866 WARN [main] org.apache.pdfbox.util.PDFStreamEngine:356 - java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:352)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:43)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:544)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:264)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:223)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:164)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:214)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:147)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:96)
    at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:414)
    at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:208)
{code}

Surprising memory consumption when extracting images

Key: PDFBOX-2101
URL: https://issues.apache.org/jira/browse/PDFBOX-2101
Project: PDFBox
Issue Type: Bug
Components: Utilities
Affects Versions: 1.8.5
Environment: Windows 7
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg, PDFBOX-2101-714-poor.jpg, java.hprof.zip

ExtractImages seems to fail to release memory resources on some files in both PDFBox 1.8.5 and trunk. On this 4MB file [http://digitalcorpora.org/corp/nps/files/govdocs1/239/239665.pdf], if extracting every image on every page (and there are many, many duplicate images), there is an OOM with -Xmx1g. If there is no Xmx and there is 2.5g available, ExtractImages will work. With some experimentation, the triggers seem to be JPEG images that have masks. I'm not sure, though, whether the issue is with PDFBox or Java.

Commandlines:
1.8.5: java -Xmx1g -cp pdfbox-app-1.8.5.jar org.apache.pdfbox.ExtractImages 239665.pdf
2.0_SNAPSHOT: java -Xmx1g -cp pdfbox-app-2.0.0-SNAPSHOT.jar org.apache.pdfbox.tools.ExtractImages -addkey 239665.pdf

Results:
1.8.5: 906 files before OOM
{noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at
[jira] [Commented] (PDFBOX-2101) Surprising memory consumption when extracting images
[ https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014101#comment-14014101 ]

Tilman Hausherr commented on PDFBOX-2101:

The file of PDFBOX-1283 also has a rendering problem.

Surprising memory consumption when extracting images

Key: PDFBOX-2101
URL: https://issues.apache.org/jira/browse/PDFBOX-2101
Project: PDFBox
Issue Type: Bug
Components: Utilities
Affects Versions: 1.8.5
Environment: Windows 7
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg, PDFBOX-2101-714-poor.jpg, java.hprof.zip

ExtractImages seems to fail to release memory resources on some files in both PDFBox 1.8.5 and trunk. On this 4MB file [http://digitalcorpora.org/corp/nps/files/govdocs1/239/239665.pdf], if extracting every image on every page (and there are many, many duplicate images), there is an OOM with -Xmx1g. If there is no Xmx and there is 2.5g available, ExtractImages will work. With some experimentation, the triggers seem to be JPEG images that have masks. I'm not sure, though, whether the issue is with PDFBox or Java.

Commandlines:
1.8.5: java -Xmx1g -cp pdfbox-app-1.8.5.jar org.apache.pdfbox.ExtractImages 239665.pdf
2.0_SNAPSHOT: java -Xmx1g -cp pdfbox-app-2.0.0-SNAPSHOT.jar org.apache.pdfbox.tools.ExtractImages -addkey 239665.pdf

Results:
1.8.5: 906 files before OOM
{noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:514)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:217)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:363)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:254)
    at org.apache.pdfbox.ExtractImages.processResources(ExtractImages.java:202)
    at org.apache.pdfbox.ExtractImages.extractImages(ExtractImages.java:160)
    at org.apache.pdfbox.ExtractImages.main(ExtractImages.java:65)
{noformat}

2.0_SNAPSHOT: 428 files before OOM
{noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:70)
    at org.apache.pdfbox.io.IOUtils.toByteArray(IOUtils.java:52)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:171)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:154)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:171)
    at org.apache.pdfbox.tools.ExtractImages.write2file(ExtractImages.java:231)
    at org.apache.pdfbox.tools.ExtractImages.processResources(ExtractImages.java:206)
    at org.apache.pdfbox.tools.ExtractImages.extractImages(ExtractImages.java:164)
    at org.apache.pdfbox.tools.ExtractImages.main(ExtractImages.java:69)
{noformat}
Re: Idea: stable 2.0 versions
I think the risk of creating the impression that 2.0 is stable is too high. The real problem is that 2.0 has been too long in development, there were frustrated users asking a year ago about when it would be released.

Perhaps it’s time to push for a release of 2.0 and aim for a more frequent release cycle after that, to avoid repeating the situation where the stable and trunk versions are years apart?

What is holding back 2.0? What features are we *really* holding out on? Can we put together a roadmap - our users often ask for one...

-- John

On 30 May 2014, at 14:01, Tilman Hausherr thaush...@t-online.de wrote:

> I suggest that we come up with a concept of designating stable versions (or tested versions) for the trunk and put them on the homepage. A stable version is one with no or only minor regressions, and/or a version that committers have found to be good.
>
> This would be for users of the 2.0 version who don't want to read every discussion, and also as a hint for unhappy 1.8 users. I suspect that other open source projects do also have rules to designate stable versions, but I didn't look at them.
>
> Proposed rules:
> - any committer can designate any version that is older than 24 hours as stable
> - any committer can veto any version as unstable
> - any version that has only positive votes is mentioned on https://pdfbox.apache.org/downloads.html#scm
> - there should be up to three versions there
>
> Tilman
[jira] [Commented] (PDFBOX-2102) Characters swallowed on COSString.getString()
[ https://issues.apache.org/jira/browse/PDFBOX-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014295#comment-14014295 ]

Petr Slaby commented on PDFBOX-2102:

[~jerem...@apache.org]: After the change in 1598316, I am getting IllegalArgumentExceptions on some of the documents in my test suite. The culprit seems to be a missing in.position(in.position() - 1); at line 141 in SingleByteCharset.

You might also consider using something like int mark = src.position(); try { mark++; // in front of out.put() } finally { src.position(mark); } This pattern is used in the single-byte encoding implementation of OpenJVM. Also, it has a better performing implementation for the case that both the byte and char buffer are based on an array (which is the most usual case).

The test document (coming from http://www.stillhq.com/pdfdb/db.html) and stack trace are attached, but the missing call of position() seems to be obvious, anyway.

Characters swallowed on COSString.getString()

Key: PDFBOX-2102
URL: https://issues.apache.org/jira/browse/PDFBOX-2102
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Jeremias Maerki
Assignee: Jeremias Maerki
Fix For: 1.8.6, 2.0.0

PDFBOX-1437 seems to have introduced a regression that causes characters like \n to be swallowed when COSString.getString() is called. PDFDocEncoding doesn't handle all valid characters.

{code}
String testStr = "Line1\nLine2\nLine3\n";
COSString lineFeedString = new COSString(testStr);
assertEquals(testStr, lineFeedString.getString());

// Same as previous but this time as a dictionary value
lineFeedString = new COSString(true);
for (int i = 0; i < testStr.length(); i++)
{
    lineFeedString.append(testStr.charAt(i));
}
assertEquals(testStr, lineFeedString.getString()); // currently fails
{code}

Direct link to the change causing the regression: http://svn.apache.org/viewvc?view=revision&revision=1406628
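The mark/position pattern suggested in the comment above might look roughly like this when written out as a complete decode loop. This is a sketch under assumptions, not the SingleByteCharset code from PDFBox: the class MarkPositionSketch, the method decodeLoop and the lookup table parameter are hypothetical, and U+FFFD is assumed to mark unmapped bytes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;

// Hypothetical illustration, not PDFBox code.
public class MarkPositionSketch {

    // Assumed marker for bytes that have no mapping in the table.
    private static final char UNMAPPED = '\uFFFD';

    // Decodes single bytes via a 256-entry table. 'mark' counts only the bytes
    // whose characters were actually written, and the finally block resets the
    // source position to it, so an unconsumed byte is never lost.
    static CoderResult decodeLoop(ByteBuffer src, CharBuffer dst, char[] table) {
        int mark = src.position();
        try {
            while (src.hasRemaining()) {
                char c = table[src.get() & 0xFF];
                if (c == UNMAPPED) {
                    return CoderResult.unmappableForLength(1);
                }
                if (!dst.hasRemaining()) {
                    return CoderResult.OVERFLOW;
                }
                mark++; // placed in front of the put(), as in the comment
                dst.put(c);
            }
            return CoderResult.UNDERFLOW;
        } finally {
            src.position(mark);
        }
    }
}
```

With this shape there is no need for a separate in.position(in.position() - 1) on each early return: whatever path leaves the loop, the source buffer ends up positioned just after the last byte that produced output.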
[jira] [Updated] (PDFBOX-2102) Characters swallowed on COSString.getString()
[ https://issues.apache.org/jira/browse/PDFBOX-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Petr Slaby updated PDFBOX-2102:

Attachment: 59.pdf

Characters swallowed on COSString.getString()

Key: PDFBOX-2102
URL: https://issues.apache.org/jira/browse/PDFBOX-2102
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Jeremias Maerki
Assignee: Jeremias Maerki
Fix For: 1.8.6, 2.0.0
Attachments: 59.pdf, iae.txt

PDFBOX-1437 seems to have introduced a regression that causes characters like \n to be swallowed when COSString.getString() is called. PDFDocEncoding doesn't handle all valid characters.

{code}
String testStr = "Line1\nLine2\nLine3\n";
COSString lineFeedString = new COSString(testStr);
assertEquals(testStr, lineFeedString.getString());

// Same as previous but this time as a dictionary value
lineFeedString = new COSString(true);
for (int i = 0; i < testStr.length(); i++)
{
    lineFeedString.append(testStr.charAt(i));
}
assertEquals(testStr, lineFeedString.getString()); // currently fails
{code}

Direct link to the change causing the regression: http://svn.apache.org/viewvc?view=revision&revision=1406628