date:20140530


[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013390#comment-14013390
 ] 

Tilman Hausherr commented on PDFBOX-1915:
-

I just tried; I can't assign it to you. Apparently this is only possible for 
committers. But all committers know that it is yours :-)

Were any of the test files in a different direction than the others? I ask 
because one of your drawings showed clockwise and counterclockwise.

Could you please edit your earlier comments to join the lines that are broken?

Also, which strategy was the successful one? I assume it is the last one (i.e. 
the one in the first comment).

I'd say that was a very successful week.

 Implement shading with Coons and tensor-product patch meshes
 

 Key: PDFBOX-1915
 URL: https://issues.apache.org/jira/browse/PDFBOX-1915
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
  Labels: graphical, gsoc2014, java, math, shading
 Attachments: CONICAL.pdf, GWG060_Shading_x1a.pdf, HSBWHEEL.pdf, 
 McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, 
 asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, 
 coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, 
 coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, 
 coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, 
 coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, 
 coons2-function.pdf, coons2-function.ps, 
 eci_altona-test-suite-v2_technical_H.pdf, lamp_cairo.pdf, patchCases.jpg, 
 patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, 
 updateshading6ContourTest.rar


 Of the seven shading methods described in the PDF specification, type 6 
 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been 
 implemented. I have done type 1, 4 and 5, but I don't know the math for type 
 6 and 7. My math days are decades away.
 Knowledge prerequisites: 
 - Java, although you don't have to be a Java ace, just feel comfortable
 - math: you should know what cubic Bézier curves, degenerate Bézier 
 curves, bilinear interpolation, tensor products, affine transform 
 matrices and Bernstein polynomials are, or be able to learn them
 - maven (basic)
 - svn (basic)
 - an IDE like Netbeans or Eclipse or IntelliJ (basic)
 - ideally, you are either a math student who likes to program, or a computer 
 science student who is specializing in graphics.
 A first look at PDFBOX: try the command utility here:
 https://pdfbox.apache.org/commandline/#pdfToImage
 and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, which use 
 the shading types that are already implemented.
 Some simple source code to convert to images:
 String filename = "blah.pdf";
 PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
 List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
 int page = 0;
 for (PDPage pdPage : pdPages)
 {
     ++page;
     // render the page (image type, resolution 300) and save it as PNG
     BufferedImage bim = RenderUtil.convertToImage(pdPage,
             BufferedImage.TYPE_BYTE_BINARY, 300);
     ImageIO.write(bim, "png", new File(filename + page + ".png"));
 }
 document.close();
 You are not starting from scratch. The implementation of type 4 and 5 shows 
 you how to read parameters from the PDF and set the graphics. You don't have 
 to learn the complete PDF spec, only 15 pages related to the two shading 
 types, and 6 pages about shading in general. The PDF specification is here:
 http://www.adobe.com/devnet/pdf/pdf_reference.html
 The tricky parts are:
 - decide whether a point(x,y) is inside or outside a patch
 - decide the color of a point within the patch
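 For the triangle-based types 4 and 5, both questions can be answered with 
 barycentric coordinates. The following is only a minimal sketch of that idea; 
 the class BarycentricSketch and its method names are hypothetical and are not 
 taken from GouraudTriangle or any other PDFBox class.

```java
public class BarycentricSketch
{
    /**
     * Returns the barycentric weights {w0, w1, w2} of point (x, y) with respect
     * to the triangle (x0,y0), (x1,y1), (x2,y2), or null for a degenerate triangle.
     */
    static double[] weights(double x, double y,
                            double x0, double y0, double x1, double y1, double x2, double y2)
    {
        double det = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2);
        if (det == 0)
        {
            return null; // the three corners are collinear
        }
        double w0 = ((y1 - y2) * (x - x2) + (x2 - x1) * (y - y2)) / det;
        double w1 = ((y2 - y0) * (x - x2) + (x0 - x2) * (y - y2)) / det;
        return new double[] { w0, w1, 1 - w0 - w1 };
    }

    /** A point is inside (or on an edge of) the triangle if no weight is negative. */
    static boolean contains(double[] w)
    {
        return w != null && w[0] >= 0 && w[1] >= 0 && w[2] >= 0;
    }

    /** Blends the three corner colors (same number of components each) with the weights. */
    static float[] color(double[] w, float[] c0, float[] c1, float[] c2)
    {
        float[] result = new float[c0.length];
        for (int i = 0; i < result.length; i++)
        {
            result[i] = (float) (w[0] * c0[i] + w[1] * c1[i] + w[2] * c2[i]);
        }
        return result;
    }
}
```

 The same weights serve both purposes: their signs answer the inside/outside 
 question, and their values mix the corner colors.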
 To get an idea about the code, look at the classes GouraudTriangle, 
 GouraudShadingContext, Type4ShadingContext and Vertex here
 https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/
 or download the whole project from the repository.
 https://pdfbox.apache.org/downloads.html#scm
 If you want to see the existing code in the debugger with a Gouraud shading, 
 try this file:
 http://asymptote.sourceforge.net/gallery/Gouraud.pdf
 Testing:
 I have attached several example PDFs. To see which one has which shading, 
 open them with an editor like Notepad++ and search for "/ShadingType" 
 (without the quotes). If your images render like the example PDFs, 
 then you were successful.
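 The search can also be scripted. The following is a rough sketch under the 
 assumption that the shading dictionary is stored as plain text in the file; 
 a dictionary inside a compressed object stream would not be found this way. 
 The class name ShadingTypeScan and the method findShadingTypes are made up 
 for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShadingTypeScan
{
    // "/ShadingType" followed by optional whitespace and a single digit
    private static final Pattern SHADING_TYPE = Pattern.compile("/ShadingType\\s*(\\d)");

    /** Collects the distinct shading type numbers found in the raw text of a PDF. */
    static Set<Integer> findShadingTypes(String rawPdf)
    {
        Set<Integer> types = new TreeSet<Integer>();
        Matcher m = SHADING_TYPE.matcher(rawPdf);
        while (m.find())
        {
            types.add(Integer.parseInt(m.group(1)));
        }
        return types;
    }

    public static void main(String[] args) throws IOException
    {
        for (String name : args)
        {
            // ISO-8859-1 maps every byte to one char, so binary stream data cannot break decoding
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            String raw = new String(bytes, StandardCharsets.ISO_8859_1);
            System.out.println(name + ": " + findShadingTypes(raw));
        }
    }
}
```

 Run it with the attachment file names as arguments; each line of output lists 
 the shading types found in one file.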
 Optional:
 Review and optimize the complete shading package for speed; implement cubic 
 spline interpolation for type 0 (sampled) functions (that one is really 
 low priority; for details, look up cubic spline interpolation in 
 the PDF spec, which says that it is disregarded in printing, and I don't 
 have a test PDF).
 Mentor: Tilman Hausherr (European timezone, languages: German, English, 
 French)

[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes


[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013390#comment-14013390
 ] 

Tilman Hausherr edited comment on PDFBOX-1915 at 5/30/14 7:45 AM:
--

I just tried; I can't assign it to you. Apparently this is only possible for 
committers. But all committers know that it is yours :-)

Were any of the test files in a different direction than the others? I ask 
because one early hand drawing showed clockwise and counterclockwise.

Could you please edit your earlier comments to join the lines that are broken?

Also, which strategy was the successful one? I assume it is the last one (i.e. 
the one in the first comment).

I'd say that was a very successful week!




[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes


[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008956#comment-14008956
 ] 

Tilman Hausherr edited comment on PDFBOX-1915 at 5/30/14 8:24 AM:
--

Two examples of a Coons patch in PostScript. I found the first one on 
Usenet (message ID 
[20021112083949.7bf2c92e.rs...@sympatico.ca|https://groups.google.com/forum/#!original/comp.lang.postscript/DXygltnXHi4/8kVTr13W0xEJ]
 posted by Robert Swan in 2002); the second one is modified from the first. 
When converting to PDF with Ghostscript, set version 1.5 in the options to 
avoid the shading being converted into an image.


was (Author: tilman):
Two examples of a coons patch in postscript. I found the first one on the 
usenet, the second one is modified from the first. When converting to PDF, set 
version 1.5 in the options, to avoid it being converted into an image.


[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-05-30 Thread Shaola Ren (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013425#comment-14013425
 ] 

Shaola Ren commented on PDFBOX-1915:


About the assignment, that's fine. I was just curious about this item and 
thought there was no harm in asking, so I did :)

With the method I use now, I don't need to consider the clockwise or 
counterclockwise direction: the grid is generated automatically, so all 
situations and priority rules are handled directly. I deleted almost all of the 
code I wrote before May 28 in the CoonsPatch and CubicBezierCurve classes and 
rewrote it as this new version, adding another class, CoonsTriangle. There is 
obviously some redundant code left; I will clean that up last.

Although the previous version is hardly used in the current code, it helped me 
a lot in understanding the whole problem.

Yes, the last strategy works: first divide a patch into small four-sided 
patches, then divide each small patch into two triangles, then build a triangle 
list as in shading type 5. There are some differences from what you coded for 
shading type 5; I will write a detailed document about this method later.
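
The strategy described above can be sketched roughly as follows. This is only 
an illustration, not the code in CoonsPatch or CoonsTriangle: the class name 
CoonsSubdivisionSketch, its methods, and the representation of a boundary curve 
as two arrays of four control coordinates are all assumptions made for this 
example. It also assumes the four curves are already oriented so that 
c1(0)=d1(0), c1(1)=d2(0), c2(0)=d1(1) and c2(1)=d2(1); mapping the control 
points of the PDF stream onto these curves, and the color handling, are left 
out.

```java
import java.util.ArrayList;
import java.util.List;

public class CoonsSubdivisionSketch
{
    /** Evaluates one coordinate of a cubic Bezier curve with control values p[0..3] at t in [0,1]. */
    static double bezier(double[] p, double t)
    {
        double s = 1 - t;
        return s * s * s * p[0] + 3 * s * s * t * p[1] + 3 * s * t * t * p[2] + t * t * t * p[3];
    }

    /** Point {x, y} on a curve given as {xs, ys}. */
    static double[] point(double[][] curve, double t)
    {
        return new double[] { bezier(curve[0], t), bezier(curve[1], t) };
    }

    /**
     * Coons surface point S(u,v). c1/c2 are the boundary curves for v=0 and v=1
     * (parameter u), d1/d2 the boundary curves for u=0 and u=1 (parameter v).
     */
    static double[] surface(double[][] c1, double[][] c2, double[][] d1, double[][] d2,
                            double u, double v)
    {
        double[] pc1 = point(c1, u), pc2 = point(c2, u);
        double[] pd1 = point(d1, v), pd2 = point(d2, v);
        double[] c10 = point(c1, 0), c11 = point(c1, 1);
        double[] c20 = point(c2, 0), c21 = point(c2, 1);
        double[] result = new double[2];
        for (int k = 0; k < 2; k++)
        {
            // sum of the two linear blends of opposite curves ...
            double ruled = (1 - v) * pc1[k] + v * pc2[k] + (1 - u) * pd1[k] + u * pd2[k];
            // ... minus the bilinear blend of the four corners
            double bilinear = (1 - v) * ((1 - u) * c10[k] + u * c11[k])
                            + v * ((1 - u) * c20[k] + u * c21[k]);
            result[k] = ruled - bilinear;
        }
        return result;
    }

    /** Splits the patch into n x n cells and each cell into two triangles of three points {x, y}. */
    static List<double[][]> triangulate(double[][] c1, double[][] c2, double[][] d1, double[][] d2,
                                        int n)
    {
        double[][][] grid = new double[n + 1][n + 1][];
        for (int i = 0; i <= n; i++)
        {
            for (int j = 0; j <= n; j++)
            {
                grid[i][j] = surface(c1, c2, d1, d2, (double) i / n, (double) j / n);
            }
        }
        List<double[][]> triangles = new ArrayList<>();
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
            {
                triangles.add(new double[][] { grid[i][j], grid[i + 1][j], grid[i][j + 1] });
                triangles.add(new double[][] { grid[i + 1][j], grid[i + 1][j + 1], grid[i][j + 1] });
            }
        }
        return triangles;
    }
}
```

Because the grid is built in the (u, v) parameter space, the same loop works 
whichever way the boundary of the patch runs.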

As for the broken lines you mentioned, I looked at them. They are in my first 
comment in this thread, and they are not broken lines, just lines with arrows: 
each arrow is followed by a whole paragraph, and no content is missing.

Yes, I am happy with this progress.


[jira] [Assigned] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-1915:
--

Assignee: Shaola Ren

I've added Shaola to the contributors group, so assigning shouldn't be a 
problem anymore.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1915:


Fix Version/s: 2.0.0


[jira] [Updated] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1915:


Affects Version/s: 1.8.5, 1.8.6

 Implement shading with Coons and tensor-product patch meshes
 

 Key: PDFBOX-1915
 URL: https://issues.apache.org/jira/browse/PDFBOX-1915
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Tilman Hausherr
Assignee: Shaola Ren
  Labels: graphical, gsoc2014, java, math, shading
 Fix For: 2.0.0

 Attachments: CONICAL.pdf, GWG060_Shading_x1a.pdf, HSBWHEEL.pdf, 
 McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, 
 asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, 
 coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, 
 coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, 
 coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, 
 coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, 
 coons2-function.pdf, coons2-function.ps, 
 eci_altona-test-suite-v2_technical_H.pdf, lamp_cairo.pdf, patchCases.jpg, 
 patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, 
 updateshading6ContourTest.rar


 Of the seven shading methods described in the PDF specification, type 6 
 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been 
 implemented. I have done type 1, 4 and 5, but I don't know the math for type 
 6 and 7. My math days are decades away.
 Knowledge prerequisites: 
 - java, although you don't have to be a java ace, just feel confortable
 - math: you should know what cubic Bézier curves, Degenerate Bézier 
 curves, bilinear interpolation, tensor-product, affine transform 
 matrix and Bernstein polynomials are, or be able to learn it
 - maven (basic)
 - svn (basic)
 - an IDE like Netbeans or Eclipse or IntelliJ (basic)
 - ideally, you are either a math student who likes to program, or a computer 
 science student who is specializing in graphics.
 A first look at PDFBOX: try the command utility here:
 https://pdfbox.apache.org/commandline/#pdfToImage
 and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have 
 the shading types that are already implemented.
 Some simple source code to convert to images:
 String filename = blah.pdf;
 PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
 ListPDPage pdPages = document.getDocumentCatalog().getAllPages();
 int page = 0;
 for (PDPage pdPage : pdPages)
 {
 ++page;
 BufferedImage bim = RenderUtil.convertToImage(pdPage, 
 BufferedImage.TYPE_BYTE_BINARY, 300);
 ImageIO.write(bim, png, new File(filename+page+.png));
 }
 document.close();
 You are not starting from scratch. The implementation of type 4 and 5 shows 
 you how to read parameters from the PDF and set the graphics. You don't have 
 to learn the complete PDF spec, only 15 pages related to the two shading 
 types, and 6 pages about shading in general. The PDF specification is here:
 http://www.adobe.com/devnet/pdf/pdf_reference.html
 The tricky parts are:
 - decide whether a point(x,y) is inside or outside a patch
 - decide the color of a point within the patch
 To get an idea about the code, look at the classes GouraudTriangle, 
 GouraudShadingContext, Type4ShadingContext and Vertex here
 https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/
 or download the whole project from the repository.
 https://pdfbox.apache.org/downloads.html#scm
 If you want to see the existing code in the debugger with a Gouraud shading, 
 try this file:
 http://asymptote.sourceforge.net/gallery/Gouraud.pdf
 Testing:
 I have attached several example PDFs. To see which one has which shading, 
 open them with an editor like Notepad++, and search for "/ShadingType" 
 (without the quotes). If your images render like the example PDFs, 
 then you were successful.
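
The same search can be automated. The following is a minimal sketch, assuming the shading dictionaries are stored uncompressed (as they must be for the text-editor search to work); ShadingTypeScan and findShadingTypes are hypothetical names, not part of PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShadingTypeScan {

    // Matches "/ShadingType" followed by a single digit (types 1 to 7).
    private static final Pattern SHADING_TYPE = Pattern.compile("/ShadingType\\s*(\\d)");

    /** Returns the distinct shading type numbers found in the raw bytes. */
    public static Set<Integer> findShadingTypes(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Set<Integer> types = new TreeSet<>();
        Matcher m = SHADING_TYPE.matcher(text);
        while (m.find()) {
            types.add(Integer.parseInt(m.group(1)));
        }
        return types;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + findShadingTypes(bytes));
        }
    }
}
```

Run it with the attachment file names as arguments; a file whose shading dictionaries sit inside compressed object streams will report an empty set.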
 Optional:
 Review and optimize the complete shading package for speed; implement cubic 
 spline interpolation for type 0 (sampled) functions (that one is really 
 low-low priority; for details, look up cubic spline interpolation in 
 the PDF spec, which says that it is disregarded in printing, and I don't 
 have a test PDF).
 Mentor: Tilman Hausherr (European timezone, languages: German, English, 
 French)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-05-30 Thread Shaola Ren (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013432#comment-14013432
 ] 

Shaola Ren commented on PDFBOX-1915:


Thanks, Andreas Lehmkühler :)

 Implement shading with Coons and tensor-product patch meshes
 

 Key: PDFBOX-1915
 URL: https://issues.apache.org/jira/browse/PDFBOX-1915
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Tilman Hausherr
Assignee: Shaola Ren
  Labels: graphical, gsoc2014, java, math, shading
 Fix For: 2.0.0

 Attachments: CONICAL.pdf, GWG060_Shading_x1a.pdf, HSBWHEEL.pdf, 
 McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, 
 asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, 
 coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, 
 coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, 
 coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, 
 coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, 
 coons2-function.pdf, coons2-function.ps, 
 eci_altona-test-suite-v2_technical_H.pdf, lamp_cairo.pdf, patchCases.jpg, 
 patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, 
 updateshading6ContourTest.rar



[jira] [Updated] (PDFBOX-2103) JPXFilter fails to decode some Jpeg2000 images


 [ 
https://issues.apache.org/jira/browse/PDFBOX-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2103:
---

Attachment: JPXFilter.java.patch
01_MTEXT_CS6.pdf

 JPXFilter fails to decode some Jpeg2000 images
 --

 Key: PDFBOX-2103
 URL: https://issues.apache.org/jira/browse/PDFBOX-2103
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: 01_MTEXT_CS6.pdf, JPXFilter.java.patch



[jira] [Created] (PDFBOX-2103) JPXFilter fails to decode some Jpeg2000 images

Petr Slaby created PDFBOX-2103:
--

 Summary: JPXFilter fails to decode some Jpeg2000 images
 Key: PDFBOX-2103
 URL: https://issues.apache.org/jira/browse/PDFBOX-2103
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: 01_MTEXT_CS6.pdf, JPXFilter.java.patch

Most of the images in the attached PDF are missing when rendered via PDFBox 
(tested in 2.0 head). The reason is a NullPointerException in ImageIO:
java.lang.NullPointerException
    at com.sun.media.imageioimpl.plugins.jpeg2000.J2KMetadata.replace(J2KMetadata.java:962)
    at com.sun.media.imageioimpl.plugins.jpeg2000.J2KMetadata.addNode(J2KMetadata.java:631)
    at jj2000.j2k.fileformat.reader.FileFormatReader.readFileFormat(FileFormatReader.java:279)
    at com.sun.media.imageioimpl.plugins.jpeg2000.J2KReadState.initializeRead(J2KReadState.java:418)
    at com.sun.media.imageioimpl.plugins.jpeg2000.J2KReadState.init(J2KReadState.java:189)
    at com.sun.media.imageioimpl.plugins.jpeg2000.J2KImageReader.read(J2KImageReader.java:443)
    at javax.imageio.ImageReader.read(Unknown Source)
    at org.apache.pdfbox.filter.JPXFilter.readJPX(JPXFilter.java:84)
    at org.apache.pdfbox.filter.JPXFilter.decode(JPXFilter.java:58)
    ...

To avoid the problem, ImageIO has to be instructed to skip reading the image 
metadata, i.e. use reader.setInput(iis, true, true) instead of 
reader.setInput(iis), as shown in the attached patch. This is also what 
ImageIO.read(stream) does, which is the method that was used before commit 1570806.
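
For illustration only, here is a minimal sketch of the three-argument setInput call. It is not the attached patch: it uses PNG as a stand-in because a JPEG 2000 ImageIO plugin may not be installed, and the names IgnoreMetadataRead and readIgnoringMetadata are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class IgnoreMetadataRead {

    /** Decodes the first image in data and tells the reader to ignore metadata. */
    public static BufferedImage readIgnoringMetadata(byte[] data, String formatName)
            throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        if (!readers.hasNext()) {
            throw new IOException("No ImageIO reader for format " + formatName);
        }
        ImageReader reader = readers.next();
        try (ImageInputStream iis =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
            // seekForwardOnly = true, ignoreMetadata = true
            reader.setInput(iis, true, true);
            return reader.read(0);
        } finally {
            reader.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small PNG in memory so that the example needs no input file.
        BufferedImage source = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        ImageIO.write(source, "png", png);

        BufferedImage decoded = readIgnoringMetadata(png.toByteArray(), "png");
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight());
    }
}
```

With ignoreMetadata set to true, a reader is allowed to skip metadata parsing, which is the code path where the NullPointerException above is raised.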




[jira] [Created] (PDFBOX-2104) Implement transparency groups

Petr Slaby created PDFBOX-2104:
--

 Summary: Implement transparency groups
 Key: PDFBOX-2104
 URL: https://issues.apache.org/jira/browse/PDFBOX-2104
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby


The attached PDF uses transparency groups, blending and soft masks to create 
the rounded corners and shades behind images. It appears that these features 
are not implemented in PDFBox. An implementation proposal is attached as 
TransparencyGroups.patch. The basic idea is to create a buffered image, draw the 
transparency group content onto it, and then use the result to produce the soft 
mask or draw the image on the original g2d.
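
The offscreen-buffer idea can be sketched in plain Java2D as follows. This is an assumption-laden illustration, not the code from the patch: GroupCompositeSketch, GroupContent and drawGroup are hypothetical names, and only a constant group alpha is shown (no soft masks or blend modes).

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Composite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class GroupCompositeSketch {

    /** Hypothetical callback standing in for "draw the transparency group content". */
    public interface GroupContent {
        void draw(Graphics2D g);
    }

    /** Renders the group into its own ARGB buffer, then composites it onto target. */
    public static void drawGroup(Graphics2D target, int width, int height,
                                 float groupAlpha, GroupContent content) {
        BufferedImage buffer = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = buffer.createGraphics();
        try {
            content.draw(g);
        } finally {
            g.dispose();
        }
        Composite saved = target.getComposite();
        // The whole group is blended as one unit with a single alpha value.
        target.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, groupAlpha));
        target.drawImage(buffer, 0, 0, null);
        target.setComposite(saved);
    }

    public static void main(String[] args) {
        BufferedImage page = new BufferedImage(20, 20, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = page.createGraphics();
        g2d.setColor(Color.WHITE);
        g2d.fillRect(0, 0, 20, 20);
        drawGroup(g2d, 20, 20, 0.5f, g -> {
            g.setColor(Color.BLACK);
            g.fillRect(0, 0, 10, 10);
        });
        g2d.dispose();
        System.out.println(Integer.toHexString(page.getRGB(5, 5)));
    }
}
```

Because the group is flattened first, overlapping shapes inside it do not show through each other; only the finished group is blended with the page.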

Note: I am not the (only) author of the proposed change. It was developed in 
our company a few years ago in sources based on a 1.7.x version of PDFBox, mostly 
by someone who has since left. Over the years, merging the work done in the 
PDFBox mainline into our source base has become impossible due to the many 
refactorings and other far-reaching changes. Now we would like to go the 
opposite way and, where possible, bring the changes and fixes we have made into 
the PDFBox mainline and start to use it in our installations.




[jira] [Updated] (PDFBOX-2104) Implement transparency groups

[
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Petr Slaby updated PDFBOX-2104:
---

Attachment: 01_MTEXT_CS6.pdf
TransparencyGroups.patch

Implement transparency groups
-

Key: PDFBOX-2104
URL: https://issues.apache.org/jira/browse/PDFBOX-2104
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.patch


[jira] [Comment Edited] (PDFBOX-2104) Implement transparency groups

[
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013495#comment-14013495
]

Andreas Lehmkühler edited comment on PDFBOX-2104 at 5/30/14 10:41 AM:
--

Great, that's a most welcome change of course! So, first of all thanks for that!

But there is one issue we have to solve first. Your patch is a substantial
change for our codebase and requires

- that all authors sign an iCLA and the company a CCLA
- or that the company which donates the code signs a software grant

IMHO in your case it's the latter. http://www.apache.org/licenses/
provides the details. If there are any questions, please address those to the
mailing list dev@pdfbox

was (Author: lehmi):
Great, that's a most welcome change of course! So, first of all thanks for that!

But there is one issue we have to solve first. Your patch is a substantial
change for our codebase and requires

- that all authors sign an iCLA and the company a CCLA
- or that the company which donates the code signs a software grant

IMHO in your case it's the latter. [1] provides the details. If there are any
questions, please address those to the mailing list dev@pdfbox

Implement transparency groups
-


[jira] [Commented] (PDFBOX-2104) Implement transparency groups

[
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013597#comment-14013597
]

Petr Slaby commented on PDFBOX-2104:

I do not think a software grant would be applicable, I just want to contribute
a few patches and improvements to pdfbox. I have forwarded the request to sign
the CCLA to our decision makers and legal owners of my works.

Implement transparency groups
-


[jira] [Commented] (PDFBOX-2104) Implement transparency groups

[
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013628#comment-14013628
]

Andreas Lehmkühler commented on PDFBOX-2104:

Maybe that's the better choice, especially if we are talking about more
contributions in the future. But you as an individual have to sign an iCLA as
well.

Implement transparency groups
-


[jira] [Commented] (PDFBOX-2101) Surprising memory consumption when extracting images


[ 
https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013797#comment-14013797
 ] 

Andreas Lehmkühler commented on PDFBOX-2101:


I've added a clear() method to PDFont and PDXObject to delete cached resources 
if necessary in revisions 1598627 (trunk) and 1598633 (1.8 branch). Those 
methods are called when clearing PDResources.
PDFont.clear is still empty but I'm going to fill in some stuff soon.
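
The pattern described here can be sketched roughly as below. This is a simplified illustration with hypothetical classes (ClearableCacheSketch, CachedResource, Resources), not the code committed in those revisions: each resource drops its cached data in clear(), and the container calls clear() on everything it holds.

```java
import java.util.ArrayList;
import java.util.List;

public class ClearableCacheSketch {

    /** Hypothetical stand-in for a font or XObject that caches decoded data. */
    public static class CachedResource {
        private byte[] decoded;   // expensive-to-build cache, rebuilt on demand
        private int decodeCount;  // counts rebuilds, for demonstration only

        public byte[] getDecoded() {
            if (decoded == null) {
                decoded = new byte[1024]; // placeholder for real decoding work
                decodeCount++;
            }
            return decoded;
        }

        /** Drops the cached data so that the garbage collector can reclaim it. */
        public void clear() {
            decoded = null;
        }

        public boolean isCached() {
            return decoded != null;
        }

        public int getDecodeCount() {
            return decodeCount;
        }
    }

    /** Hypothetical stand-in for a resources container. */
    public static class Resources {
        private final List<CachedResource> items = new ArrayList<>();

        public void add(CachedResource resource) {
            items.add(resource);
        }

        /** Clearing the container clears the cache of every resource it holds. */
        public void clear() {
            for (CachedResource resource : items) {
                resource.clear();
            }
        }
    }

    public static void main(String[] args) {
        Resources resources = new Resources();
        CachedResource font = new CachedResource();
        resources.add(font);
        font.getDecoded();
        resources.clear();
        System.out.println("cached after clear: " + font.isCached());
    }
}
```

The trade-off is that a cleared resource has to be decoded again if it is needed later, so clearing pays off mainly after a page or document has been fully processed.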

 Surprising memory consumption when extracting images
 

 Key: PDFBOX-2101
 URL: https://issues.apache.org/jira/browse/PDFBOX-2101
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.5
 Environment: Windows 7
 java version 1.7.0_55
 Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
 Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg, 
 PDFBOX-2101-714-poor.jpg, java.hprof.zip


 ExtractImages seems to fail to release memory resources on some files in both 
 PDFBox 1.8.5 and trunk.  
 On this 4MB file 
 [http://digitalcorpora.org/corp/nps/files/govdocs1/239/239665.pdf], if 
 extracting every image on every page (and there are many, many duplicate 
 images), there is an OOM with -Xmx1g.  If there is no Xmx and there is  2.5g 
 available, ExtractImages will work.
 With some experimentation, the triggers seem to be JPEG images that have 
 masks.  I'm not sure, though, whether the issue is with PDFBox or Java.
 Commandlines:
 1.8.5:
 java -Xmx1g -cp pdfbox-app-1.8.5.jar org.apache.pdfbox.ExtractImages 239665.pdf
 2.0_SNAPSHOT:
 java -Xmx1g -cp pdfbox-app-2.0.0-SNAPSHOT.jar org.apache.pdfbox.tools.ExtractImages -addkey 239665.pdf
 Results:
 1.8.5: 906 files before OOM
 {noformat}
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
     at java.util.Arrays.copyOf(Arrays.java:2271)
     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
     at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:514)
     at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:217)
     at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:363)
     at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:254)
     at org.apache.pdfbox.ExtractImages.processResources(ExtractImages.java:202)
     at org.apache.pdfbox.ExtractImages.extractImages(ExtractImages.java:160)
     at org.apache.pdfbox.ExtractImages.main(ExtractImages.java:65)
 {noformat}
 2.0_SNAPSHOT: 428 files before OOM
 {noformat}
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
     at java.util.Arrays.copyOf(Arrays.java:2271)
     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
     at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:70)
     at org.apache.pdfbox.io.IOUtils.toByteArray(IOUtils.java:52)
     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:171)
     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:154)
     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:171)
     at org.apache.pdfbox.tools.ExtractImages.write2file(ExtractImages.java:231)
     at org.apache.pdfbox.tools.ExtractImages.processResources(ExtractImages.java:206)
     at org.apache.pdfbox.tools.ExtractImages.extractImages(ExtractImages.java:164)
     at org.apache.pdfbox.tools.ExtractImages.main(ExtractImages.java:69)
 {noformat}




[jira] [Resolved] (PDFBOX-2103) JPXFilter fails to decode some Jpeg2000 images


 [ 
https://issues.apache.org/jira/browse/PDFBOX-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-2103.


   Resolution: Fixed
Fix Version/s: 2.0.0
 Assignee: Andreas Lehmkühler

I've added the patch as proposed in revision 1598642.

Thanks for the contribution!

 JPXFilter fails to decode some Jpeg2000 images
 --

 Key: PDFBOX-2103
 URL: https://issues.apache.org/jira/browse/PDFBOX-2103
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Assignee: Andreas Lehmkühler
 Fix For: 2.0.0

 Attachments: 01_MTEXT_CS6.pdf, JPXFilter.java.patch



[jira] [Commented] (PDFBOX-2101) Surprising memory consumption when extracting images


[ 
https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013904#comment-14013904
 ] 

Andreas Lehmkühler commented on PDFBOX-2101:


I've implemented clear() for some of the classes inherited from PDFont in 
revisions 1598655 (trunk) and 1598657 (1.8 branch). 
This should lead to a smaller memory footprint, as some objects can be 
released earlier.

 Surprising memory consumption when extracting images
 

 Key: PDFBOX-2101
 URL: https://issues.apache.org/jira/browse/PDFBOX-2101
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.5
 Environment: Windows 7
 java version 1.7.0_55
 Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
 Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg, 
 PDFBOX-2101-714-poor.jpg, java.hprof.zip



[jira] [Commented] (PDFBOX-2103) JPXFilter fails to decode some Jpeg2000 images


[ 
https://issues.apache.org/jira/browse/PDFBOX-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013907#comment-14013907
 ] 

Tilman Hausherr commented on PDFBOX-2103:
-

Just for the record, the described NPE didn't happen for me. Maybe it depends 
on what JAI version is used. Anyway, it means we have yet another interesting 
test PDF :-)

 JPXFilter fails to decode some Jpeg2000 images
 --

 Key: PDFBOX-2103
 URL: https://issues.apache.org/jira/browse/PDFBOX-2103
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Assignee: Andreas Lehmkühler
 Fix For: 2.0.0

 Attachments: 01_MTEXT_CS6.pdf, JPXFilter.java.patch



[jira] [Commented] (PDFBOX-2104) Implement transparency groups

2014-05-30 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013961#comment-14013961
 ] 

John Hewson commented on PDFBOX-2104:
-

I'm not sure about your ColorSpaceDeviceGray class. We used to use subclasses 
of AWT color spaces like this but removed them due to poor performance. There 
shouldn't be any need for color conversion in 2.0, as everything is RGB 
internally. Perhaps you can remove this along with the CIE-XYZ handling?

 Implement transparency groups
 -

 Key: PDFBOX-2104
 URL: https://issues.apache.org/jira/browse/PDFBOX-2104
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.patch



[jira] [Comment Edited] (PDFBOX-2104) Implement transparency groups

2014-05-30 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013961#comment-14013961
 ] 

John Hewson edited comment on PDFBOX-2104 at 5/30/14 5:15 PM:
--

I'm not sure about your ColorSpaceDeviceGray class. We used to use subclasses 
of AWT color spaces like this but removed them due to poor performance. There 
shouldn't be any need for color conversion in 2.0, as everything is RGB 
internally (which wasn't the case with 1.7). Perhaps you can remove this along 
with the CIE-XYZ handling?


was (Author: jahewson):
I'm not sure about your ColorSpaceDeviceGray class, we used to use subclasses 
of AWT color spaces like this but removed them due to poor performance. There 
shouldn't be any need for color conversion in 2.0 as everything is RGB 
internally, perhaps you can remove this along with the CIE-XYZ handling?

 Implement transparency groups
 -

 Key: PDFBOX-2104
 URL: https://issues.apache.org/jira/browse/PDFBOX-2104
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.patch



Re: Enhancements to PDFBox

2014-05-30 Thread John Hewson

 It will involve a lot of COS processing. I haven’t decided yet if it will sit 
 on top of COS or PD. Typically we do encourage people to use PD so I tend to 
 start from there and dig down internally as needed. WDYT?

Starting with PD and using COS where needed sounds reasonable. Ultimately you 
don’t need a high-level API to do the manipulations which you’re interested in, 
so COS should suffice, but PD might be quicker to get started with.

-- John

On 29 May 2014, at 23:25, Maruan Sahyoun sahy...@fileaffairs.de wrote:

 
 On 29.05.2014 at 18:51, John Hewson j...@jahewson.com wrote:
 
 # splitting files (e.g. remove no longer needed resources)
 
 Each page has its own Resources dictionary, so it shouldn't be too 
 difficult. One thing to watch out for is the page tree, which allows 
 pages to inherit resources from each other; this is handled as PDPageNode, 
 but it's kind of messy.
 
 thanks for the hint. Splitting and merging is somewhat similar as splitting 
 is typically done by creating a new document and importing the needed pages 
 into the newly created document. Using the current code this might lead to 
 duplicate resources. 
 
 
 # merging files (e.g. avoid duplicating resources)
 
 Sounds like the files are pretty similar, is this actually an overlay? Or 
 are you wanting to insert entire pages?
 
 it’s merging individual files together inserting entire pages. Although the 
 files are created individually they share some common elements like company 
 logos or fonts. 
 
 
 I imagine you probably want to implement both these features at the COS 
 level rather than the PD level, as it's pretty low-level processing.
 
 
 It will involve a lot of COS processing. I haven’t decided yet if it will sit 
 on top of COS or PD. Typically we do encourage people to use PD so I tend to 
 start from there and dig down internally as needed. WDYT?
 
 
 -- John
 
 On 29 May 2014, at 00:39, Maruan Sahyoun sahy...@fileaffairs.de wrote:
 
 Hi,
 
 for a current project I need to work on enhancing PDFBox for
 
 # splitting files (e.g. remove no longer needed resources)
 # merging files (e.g. avoid duplicating resources)
 # page handling (adding/removing individual pages with resource handling)
 # enhancements to forms handling (pre fill XFA forms - partially done, 
 enhancing AP generation)
 
 Is someone else working on something similar?
 
 BR
 
 Maruan
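
The duplicate-resource concern raised in this thread (individually created 
files that share logos or fonts) could be approached by keying each resource 
on a digest of its content. The following is a minimal sketch under that 
assumption only; ResourceDeduplicator and its methods are hypothetical names, 
not PDFBox API, and a real implementation would also have to compare the 
stream dictionaries and guard against digest collisions.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: stores each distinct resource body once and hands out
// the same id when identical bytes are seen again during a merge.
final class ResourceDeduplicator {
    private final Map<String, Integer> idByDigest = new HashMap<>();
    private final List<byte[]> stored = new ArrayList<>();

    /** Returns the id of an already stored identical resource, or stores a new one. */
    int intern(byte[] resourceBytes) {
        String key = sha256Hex(resourceBytes);
        Integer existing = idByDigest.get(key);
        if (existing != null) {
            return existing;
        }
        int id = stored.size();
        stored.add(resourceBytes.clone());
        idByDigest.put(key, id);
        return id;
    }

    /** Number of distinct resources kept so far. */
    int storedCount() {
        return stored.size();
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

When importing a page, the merger would call intern() for each resource body 
and reference the returned id instead of copying the bytes again.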

[jira] [Updated] (PDFBOX-2102) Characters swallowed on COSString.getString()


 [ 
https://issues.apache.org/jira/browse/PDFBOX-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2102:


Fix Version/s: 2.0.0
   1.8.6

 Characters swallowed on COSString.getString()
 -

 Key: PDFBOX-2102
 URL: https://issues.apache.org/jira/browse/PDFBOX-2102
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Jeremias Maerki
Assignee: Jeremias Maerki
 Fix For: 1.8.6, 2.0.0


 PDFBOX-1437 seems to have introduced a regression that causes characters like 
 \n to be swallowed when COSString.getString() is called. PDFDocEncoding 
 doesn't handle all valid characters.
 {code}
 String testStr = "Line1\nLine2\nLine3\n";
 COSString lineFeedString = new COSString(testStr);
 assertEquals(testStr, lineFeedString.getString());
 //Same as previous but this time as a dictionary value
 lineFeedString = new COSString(true);
 for (int i = 0; i < testStr.length(); i++) {
     lineFeedString.append(testStr.charAt(i));
 }
 assertEquals(testStr, lineFeedString.getString()); //currently fails
 {code}
 Direct link to the change causing the regression:
 http://svn.apache.org/viewvc?view=revision&revision=1406628




[jira] [Commented] (PDFBOX-2101) Surprising memory consumption when extracting images


[ 
https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014083#comment-14014083
 ] 

Tilman Hausherr commented on PDFBOX-2101:
-

Sorry, but there's a rendering problem with the 2nd page of PDFBOX-2103:
{code}
Start rendering page 2
30.05.2014 20:39:20.854 WARN  [main] org.apache.pdfbox.util.PDFStreamEngine:557 - java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.pdfbox.cos.COSArray.getObject(COSArray.java:188)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:63)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:72)
    at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:209)
    at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:615)
    at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:53)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:544)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:264)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:223)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:164)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:214)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:147)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:96)
    at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:414)
    at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:208)
30.05.2014 20:39:20.866 WARN  [main] org.apache.pdfbox.util.PDFStreamEngine:356 - java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:352)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:43)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:544)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:264)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:223)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:164)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:214)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:147)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:96)
    at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:414)
    at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:208)
{code}



 Surprising memory consumption when extracting images
 

 Key: PDFBOX-2101
 URL: https://issues.apache.org/jira/browse/PDFBOX-2101
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.5
 Environment: Windows 7
 java version 1.7.0_55
 Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
 Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg, 
 PDFBOX-2101-714-poor.jpg, java.hprof.zip


 ExtractImages seems to fail to release memory resources on some files in both 
 PDFBox 1.8.5 and trunk.  
 On this 4MB file 
 [http://digitalcorpora.org/corp/nps/files/govdocs1/239/239665.pdf], if 
 extracting every image on every page (and there are many, many duplicate 
 images), there is an OOM with -Xmx1g.  If there is no Xmx and there is  2.5g 
 available, ExtractImages will work.
 With some experimentation, the triggers seem to be JPEG images that have 
 masks.  I'm not sure, though, whether the issue is with PDFBox or Java.
 Commandlines:
 1.8.5:
 java -Xmx1g -cp pdfbox-app-1.8.5.jar org.apache.pdfbox.ExtractImages 
 239665.pdf
 2.0_SNAPSHOT:
 java -Xmx1g -cp pdfbox-app-2.0.0-SNAPSHOT.jar 
 org.apache.pdfbox.tools.ExtractImages -addkey 239665.pdf
 Results:
 1.8.5: 906 files before OOM
 {noformat}
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at

[jira] [Commented] (PDFBOX-2101) Surprising memory consumption when extracting images


[ 
https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014101#comment-14014101
 ] 

Tilman Hausherr commented on PDFBOX-2101:
-

The file of PDFBOX-1283 also has a rendering problem.

 Surprising memory consumption when extracting images
 

 Key: PDFBOX-2101
 URL: https://issues.apache.org/jira/browse/PDFBOX-2101
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.5
 Environment: Windows 7
 java version 1.7.0_55
 Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
 Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg, 
 PDFBOX-2101-714-poor.jpg, java.hprof.zip


 ExtractImages seems to fail to release memory resources on some files in both 
 PDFBox 1.8.5 and trunk.  
 On this 4MB file 
 [http://digitalcorpora.org/corp/nps/files/govdocs1/239/239665.pdf], if 
 extracting every image on every page (and there are many, many duplicate 
 images), there is an OOM with -Xmx1g.  If there is no Xmx and there is  2.5g 
 available, ExtractImages will work.
 With some experimentation, the triggers seem to be JPEG images that have 
 masks.  I'm not sure, though, whether the issue is with PDFBox or Java.
 Commandlines:
 1.8.5:
 java -Xmx1g -cp pdfbox-app-1.8.5.jar org.apache.pdfbox.ExtractImages 
 239665.pdf
 2.0_SNAPSHOT:
 java -Xmx1g -cp pdfbox-app-2.0.0-SNAPSHOT.jar 
 org.apache.pdfbox.tools.ExtractImages -addkey 239665.pdf
 Results:
 1.8.5: 906 files before OOM
 {noformat}
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
     at java.util.Arrays.copyOf(Arrays.java:2271)
     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
     at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:514)
     at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:217)
     at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:363)
     at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:254)
     at org.apache.pdfbox.ExtractImages.processResources(ExtractImages.java:202)
     at org.apache.pdfbox.ExtractImages.extractImages(ExtractImages.java:160)
     at org.apache.pdfbox.ExtractImages.main(ExtractImages.java:65)
 {noformat}
 2.0_SNAPSHOT: 428 files before OOM
 {noformat}
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
     at java.util.Arrays.copyOf(Arrays.java:2271)
     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
     at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:70)
     at org.apache.pdfbox.io.IOUtils.toByteArray(IOUtils.java:52)
     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:171)
     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:154)
     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:171)
     at org.apache.pdfbox.tools.ExtractImages.write2file(ExtractImages.java:231)
     at org.apache.pdfbox.tools.ExtractImages.processResources(ExtractImages.java:206)
     at org.apache.pdfbox.tools.ExtractImages.extractImages(ExtractImages.java:164)
     at org.apache.pdfbox.tools.ExtractImages.main(ExtractImages.java:69)
 {noformat}




Re: Idea: stable 2.0 versions

2014-05-30 Thread John Hewson

I think the risk of creating the impression that 2.0 is stable is too high. 
The real problem is that 2.0 has been too long in development; there were 
frustrated users asking a year ago when it would be released.

Perhaps it’s time to push for a release of 2.0 and aim for a more frequent 
release cycle after that, to avoid repeating the situation where the stable 
and trunk versions are years apart?

What is holding back 2.0? What features are we *really* holding out on? Can 
we put together a roadmap? Our users often ask for one...

-- John

On 30 May 2014, at 14:01, Tilman Hausherr thaush...@t-online.de wrote:

 I suggest that we come up with a concept of designating stable versions (or 
 tested versions) for the trunk and put them on the homepage. A stable 
 version is one with no or only minor regressions, and/or a version that 
 committers have found to be good. This would be for users of the 2.0 
 version who don't want to read every discussion, and also as a hint for 
 unhappy 1.8 users.
 
 I suspect that other open source projects do also have rules to designate 
 stable versions, but I didn't look at them.
 
 Proposed rules:
 - any committer can designate any version that is older than 24 hours as 
 stable
 - any committer can veto any version as unstable
 - any version that has only positive votes is mentioned on
  https://pdfbox.apache.org/downloads.html#scm
 - there should be up to three versions there
 
 Tilman

[jira] [Commented] (PDFBOX-2102) Characters swallowed on COSString.getString()


[ 
https://issues.apache.org/jira/browse/PDFBOX-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014295#comment-14014295
 ] 

Petr Slaby commented on PDFBOX-2102:


[~jerem...@apache.org]: After the change in 1598316, I am getting 
IllegalArgumentExceptions on some of the documents in my test suite. The 
culprit seems to be a missing in.position(in.position() - 1); at line 141 
in SingleByteCharset. You might also consider using something like
int mark = src.position();
try
{
    mark++; // in front of out.put()
}
finally
{
    src.position(mark);
}

This pattern is used in the single-byte encoding implementation of OpenJVM. 
It also has a better-performing implementation for the case where both the 
byte and char buffers are backed by an array (which is the most common case).

The test document (coming from http://www.stillhq.com/pdfdb/db.html) and stack 
trace are attached, but the missing call to position() seems obvious anyway.
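
For illustration, the mark/finally pattern from this comment can be written 
out as a complete decodeLoop. This is a sketch under stated assumptions, not 
the PDFBox SingleByteCharset code: the class name MarkPatternDecoder and the 
helper decodeInChunks are hypothetical, and the byte-to-char mapping is a 
plain Latin-1 widening standing in for the real encoding table.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a single-byte decoder that never loses a byte when
// the output buffer fills up in the middle of decoding.
final class MarkPatternDecoder {

    static CharsetDecoder newDecoder() {
        return new CharsetDecoder(StandardCharsets.ISO_8859_1, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer src, CharBuffer out) {
                int mark = src.position();
                try {
                    while (src.hasRemaining()) {
                        byte b = src.get();
                        if (!out.hasRemaining()) {
                            // The byte just read is not counted in mark, so the
                            // finally block rewinds src to point at it again.
                            return CoderResult.OVERFLOW;
                        }
                        mark++; // in front of out.put()
                        out.put((char) (b & 0xFF));
                    }
                    return CoderResult.UNDERFLOW;
                } finally {
                    src.position(mark);
                }
            }
        };
    }

    /** Decodes with a deliberately small output buffer to exercise OVERFLOW. */
    static String decodeInChunks(byte[] data, int outCapacity) {
        CharsetDecoder decoder = newDecoder();
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(outCapacity);
        StringBuilder sb = new StringBuilder();
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            out.flip();
            sb.append(out);
            out.clear();
            if (result.isUnderflow()) {
                break;
            }
        }
        decoder.flush(out);
        return sb.toString();
    }
}
```

Without the rewind in the finally block, every OVERFLOW would silently drop 
the byte that had already been read from src.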



 Characters swallowed on COSString.getString()
 -

 Key: PDFBOX-2102
 URL: https://issues.apache.org/jira/browse/PDFBOX-2102
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Jeremias Maerki
Assignee: Jeremias Maerki
 Fix For: 1.8.6, 2.0.0


 PDFBOX-1437 seems to have introduced a regression that causes characters like 
 \n to be swallowed when COSString.getString() is called. PDFDocEncoding 
 doesn't handle all valid characters.
 {code}
 String testStr = "Line1\nLine2\nLine3\n";
 COSString lineFeedString = new COSString(testStr);
 assertEquals(testStr, lineFeedString.getString());
 //Same as previous but this time as a dictionary value
 lineFeedString = new COSString(true);
 for (int i = 0; i < testStr.length(); i++) {
     lineFeedString.append(testStr.charAt(i));
 }
 assertEquals(testStr, lineFeedString.getString()); //currently fails
 {code}
 Direct link to the change causing the regression:
 http://svn.apache.org/viewvc?view=revision&revision=1406628




[jira] [Updated] (PDFBOX-2102) Characters swallowed on COSString.getString()