[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-07-05 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052814#comment-14052814
 ] 

Tilman Hausherr commented on PDFBOX-1915:
-

Have a look at PDFBOX-2117, but please don't read the dialog yet, only look at 
the attached PDF files, not the java files, and not the dialog after them.

This issue was opened by power user Petr who has done a lot for the project 
in the last few weeks but who didn't know about GSoC2014. I would ask you to 
first use the profiler with the files, to look at the source code for 
optimization possibilities. Maybe you'll have the same ideas as in the java 
files / the dialog there, maybe you have different ones.

Note that optimization may be needed not only in the shading package but 
possibly also in the function package. Most of it is function type 2.

 Implement shading with Coons and tensor-product patch meshes
 

 Key: PDFBOX-1915
 URL: https://issues.apache.org/jira/browse/PDFBOX-1915
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Tilman Hausherr
Assignee: Shaola Ren
  Labels: graphical, gsoc2014, java, math, shading
 Fix For: 2.0.0

 Attachments: CIB-coons-vs-tensormesh.pdf, CIB-coonsmesh.pdf, 
 CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, 
 McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, 
 _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, 
 asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, 
 coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, 
 coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, 
 coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, 
 coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, 
 coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, 
 eci_altona-test-suite-v2_technical_H.pdf, example_030.pdf, failedTest.rar, 
 lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, 
 lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, 
 pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, 
 shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, 
 tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, 
 tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, 
 updateshading6ContourTest.rar


 Of the seven shading methods described in the PDF specification, type 6 
 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been 
 implemented. I have done type 1, 4 and 5, but I don't know the math for type 
 6 and 7. My math days are decades away.
 Knowledge prerequisites: 
 - java, although you don't have to be a java ace, just feel comfortable
 - math: you should know what cubic Bézier curves, degenerate Bézier 
 curves, bilinear interpolation, tensor-product, affine transform 
 matrix and Bernstein polynomials are, or be able to learn them
 - maven (basic)
 - svn (basic)
 - an IDE like Netbeans or Eclipse or IntelliJ (basic)
 - ideally, you are either a math student who likes to program, or a computer 
 science student who is specializing in graphics.
 A first look at PDFBOX: try the command utility here:
 https://pdfbox.apache.org/commandline/#pdfToImage
 and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have 
 the shading types that are already implemented.
 Some simple source code to convert to images:
 String filename = "blah.pdf";
 PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
 List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
 int page = 0;
 for (PDPage pdPage : pdPages)
 {
     ++page;
     // render each page at 300 dpi and write it as blah.pdf1.png, blah.pdf2.png, ...
     BufferedImage bim = RenderUtil.convertToImage(pdPage, 
         BufferedImage.TYPE_BYTE_BINARY, 300);
     ImageIO.write(bim, "png", new File(filename + page + ".png"));
 }
 document.close();
 You are not starting from scratch. The implementation of type 4 and 5 shows 
 you how to read parameters from the PDF and set the graphics. You don't have 
 to learn the complete PDF spec, only 15 pages related to the two shading 
 types, and 6 pages about shading in general. The PDF specification is here:
 http://www.adobe.com/devnet/pdf/pdf_reference.html
 The tricky parts are:
 - decide whether a point(x,y) is inside or outside a patch
 - decide the color of a point within the patch
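
 The math behind the second point can be sketched roughly as follows. This is
 only an illustration, not PDFBox code: the class PatchMath and its method
 names are made up for this example. It assumes the patch is parameterized
 over the unit square (u, v), with the four corner colors blended bilinearly,
 and it evaluates one coordinate of a cubic Bézier curve with Bernstein
 polynomials.

```java
public class PatchMath {

    /**
     * Evaluates one coordinate of a cubic Bezier curve at t in [0,1],
     * written out with the four cubic Bernstein polynomials.
     */
    public static double cubicBezier(double p0, double p1, double p2, double p3, double t) {
        double s = 1.0 - t;
        return s * s * s * p0
             + 3.0 * s * s * t * p1
             + 3.0 * s * t * t * p2
             + t * t * t * p3;
    }

    /**
     * Bilinear blend of four corner colors over the unit square.
     * c00 belongs to (u,v) = (0,0), c10 to (1,0), c01 to (0,1), c11 to (1,1).
     * All arrays are assumed to have the same number of color components.
     */
    public static float[] bilinearColor(float[] c00, float[] c10,
                                        float[] c01, float[] c11,
                                        double u, double v) {
        float[] out = new float[c00.length];
        for (int i = 0; i < out.length; i++) {
            double bottom = (1.0 - u) * c00[i] + u * c10[i]; // edge v = 0
            double top = (1.0 - u) * c01[i] + u * c11[i];    // edge v = 1
            out[i] = (float) ((1.0 - v) * bottom + v * top);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] red = {1f, 0f, 0f};
        float[] blue = {0f, 0f, 1f};
        float[] mid = bilinearColor(red, blue, red, blue, 0.5, 0.5);
        System.out.println(mid[0] + " " + mid[1] + " " + mid[2]);
    }
}
```

 Deciding whether a device point lies inside a patch is the harder half and
 is not covered by this sketch.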
 To get an idea about the code, look at the classes GouraudTriangle, 
 GouraudShadingContext, Type4ShadingContext and Vertex here
 https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/
 or download the whole project from the repository.
 https://pdfbox.apache.org/downloads.html#scm
 If you 

[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-07-05 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052814#comment-14052814
 ] 

Tilman Hausherr edited comment on PDFBOX-1915 at 7/5/14 7:00 AM:
-

Have a look at PDFBOX-2117, but please don't read the dialog yet, only look at 
the attached PDF files, not the java files, and not the dialog after them.

This issue was opened by power user Petr who has done a lot for the project 
in the last few weeks but who didn't know about GSoC2014. I would ask you to 
first use the profiler with the files, to look at the existing source code for 
optimization possibilities. Maybe you'll have the same ideas as in the java 
files / the dialog there, maybe you have different ones.

Note that optimization may be needed not only in the shading package but 
possibly also in the function package. Most of it is function type 2.



[jira] [Updated] (PDFBOX-2117) AxialShadingContext is slow

2014-07-05 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2117:


Attachment: Shading2Function2.ps
Shading2Function2.pdf

 AxialShadingContext is slow
 ---

 Key: PDFBOX-2117
 URL: https://issues.apache.org/jira/browse/PDFBOX-2117
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby
 Attachments: 01_MTEXT_CS6.pdf, AxialShading.patch, 
 AxialShading1.patch, AxialShadingContext.java.getrgbimage, 
 Shading2Function2.pdf, Shading2Function2.ps, Shading2Function2text.pdf, 
 asy-shade.pdf, color_gradient.pdf, shading_pattern.pdf


 AxialShadingContext#getRaster() is on top of profiler hot spots in documents 
 that use an axial shading. Inside it, the slowest part is calling 
 PDColorSpaceRGB#toRGB() and PDFunctionType3#eval() (in this order).
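
 One general way to cut this kind of per-pixel cost is to evaluate the
 function and the color conversion once per discrete input step and cache the
 packed RGB values. The sketch below is an assumption about a possible
 approach, not the content of the attached patches: the class ColorTable is
 hypothetical, and the colorAt callback merely stands in for "evaluate the
 shading function, then convert to RGB". It assumes three components in
 [0,1].

```java
import java.util.function.DoubleFunction;

public class ColorTable {
    private final int[] rgb;

    /** Precomputes packed 0xRRGGBB values for 'size' evenly spaced inputs in [0,1]. */
    public ColorTable(int size, DoubleFunction<float[]> colorAt) {
        if (size < 2) {
            throw new IllegalArgumentException("size must be at least 2");
        }
        rgb = new int[size];
        for (int i = 0; i < size; i++) {
            // the expensive work happens here, once per table entry
            rgb[i] = pack(colorAt.apply(i / (double) (size - 1)));
        }
    }

    /** Returns the cached color nearest to t; t is clamped to [0,1]. */
    public int lookup(double t) {
        double clamped = Math.max(0.0, Math.min(1.0, t));
        return rgb[(int) Math.round(clamped * (rgb.length - 1))];
    }

    private static int pack(float[] c) {
        int r = Math.round(c[0] * 255f);
        int g = Math.round(c[1] * 255f);
        int b = Math.round(c[2] * 255f);
        return (r << 16) | (g << 8) | b;
    }
}
```

 The trade-off is quantization: the table size limits how many distinct
 colors a gradient can show, so it would have to be chosen with the output
 resolution in mind.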
   



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PDFBOX-2117) AxialShadingContext is slow

2014-07-05 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2117:


Issue Type: Sub-task  (was: Improvement)
Parent: PDFBOX-1915



[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-07-05 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052814#comment-14052814
 ] 

Tilman Hausherr edited comment on PDFBOX-1915 at 7/5/14 8:15 AM:
-

Have a look at PDFBOX-2117, but please don't read the dialog yet, only look at 
the attached PDF files, not the java files, and not the dialog after them.

This issue was opened by power user Petr who has done a lot for the project 
in the last few weeks but who didn't know about GSoC2014. I would ask you to 
first use the profiler with the files, to look at the existing source code for 
optimization possibilities. Maybe you'll have the same ideas as in the java 
files / the dialog there, maybe you have different ones.

Note that optimization may be needed not only in the shading package but 
possibly also in the function package. Most of it is function type 2. 

I've added an example with its PostScript file in that issue. This makes it 
easier to understand what function type 2 is about, even if you don't know 
PostScript.

The C0 and C1 values are boundaries, the three values are color values (in that 
case, R G B, but it could also be C M Y K or whatever the colorspace is) and N 
is an exponent.

It is explained in the PDF spec but also here in short:
http://ipdfdev.com/2012/09/03/ifxpdffactory-part-3-pdf-functions/
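
In code, the idea looks roughly like this. It is a minimal sketch, not the
PDFBox implementation: the class ExponentialFunction is hypothetical, and it
assumes the input x is already clipped to the function's domain. Each output
component j is computed as C0[j] + x^N * (C1[j] - C0[j]).

```java
public class ExponentialFunction {
    private final float[] c0;
    private final float[] c1;
    private final float n;

    /** c0 and c1 must have the same length (one entry per color component). */
    public ExponentialFunction(float[] c0, float[] c1, float n) {
        if (c0.length != c1.length) {
            throw new IllegalArgumentException("C0 and C1 must have the same length");
        }
        this.c0 = c0.clone();
        this.c1 = c1.clone();
        this.n = n;
    }

    /** Interpolates between C0 (at x = 0) and C1 (at x = 1) with exponent N. */
    public float[] eval(float x) {
        double xn = Math.pow(x, n); // computed once, shared by all components
        float[] out = new float[c0.length];
        for (int j = 0; j < out.length; j++) {
            out[j] = (float) (c0[j] + xn * (c1[j] - c0[j]));
        }
        return out;
    }

    public static void main(String[] args) {
        // red at x = 0, blue at x = 1, linear because N = 1
        ExponentialFunction f = new ExponentialFunction(
                new float[]{1f, 0f, 0f}, new float[]{0f, 0f, 1f}, 1f);
        float[] rgb = f.eval(0.5f);
        System.out.println(rgb[0] + " " + rgb[1] + " " + rgb[2]);
    }
}
```

With N = 1 this is plain linear interpolation between the two colors; other
exponents bend the gradient towards one end.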



Re: Regression Testing

2014-07-05 Thread Maruan Sahyoun

 Hi Tilman
 
 Thanks for your thoughts, I think that your concerns are already covered by 
 my original proposal, I’ll try to explain why and how:
 
 Of course I agree with the need for regression tests, however it isn't easy: 
 besides the problems of the different JDKs (I use JDK7 Windows 64 bit), 
 there is the problem that some enhancements create slight changes in 
 rendering that are not errors, i.e. both the before and the after files 
 look OK on their own. This has happened when we changed the text rendering 
 recently, and has happened again when the clipping was improved. The cause 
 is probably slight changes in color or in boundaries.
 
 If a rendering has changed then the regression test should fail. When a 
 failure occurs the developer needs to manually inspect the differences (we 
 could generate a visual diff which highlights what changed to make this 
 easier) and if ok then they can replace the known-good PNG with the ones just 
 rendered. Indeed this will be the basic workflow for working with regression 
 tests.
 

I think this is the only way to handle that situation. The same applies to 
text extraction etc.: if an improvement changes the results, the 'base' needs 
to be reset by adding the new image, text etc. as the validation source.
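
A rough sketch of such a comparison with a visual diff is shown here. This is
an illustration under assumptions, not the code in TestPDFToImage: the class
ImageDiff and its method are hypothetical, and it compares pixels exactly,
with no tolerance for small color changes.

```java
import java.awt.image.BufferedImage;

public class ImageDiff {

    /**
     * Compares a known-good image with a freshly rendered one.
     * Returns null if they are identical; otherwise returns an image in which
     * unchanged pixels are white and differing pixels (including areas covered
     * by only one of the two images) are red.
     */
    public static BufferedImage diff(BufferedImage expected, BufferedImage actual) {
        int w = Math.max(expected.getWidth(), actual.getWidth());
        int h = Math.max(expected.getHeight(), actual.getHeight());
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        boolean changed = false;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean inExpected = x < expected.getWidth() && y < expected.getHeight();
                boolean inActual = x < actual.getWidth() && y < actual.getHeight();
                if (inExpected && inActual && expected.getRGB(x, y) == actual.getRGB(x, y)) {
                    out.setRGB(x, y, 0xFFFFFF);
                } else {
                    out.setRGB(x, y, 0xFF0000);
                    changed = true;
                }
            }
        }
        return changed ? out : null;
    }
}
```

A test would fail whenever the result is non-null and write the diff image
next to the rendering, so that the developer can decide whether to replace
the known-good PNG.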

A basic testbed could also run against other JDKs, e.g. without validating 
against the known-good files, so we pick up potential issues early. This should 
be easy with Jenkins and could be treated as a hint.  


 Copyright is a problem: I'm testing mostly with JIRA attachments that I've 
 downloaded over the years. While uploading such files to JIRA might count as 
 fair use, I doubt that this would still be true if they are included in a 
 distribution. Instead, they should be stored somewhere on Apache servers 
 where only committers and build software (Travis, Jenkins, ...) can 
 access them. The public PDFs that Maruan mentions possibly don't have all 
 the problem cases that we solved before. However I have started working with 
 these files and there are at least 5 recent issues that deal with them.
 
 The PDFs won’t be in a distribution. They will just happen to be stored in an 
 SVN repo but not our source code repo, in the same way that the website is 
 stored in the “cmssite” branch of SVN or indeed, are on JIRA. The law doesn’t 
 distinguish between JIRA and SVN, both are publicly available via HTTP, so 
 using SVN will simply be a continuation of what we’re already doing with JIRA.
 
 The crucial factor is that we’re only storing publicly available PDFs,  
 because we have the right to do so, just like Google’s cache, and like we 
 currently do with JIRA.
 
 Additionally, the PDFs need to be version controlled otherwise we won’t be 
 able to reliably recreate previous builds, so storing the files on a web 
 server won’t be practical. Also committers will frequently be updating the 
 renderings as bugs are fixed and we’ll need to version-control the rendered 
 PNG files for the same reason. Finally, having committers-only files doesn’t 
 fit well with the Apache goal of open development and would be unnecessary 
 anyway given that all the PDFs are to be taken from public sources only.
 
 In summary, I’m proposing that we just keep doing what we’re currently doing 
 with JIRA but we move it into its own SVN repo along with some pre-rendered 
 PNGs.

In addition if we put in workarounds to handle nonconforming PDFs there should 
be a unit test added to make sure that we don’t break that e.g. when rewriting 
the parser. 

 
 Re preflight: the default mode should be to have the Isartor tests on. 
 Individuals could still disable them locally, but the central build software 
 should always use them.
 
 Yes - does anybody know why this isn’t the default?
 

No.

+1 for enabling it by default


 -- John



PDFBox and documentation

2014-07-05 Thread Maruan Sahyoun
Hi,

I have the infrastructure for enhancing our documentation nearly sorted (needed 
to learn a little more about the possibilities of the Apache CMS). Now WDYT 
would be the expectation for documenting how to use PDFBox for different use 
cases - code snippets or runnable examples?

BR
Maruan

[jira] [Reopened] (PDFBOX-1695) Improve pdfbox tests

2014-07-05 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reopened PDFBOX-1695:
-

  Assignee: Tilman Hausherr

Reopened to commit my changes in TestPDFToImage. I have used this and improved 
it over the months, and used it to show regressions.

 Improve pdfbox tests
 

 Key: PDFBOX-1695
 URL: https://issues.apache.org/jira/browse/PDFBOX-1695
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 1.8.2, 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Priority: Minor
  Labels: tdd, test-driven, testing
 Attachments: ccitt4.tif, jbig2test-01.png, jbig2test.pdf


 I'd like to improve the tests for rendering.
 org/apache/pdfbox/util/TestPDFToImage.java is disabled in pdfbox\pom.xml . 
 This has been disabled since 2009 ?! So I enabled it here.
 The subdir rendering is missing in pdfbox\target\test-output for these tests.
 When a test fails because the rendered image is not identical, no detailed 
 message appears on the console. It appears only in pdfbox.log.
 This is because of the settings in
 pdfbox\src\test\resources\logging.properties
 If this is on purpose, please change the texts in 
 pdfbox\src\test\java\org\apache\pdfbox\util\*.java from
 One or more failures, see test log for details
 to
 One or more failures, see test logfile 'pdfbox.log' for details
 I wanted to attach a PDF with ccitt g4 compression and its rendering created 
 with the 1.8.2 version, but it doesn't work out, seems that CIB generates 
 files that can be rendered properly with 1.8.2. However I attach the TIFF g4 
 file, and a JBIG2 test file from it. I don't have access to a Xerox 
 WorkCentre (enter jbig2 in google news :-) ) so I used a free service, so 
 there's a watermark.
 It should be included into
 pdfbox\src\test\resources\input\rendering
 I have created the image myself and I release it into the public domain.
 If my suggestion is accepted, it would be nice if people could create files 
 that fail in current versions or have failed in old versions, and release 
 these files to the public domain, so that they can be added to the tests.





Build failed in Jenkins: PDFBox-ant #1411

2014-07-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox-ant/1411/changes

Changes:

[tilman] PDFBOX-1695: create graphic diff file, and empty files in target dir 
if anything goes wrong

--
[...truncated 1481 lines...]
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/descriptor/TrueTypeDescriptorHelper.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/descriptor/Type1DescriptorHelper.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/descriptor/CIDType2DescriptorHelper.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/Type1FontValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/CIDType2FontValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/Type3FontValidator.java
A preflight/src/main/java/org/apache/pdfbox/preflight/font/container
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/CIDType0Container.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/TrueTypeContainer.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/Type0Container.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/Type1Container.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/CIDType2Container.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/Type3Container.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/FontContainer.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/DescendantFontValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/font/SimpleFontValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/Format.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/PreflightContext.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/ValidationResult.java
A preflight/src/main/java/org/apache/pdfbox/preflight/annotation
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/LinkAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/TrapNetAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/AnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/MarkupAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/InkAnnotationValdiator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/SquareCircleAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/PopupAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/WidgetAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/TextAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/FreeTextAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/RubberStampAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/AnnotationValidatorFactory.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/LineAnnotationValidator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/PrintMarkAnnotationValidator.java
A preflight/src/main/java/org/apache/pdfbox/preflight/annotation/pdfa
AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/pdfa/PDFAbAnnotationFactory.java
A preflight/src/main/java/org/apache/pdfbox/preflight/content
AU preflight/src/main/java/org/apache/pdfbox/preflight/content/ContentStreamException.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/content/PreflightStreamEngine.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/content/StubOperator.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/content/PreflightContentStream.java
A preflight/src/main/java/org/apache/pdfbox/preflight/action
AU preflight/src/main/java/org/apache/pdfbox/preflight/action/AbstractActionManager.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/action/UriAction.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/action/UndefAction.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/action/ActionManagerFactory.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/action/SubmitAction.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/action/GoToRemoteAction.java
AU preflight/src/main/java/org/apache/pdfbox/preflight/action/NamedAction.java
AU

Re: PDFBox and documentation

2014-07-05 Thread Maruan Sahyoun
that should be doable with some newer additions to the Apache CMS which allows 
to pull from svn and/or git. Will try something on that basis. If it works we 
can enhance the example package.

BR
Maruan

On 05.07.2014 at 18:45, John Hewson j...@jahewson.com wrote:

 I'm for runnable examples in trunk on SVN, otherwise we'll end up with code 
 that doesn't actually run. Some snippets from these examples could be put on 
 the website but they should always link back to the example file in SVN 
 viewvc - there's nothing more frustrating for a new user than incomplete 
 examples, or having to copy and paste snippets together to recreate an 
 example file.
 
 Looking at the examples we have currently on SVN the coding conventions used 
 are starting to look a bit dated, certainly far behind more recently written 
 code.
 
 -- John
 
 On 5 Jul 2014, at 04:46, Maruan Sahyoun sahy...@fileaffairs.de wrote:
 
 Hi,
 
 I have the infrastructure for enhancing our documentation nearly sorted 
 (needed to learn a little more about the possibilities of the Apache CMS). 
 Now WDYT would be the expectation for documenting how to use PDFBox for 
 different use cases - code snippets or runnable examples?
 
 BR
 Maruan



Re: PDFBox and documentation

2014-07-05 Thread Tilman Hausherr

On 05.07.2014 18:45, John Hewson wrote:

I'm for runnable examples in trunk on SVN, otherwise we'll end up with code 
that doesn't actually run. Some snippets from these examples could be put on 
the website but they should always link back to the example file in SVN viewvc 
- there's nothing more frustrating for a new user than incomplete examples, or 
having to copy and paste snippets together to recreate an example file.


Maybe the best is both. Sadly, I don't remember how I wrote my first 
pdfbox application - probably both from the pdfbox website and on 
stackoverflow. But I don't remember any pain.


One thing to improve in the documentation might be to mention that one 
can just download the app, instead of downloading pdfbox, fontbox and 
jempbox each time.




Looking at the examples we have currently on SVN the coding conventions used 
are starting to look a bit dated, certainly far behind more recently written 
code.



Well, I thought I respected the conventions :-)

Tilman


Re: Regression Testing

2014-07-05 Thread Tilman Hausherr

On 04.07.2014 19:39, John Hewson wrote:

Hi Tilman

Thanks for your thoughts, I think that your concerns are already covered by my 
original proposal, I’ll try to explain why and how:


Of course I agree with the need for regression tests, however it isn't easy: besides the problems 
of the different JDKs (I use JDK7 Windows 64 bit), there is the problem that some enhancements 
create slight changes in rendering that are not errors, i.e. both the before and the 
after files look OK on their own. This has happened when we changed the text rendering 
recently, and has happened again when the clipping was improved. The cause is probably slight 
changes in color or in boundaries.

If a rendering has changed then the regression test should fail. When a failure 
occurs the developer needs to manually inspect the differences (we could 
generate a visual diff which highlights what changed to make this easier) and 
if ok then they can replace the known-good PNG with the ones just rendered. 
Indeed this will be the basic workflow for working with regression tests.


That's exactly what I do now: I generate a visual diff and I make a 
decision whether it is relevant or not. If I think not, then I replace 
the PNG.





Copyright is a problem: I'm testing mostly with JIRA attachments that I've downloaded over the 
years. While uploading such files to JIRA might count as fair use, I doubt that this would still be 
true if they are included in a distribution. Instead, they should be stored somewhere on Apache 
servers where only committers and build software (Travis, Jenkins, ...) can 
access them. The public PDFs that Maruan mentions possibly don't have all the problem cases that we 
solved before. However I have started working with these files and there are at least 5 recent 
issues that deal with them.

The PDFs won’t be in a distribution. They will just happen to be stored in an 
SVN repo but not our source code repo, in the same way that the website is 
stored in the “cmssite” branch of SVN or, indeed, the way files are stored on 
JIRA. The law doesn’t distinguish between JIRA and SVN; both are publicly 
available via HTTP, so using SVN will simply be a continuation of what we’re 
already doing with JIRA.

The crucial factor is that we’re only storing publicly available PDFs,  because 
we have the right to do so, just like Google’s cache, and like we currently do 
with JIRA.


Yes but many PDFs we got aren't really public. If this svn repo is 
only accessible to committers, and if the publicly available build 
scripts won't break because of this, then it is OK.


Note that even if something is publicly available, it may still be 
copyrighted. Other risks can be that some people upload PDFs that 
include personal data. One really good test PDF was apparently a loan 
application. I remember that the user insisted that 1. it was test data, 
and 2. that it be removed.


Tilman


Additionally, the PDFs need to be version controlled otherwise we won’t be able 
to reliably recreate previous builds, so storing the files on a web server 
won’t be practical. Also committers will frequently be updating the renderings 
as bugs are fixed and we’ll need to version-control the rendered PNG files for 
the same reason. Finally, having committers-only files doesn’t fit well with 
the Apache goal of open development and would be unnecessary anyway given that 
all the PDFs are to be taken from public sources only.

In summary, I’m proposing that we just keep doing what we’re currently doing 
with JIRA but we move it into its own SVN repo along with some pre-rendered 
PNGs.


Re preflight: the default mode should be to have the Isartor tests on. 
Individuals could still disable them locally, but the central build software 
should always use them.

Yes - does anybody know why this isn’t the default?

-- John




Re: Regression Testing

2014-07-05 Thread John Hewson

 Copyright is a problem: I'm testing mostly with JIRA attachments that I've 
 downloaded over the years. While uploading such files to JIRA might count 
 as fair use, I doubt that this would still be true if they are included in 
 a distribution. Instead, they should be stored somewhere on Apache servers 
 where only committers and build software (Travis, Jenkins, ...) can 
 access them. The public PDFs that Maruan mentions possibly don't have all 
 the problem cases that we solved before. However I have started working 
 with these files and there are at least 5 recent issues that deal with 
 them.
 The PDFs won’t be in a distribution. They will just happen to be stored in 
 an SVN repo but not our source code repo, in the same way that the website 
 is stored in the “cmssite” branch of SVN or, indeed, the way files are 
 stored on JIRA. The law doesn’t distinguish between JIRA and SVN; both are 
 publicly available via HTTP, so using SVN will simply be a continuation of 
 what we’re already doing with JIRA.
 
 The crucial factor is that we’re only storing publicly available PDFs,  
 because we have the right to do so, just like Google’s cache, and like we 
 currently do with JIRA.
 
 Yes but many PDFs we got aren't really public. If this svn repo is only 
 accessible to committers, and if the publicly available build scripts won't 
 break because of this, then it is OK.

Any non-public PDFs will not be permitted in our test suite, just as they 
shouldn't be on JIRA.

 Note that even if something is publicly available, it may still be 
 copyrighted. Other risks can be that some people upload PDFs that include 
 personal data. One really good test PDF was apparently a loan application. I 
 remember that the user insisted that 1. it was test data, and 2. that it be 
 removed.

All Apache development should be in the open, this is a key ASF principle, 
having a committers-only test suite is basically a no-no. It's important to 
understand that fair use allows us to use copyrighted works - this is 
expressly permitted, it's the same legal principle as Google’s cache. There is 
no need to seek permission. This is what we’ve been doing with JIRA already for 
years, so we are already doing this - it’s fine.

Naturally, if anybody objects to their PDF being in our test suite, we can 
always remove it, but it shouldn’t include anything which isn’t already on the 
public web.

-- John

Re: Regression Testing

2014-07-05 Thread Tilman Hausherr

On 05.07.2014 22:12, John Hewson wrote:

Copyright is a problem: I'm testing mostly with JIRA attachments that I've downloaded over the 
years. While uploading such files to JIRA might count as fair use, I doubt that this would still be 
true if they are included in a distribution. Instead, they should be stored somewhere on Apache 
servers where only committers and build software (Travis, Jenkins, ...) can 
access them. The public PDFs that Maruan mentions possibly don't have all the problem cases that we 
solved before. However I have started working with these files and there are at least 5 recent 
issues that deal with them.

The PDFs won’t be in a distribution. They will just happen to be stored in an 
SVN repo but not our source code repo, in the same way that the website is 
stored in the “cmssite” branch of SVN or, indeed, the way files are stored on 
JIRA. The law doesn’t distinguish between JIRA and SVN; both are publicly 
available via HTTP, so using SVN will simply be a continuation of what we’re 
already doing with JIRA.

The crucial factor is that we’re only storing publicly available PDFs,  because 
we have the right to do so, just like Google’s cache, and like we currently do 
with JIRA.

Yes but many PDFs we got aren't really public. If this svn repo is only 
accessible to committers, and if the publicly available build scripts won't break because 
of this, then it is OK.

Any non-public PDFs will not be permitted in our test suite, just as they 
shouldn't be on JIRA.


Note that even if something is publicly available, it may still be 
copyrighted. Other risks can be that some people upload PDFs that include personal data. 
One really good test PDF was apparently a loan application. I remember that the user 
insisted that 1. it was test data, and 2. that it be removed.

All Apache development should be in the open, this is a key ASF principle, having a 
committers-only test suite is basically a no-no. It's important to understand that 
fair use allows us to use copyrighted works - this is expressly permitted, 
it's the same legal principle as Google’s cache. There is no need to seek permission. 
This is what we’ve been doing with JIRA already for years, so we are already doing this - 
it’s fine.


The problem is that this has all happened before. A few years ago, many 
files were deleted, see PDFBOX-391.


Tilman



Naturally, if anybody objects to their PDF being in our test suite, we can 
always remove it, but it shouldn’t include anything which isn’t already on the 
public web.

-- John




Re: Regression Testing

2014-07-05 Thread John Hewson

On 5 Jul 2014, at 13:47, Tilman Hausherr thaush...@t-online.de wrote:

 On 05.07.2014 22:12, John Hewson wrote:
 Copyright is a problem: I'm testing mostly with JIRA attachments that 
 I've downloaded over the years. While uploading such files to JIRA might 
 count as fair use, I doubt that this would still be true if they are 
 included in a distribution. Instead, they should be stored somewhere on 
 Apache servers where only committers and build software (Travis, 
 Jenkins, ...) can access them. The public PDFs that Maruan mentions 
 possibly don't have all the problem cases that we solved before. However 
 I have started working with these files and there are at least 5 recent 
 issues that deal with them.
 The PDFs won’t be in a distribution. They will just happen to be stored in 
 an SVN repo but not our source code repo, in the same way that the website 
 is stored in the “cmssite” branch of SVN or, indeed, the way files are 
 stored on JIRA. The law doesn’t distinguish between JIRA and SVN; both are 
 publicly available via HTTP, so using SVN will simply be a continuation of 
 what we’re already doing with JIRA.
 
 The crucial factor is that we’re only storing publicly available PDFs,  
 because we have the right to do so, just like Google’s cache, and like we 
 currently do with JIRA.
 Yes but many PDFs we got aren't really public. If this svn repo is only 
 accessible to committers, and if the publicly available build scripts won't 
 break because of this, then it is OK.
 Any non-public PDFs will not be permitted in our test suite, just as they 
 shouldn't be on JIRA.
 
 Note that even if something is publicly available, it may still be 
 copyrighted. Other risks can be that some people upload PDFs that include 
 personal data. One really good test PDF was apparently a loan application. 
 I remember that the user insisted that 1. it was test data, and 2. that it 
 be removed.
 All Apache development should be in the open, this is a key ASF principle, 
 having a committers-only test suite is basically a no-no. It's important to 
 understand that fair use allows us to use copyrighted works - this is 
 expressly permitted, it's the same legal principle as Google’s cache. There 
 is no need to seek permission. This is what we’ve been doing with JIRA 
 already for years, so we are already doing this - it’s fine.
 
 The problem is that this has all happened before. A few years ago, many files 
 were deleted, see PDFBOX-391.

That issue is about including files in the source code repo as part of the 
PDFBox distribution, where there is a need to put files under an Apache 2.0 
compatible license. What I’m advocating is keeping a separate public repository 
of test files which are not a part of the PDFBox source, like we currently have 
on JIRA.

-- John