[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052814#comment-14052814 ] Tilman Hausherr commented on PDFBOX-1915: - Have a look at PDFBOX-2117, but please don't read the dialog yet, only look at the attached PDF files, not the java files, and not the dialog after them. This issue was opened by power user Petr who has done a lot for the project in the last few weeks but who didn't know about GSoC2014. I would ask you to first use the profiler with the files, to look at the source code for optimization possibilities. Maybe you'll have the same ideas as in the java files / the dialog there, maybe you have different ones. Note that optimization may not only have to be done in the shading package, it is also possibly in the function package. Most is function type 2. Implement shading with Coons and tensor-product patch meshes Key: PDFBOX-1915 URL: https://issues.apache.org/jira/browse/PDFBOX-1915 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Assignee: Shaola Ren Labels: graphical, gsoc2014, java, math, shading Fix For: 2.0.0 Attachments: CIB-coons-vs-tensormesh.pdf, CIB-coonsmesh.pdf, CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, eci_altona-test-suite-v2_technical_H.pdf, example_030.pdf, failedTest.rar, lamp_cairo.pdf, 
lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, updateshading6ContourTest.rar

Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (tensor-product patch meshes) haven't been implemented. I have done types 1, 4 and 5, but I don't know the math for types 6 and 7. My math days are decades away.

Knowledge prerequisites:
- Java, although you don't have to be a Java ace, just feel comfortable
- math: you should know what cubic Bézier curves, degenerate Bézier curves, bilinear interpolation, tensor products, affine transform matrices and Bernstein polynomials are, or be able to learn them
- Maven (basic)
- svn (basic)
- an IDE like NetBeans, Eclipse or IntelliJ (basic)
- ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics.

A first look at PDFBox: try the command-line utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615; these have the shading types that are already implemented.

Some simple source code to convert to images:

    String filename = "blah.pdf";
    PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
    List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
    int page = 0;
    for (PDPage pdPage : pdPages)
    {
        ++page;
        // render each page at 300 dpi and write it next to the PDF as blah.pdf1.png, blah.pdf2.png, ...
        BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300);
        ImageIO.write(bim, "png", new File(filename + page + ".png"));
    }
    document.close();

You are not starting from scratch. The implementation of type 4 and 5 shows you how to read parameters from the PDF and set the graphics.
You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general. The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html The tricky parts are: - decide whether a point(x,y) is inside or outside a patch - decide the color of a point within the patch To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the repository. https://pdfbox.apache.org/downloads.html#scm If you
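The math prerequisites listed in the issue (cubic Bézier curves, Bernstein polynomials, bilinear interpolation) can be illustrated with a small sketch. This is not PDFBox code: the class name `PatchMath` and both method names are hypothetical, and the sketch only shows the two basic building blocks, not a full patch renderer.

```java
import java.awt.geom.Point2D;

// Hypothetical helper, not part of PDFBox: basic math used by patch meshes.
public class PatchMath
{
    // Evaluates a cubic Bezier curve at t in [0,1] using the four Bernstein polynomials of degree 3.
    static Point2D.Double cubicBezier(Point2D p0, Point2D p1, Point2D p2, Point2D p3, double t)
    {
        double s = 1 - t;
        double b0 = s * s * s;
        double b1 = 3 * t * s * s;
        double b2 = 3 * t * t * s;
        double b3 = t * t * t;
        return new Point2D.Double(
                b0 * p0.getX() + b1 * p1.getX() + b2 * p2.getX() + b3 * p3.getX(),
                b0 * p0.getY() + b1 * p1.getY() + b2 * p2.getY() + b3 * p3.getY());
    }

    // Bilinear interpolation of four corner colors over the unit square.
    // c00 belongs to (u,v) = (0,0), c10 to (1,0), c01 to (0,1), c11 to (1,1).
    static float[] bilinear(float[] c00, float[] c10, float[] c01, float[] c11, double u, double v)
    {
        float[] out = new float[c00.length];
        for (int i = 0; i < out.length; i++)
        {
            double atV0 = (1 - u) * c00[i] + u * c10[i];
            double atV1 = (1 - u) * c01[i] + u * c11[i];
            out[i] = (float) ((1 - v) * atV0 + v * atV1);
        }
        return out;
    }
}
```

The same Bernstein weights, applied in both the u and the v direction to a 4x4 grid of control points, give the tensor-product surface of a type 7 patch; deciding whether a device pixel lies inside a patch is a separate problem that this sketch does not address.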
[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052814#comment-14052814 ] Tilman Hausherr edited comment on PDFBOX-1915 at 7/5/14 7:00 AM: - Have a look at PDFBOX-2117, but please don't read the dialog yet, only look at the attached PDF files, not the java files, and not the dialog after them. This issue was opened by power user Petr who has done a lot for the project in the last few weeks but who didn't know about GSoC2014. I would ask you to first use the profiler with the files, to look at the existing source code for optimization possibilities. Maybe you'll have the same ideas as in the java files / the dialog there, maybe you have different ones. Note that optimization may not only have to be done in the shading package, it is also possibly in the function package. Most is function type 2. was (Author: tilman): Have a look at PDFBOX-2117, but please don't read the dialog yet, only look at the attached PDF files, not the java files, and not the dialog after them. This issue was opened by power user Petr who has done a lot for the project in the last few weeks but who didn't know about GSoC2014. I would ask you to first use the profiler with the files, to look at the source code for optimization possibilities. Maybe you'll have the same ideas as in the java files / the dialog there, maybe you have different ones. Note that optimization may not only have to be done in the shading package, it is also possibly in the function package. Most is function type 2. 
[jira] [Updated] (PDFBOX-2117) AxialShadingContext is slow
[ https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2117: Attachment: Shading2Function2.ps Shading2Function2.pdf AxialShadingContext is slow --- Key: PDFBOX-2117 URL: https://issues.apache.org/jira/browse/PDFBOX-2117 Project: PDFBox Issue Type: Improvement Components: Rendering Reporter: Petr Slaby Attachments: 01_MTEXT_CS6.pdf, AxialShading.patch, AxialShading1.patch, AxialShadingContext.java.getrgbimage, Shading2Function2.pdf, Shading2Function2.ps, Shading2Function2text.pdf, asy-shade.pdf, color_gradient.pdf, shading_pattern.pdf AxialShadingContext#getRaster() is on top of profiler hot spots in documents that use an axial shading. Inside it, the slowest part is calling PDColorSpaceRGB#toRGB() and PDFunctionType3#eval() (in this order). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2117) AxialShadingContext is slow
[ https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2117: Issue Type: Sub-task (was: Improvement) Parent: PDFBOX-1915 AxialShadingContext is slow --- Key: PDFBOX-2117 URL: https://issues.apache.org/jira/browse/PDFBOX-2117 Project: PDFBox Issue Type: Sub-task Components: Rendering Reporter: Petr Slaby Attachments: 01_MTEXT_CS6.pdf, AxialShading.patch, AxialShading1.patch, AxialShadingContext.java.getrgbimage, Shading2Function2.pdf, Shading2Function2.ps, Shading2Function2text.pdf, asy-shade.pdf, color_gradient.pdf, shading_pattern.pdf AxialShadingContext#getRaster() is on top of profiler hot spots in documents that use an axial shading. Inside it, the slowest part is calling PDColorSpaceRGB#toRGB() and PDFunctionType3#eval() (in this order).
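When per-pixel function evaluation and color conversion dominate a profile, as reported for getRaster() here, one generic remedy is to evaluate the colors once into a lookup table and index into it per pixel. The sketch below only illustrates that general idea; it is not taken from the patches attached to the issue, the class `ShadingColorTable` is hypothetical, and it assumes Java 8 for `DoubleToIntFunction`.

```java
import java.util.function.DoubleToIntFunction;

// Hypothetical sketch, not PDFBox code: caches packed RGB values for t in [0,1].
public class ShadingColorTable
{
    private final int[] table;

    // colorAt stands in for the expensive "evaluate function, then convert to RGB" step.
    ShadingColorTable(int size, DoubleToIntFunction colorAt)
    {
        if (size < 2)
        {
            throw new IllegalArgumentException("size must be at least 2");
        }
        table = new int[size];
        for (int i = 0; i < size; i++)
        {
            table[i] = colorAt.applyAsInt(i / (double) (size - 1));
        }
    }

    // Returns the cached color nearest to t; values outside [0,1] are clamped.
    int lookup(double t)
    {
        double clamped = Math.max(0, Math.min(1, t));
        return table[(int) Math.round(clamped * (table.length - 1))];
    }
}
```

The table size trades memory against color resolution; whether the quantization is visually acceptable would have to be checked against the test PDFs.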
[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052814#comment-14052814 ] Tilman Hausherr edited comment on PDFBOX-1915 at 7/5/14 8:15 AM: - Have a look at PDFBOX-2117, but please don't read the dialog yet, only look at the attached PDF files, not the java files, and not the dialog after them. This issue was opened by power user Petr who has done a lot for the project in the last few weeks but who didn't know about GSoC2014. I would ask you to first use the profiler with the files, to look at the existing source code for optimization possibilities. Maybe you'll have the same ideas as in the java files / the dialog there, maybe you have different ones. Note that optimization may not only have to be done in the shading package, it is also possibly in the function package. Most is function type 2. I've added an example with its postscript file in that issue, this makes it easier to understand what this function type 2 is about, even if you don't know postscript. The C0 and C1 values are boundaries, the three values are color values (in that case, R G B, but it could also be C M Y K or whatever the colorspace is) and N is an exponent. It is explained in the PDF spec but also here in short: http://ipdfdev.com/2012/09/03/ifxpdffactory-part-3-pdf-functions/ was (Author: tilman): Have a look at PDFBOX-2117, but please don't read the dialog yet, only look at the attached PDF files, not the java files, and not the dialog after them. This issue was opened by power user Petr who has done a lot for the project in the last few weeks but who didn't know about GSoC2014. I would ask you to first use the profiler with the files, to look at the existing source code for optimization possibilities. Maybe you'll have the same ideas as in the java files / the dialog there, maybe you have different ones. 
Note that optimization may not only have to be done in the shading package, it is also possibly in the function package. Most is function type 2.
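The function type 2 described in the comment above (C0 and C1 as boundary values, N as exponent) is the exponential interpolation function of the PDF specification: for an input x in [0,1], each output component is C0[j] + x^N * (C1[j] - C0[j]). A minimal sketch, with a hypothetical class name and without the Domain/Range clipping a real implementation needs:

```java
// Hypothetical sketch of a PDF type 2 (exponential interpolation) function.
public class ExponentialFunction
{
    // c0 and c1 hold one value per color component; n is the exponent; x is the input in [0,1].
    static float[] eval(float[] c0, float[] c1, float n, float x)
    {
        float xToN = (float) Math.pow(x, n);
        float[] result = new float[c0.length];
        for (int j = 0; j < result.length; j++)
        {
            result[j] = c0[j] + xToN * (c1[j] - c0[j]);
        }
        return result;
    }
}
```

With N = 1 this is plain linear interpolation between the two colors, so a caller could skip Math.pow in that case; that shortcut is only a suggestion and is not claimed to be what PDFBox does.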
Re: Regression Testing
Hi Tilman

Thanks for your thoughts, I think that your concerns are already covered by my original proposal, I’ll try to explain why and how:

Of course I agree with the need for regression tests, however it isn't easy: besides the problems of the different JDKs (I use JDK7 Windows 64 bit), there is the problem that some enhancements create slight changes in rendering that are not errors, i.e. both the before and the after files look OK by themselves. This has happened when we changed the text rendering recently, and has happened again when the clipping was improved. The cause is probably slight changes in color or in boundaries.

If a rendering has changed then the regression test should fail. When a failure occurs the developer needs to manually inspect the differences (we could generate a visual diff which highlights what changed to make this easier) and if ok then they can replace the known-good PNG with the ones just rendered. Indeed this will be the basic workflow for working with regression tests.

I think this is the only way to handle that situation. The same applies for text extraction etc. - If an improvement changes the results the 'base' needs to be reset by adding the new image, text etc. as the validation source. A basic testbed could also run against other JDKs - e.g. without validating against the known-good files - so we pick up potential issues early. Should be easy with Jenkins and treated as a hint.

Copyright is a problem: I'm testing mostly with JIRA attachments that I've downloaded over the years. While uploading such files to JIRA might count as fair use, I doubt that this would still be true if they are included in a distribution. Instead, they should be stored somewhere on Apache servers where only committers and build software (Travis, Jenkins, ...) can access them. The public PDFs that Maruan mentions possibly don't have all the problem cases that we solved before.
However I have started working with these files and there are at least 5 recent issues that deals with them. The PDFs won’t be in a distribution. They will just happen to be stored in an SVN repo but not our source code repo, in the same way that the website is stored in the “cmssite” branch of SVN or indeed, are on JIRA. The law doesn’t distinguish between JIRA and SVN, both are publicly available via HTTP, so using SVN will simply be a continuation of what we’re already doing with JIRA. The crucial factor is that we’re only storing publicly available PDFs, because we have the right to do so, just like Google’s cache, and like we currently do with JIRA. Additionally, the PDFs need to be version controlled otherwise we won’t be able to reliably recreate previous builds, so storing the files on a web server won’t be practical. Also committers will frequently be updating the renderings as bugs are fixed and we’ll need to version-control the rendered PNG files for the same reason. Finally, having committers-only files doesn’t fit well with the Apache goal of open development and would be unnecessary anyway given that all the PDFs are to be taken from public sources only. In summary, I’m proposing that we just keep doing what we’re currently doing with JIRA but we move it into its own SVN repo along with some pre-rendered PNGs. In addition if we put in workarounds to handle nonconforming PDFs there should be a unit test added to make sure that we don’t break that e.g. when rewriting the parser. Re preflight: the default mode should be to have the Isartor tests on. Individuals could still disable them locally, but the central build software should always use them. Yes - does anybody know why this isn’t the default? No. +1 for enabling it per default -- John
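The workflow discussed in this thread (render, compare with a known-good PNG, inspect a visual diff on failure) could be sketched roughly as follows. This is an illustration only and not the code in TestPDFToImage; the class `VisualDiff` and the red-on-white marking scheme are assumptions.

```java
import java.awt.image.BufferedImage;

// Hypothetical sketch of a pixel-by-pixel visual diff for rendering regression tests.
public class VisualDiff
{
    // Returns null if both images are identical; otherwise an image in which
    // differing pixels (including areas covered by only one image) are red and all others white.
    static BufferedImage diff(BufferedImage expected, BufferedImage actual)
    {
        int w = Math.max(expected.getWidth(), actual.getWidth());
        int h = Math.max(expected.getHeight(), actual.getHeight());
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        boolean changed = false;
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                boolean inExpected = x < expected.getWidth() && y < expected.getHeight();
                boolean inActual = x < actual.getWidth() && y < actual.getHeight();
                boolean same = inExpected && inActual
                        && expected.getRGB(x, y) == actual.getRGB(x, y);
                out.setRGB(x, y, same ? 0xFFFFFF : 0xFF0000);
                changed |= !same;
            }
        }
        return changed ? out : null;
    }
}
```

A test would fail whenever diff() returns a non-null image and write that image next to the rendering, so that the developer can decide whether to replace the known-good PNG. An exact comparison like this will also flag the harmless slight changes mentioned above; tolerating them would need a threshold, which is deliberately left out here.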
PDFBox and documentation
Hi, I have the infrastructure for enhancing our documentation nearly sorted (needed to learn a little more about the possibilities of the Apache CMS). Now WDYT would be the expectation for documenting how to use PDFBox for different use cases - code snippets or runnable examples? BR Maruan
[jira] [Reopened] (PDFBOX-1695) Improve pdfbox tests
[ https://issues.apache.org/jira/browse/PDFBOX-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened PDFBOX-1695: - Assignee: Tilman Hausherr

Reopened to commit my changes in TestPDFToImage. I have used this and improved it over the months, and used it to show regressions.

Improve pdfbox tests Key: PDFBOX-1695 URL: https://issues.apache.org/jira/browse/PDFBOX-1695 Project: PDFBox Issue Type: Improvement Affects Versions: 1.8.2, 2.0.0 Reporter: Tilman Hausherr Assignee: Tilman Hausherr Priority: Minor Labels: tdd, test-driven, testing Attachments: ccitt4.tif, jbig2test-01.png, jbig2test.pdf

I'd like to improve the tests for rendering.

org/apache/pdfbox/util/TestPDFToImage.java is disabled in pdfbox\pom.xml. This has been disabled since 2009?! So I enabled it here.

The subdir rendering is missing in pdfbox\target\test-output for these tests.

When a test fails because the rendered image is not identical, no detailed message appears on the console. It appears only in pdfbox.log. This is because of the settings in pdfbox\src\test\resources\logging.properties. If this is on purpose, please change the texts in pdfbox\src\test\java\org\apache\pdfbox\util\*.java from
"One or more failures, see test log for details"
to
"One or more failures, see test logfile 'pdfbox.log' for details"

I wanted to attach a PDF with CCITT G4 compression and its rendering created with the 1.8.2 version, but it didn't work out; it seems that CIB generates files that can be rendered properly with 1.8.2. However, I attach the TIFF G4 file, and a JBIG2 test file made from it. I don't have access to a Xerox WorkCentre (enter jbig2 in google news :-) ) so I used a free service, so there's a watermark. It should be included in pdfbox\src\test\resources\input\rendering. I have created the image myself and I release it into the public domain.
If my suggestion is accepted, it would be nice if people could create files that fail in current versions or have failed in old versions, and release these files to the public domain, so that they can be added to the tests.
Build failed in Jenkins: PDFBox-ant #1411
See https://builds.apache.org/job/PDFBox-ant/1411/changes Changes: [tilman] PDFBOX-1695: create graphic diff file, and empty files in target dir if anything goes wrong -- [...truncated 1481 lines...] AU preflight/src/main/java/org/apache/pdfbox/preflight/font/descriptor/TrueTypeDescriptorHelper.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/descriptor/Type1DescriptorHelper.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/descriptor/CIDType2DescriptorHelper.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/Type1FontValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/CIDType2FontValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/Type3FontValidator.java A preflight/src/main/java/org/apache/pdfbox/preflight/font/container AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/CIDType0Container.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/TrueTypeContainer.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/Type0Container.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/Type1Container.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/CIDType2Container.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/Type3Container.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/container/FontContainer.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/DescendantFontValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/font/SimpleFontValidator.java AUpreflight/src/main/java/org/apache/pdfbox/preflight/Format.java AU preflight/src/main/java/org/apache/pdfbox/preflight/PreflightContext.java AU preflight/src/main/java/org/apache/pdfbox/preflight/ValidationResult.java A preflight/src/main/java/org/apache/pdfbox/preflight/annotation AU 
preflight/src/main/java/org/apache/pdfbox/preflight/annotation/LinkAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/TrapNetAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/AnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/MarkupAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/InkAnnotationValdiator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/SquareCircleAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/PopupAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/WidgetAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/TextAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/FreeTextAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/RubberStampAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/AnnotationValidatorFactory.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/LineAnnotationValidator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/PrintMarkAnnotationValidator.java A preflight/src/main/java/org/apache/pdfbox/preflight/annotation/pdfa AU preflight/src/main/java/org/apache/pdfbox/preflight/annotation/pdfa/PDFAbAnnotationFactory.java A preflight/src/main/java/org/apache/pdfbox/preflight/content AU preflight/src/main/java/org/apache/pdfbox/preflight/content/ContentStreamException.java AU preflight/src/main/java/org/apache/pdfbox/preflight/content/PreflightStreamEngine.java AU preflight/src/main/java/org/apache/pdfbox/preflight/content/StubOperator.java AU preflight/src/main/java/org/apache/pdfbox/preflight/content/PreflightContentStream.java A 
preflight/src/main/java/org/apache/pdfbox/preflight/action AU preflight/src/main/java/org/apache/pdfbox/preflight/action/AbstractActionManager.java AU preflight/src/main/java/org/apache/pdfbox/preflight/action/UriAction.java AU preflight/src/main/java/org/apache/pdfbox/preflight/action/UndefAction.java AU preflight/src/main/java/org/apache/pdfbox/preflight/action/ActionManagerFactory.java AU preflight/src/main/java/org/apache/pdfbox/preflight/action/SubmitAction.java AU preflight/src/main/java/org/apache/pdfbox/preflight/action/GoToRemoteAction.java AU preflight/src/main/java/org/apache/pdfbox/preflight/action/NamedAction.java AU
Re: PDFBox and documentation
That should be doable with some newer additions to the Apache CMS, which allow pulling from svn and/or git. I will try something on that basis. If it works, we can enhance the example package.

BR
Maruan

On 05.07.2014 at 18:45, John Hewson j...@jahewson.com wrote:

I'm for runnable examples in trunk on SVN, otherwise we'll end up with code that doesn't actually run. Some snippets from these examples could be put on the website but they should always link back to the example file in SVN viewvc - there's nothing more frustrating for a new user than incomplete examples, or having to copy and paste snippets together to recreate an example file. Looking at the examples we have currently on SVN the coding conventions used are starting to look a bit dated, certainly far behind more recently written code.

-- John

On 5 Jul 2014, at 04:46, Maruan Sahyoun sahy...@fileaffairs.de wrote:

Hi, I have the infrastructure for enhancing our documentation nearly sorted (needed to learn a little more about the possibilities of the Apache CMS). Now WDYT would be the expectation for documenting how to use PDFBox for different use cases - code snippets or runnable examples?

BR
Maruan
Re: PDFBox and documentation
On 05.07.2014 at 18:45, John Hewson wrote:

I'm for runnable examples in trunk on SVN, otherwise we'll end up with code that doesn't actually run. Some snippets from these examples could be put on the website but they should always link back to the example file in SVN viewvc - there's nothing more frustrating for a new user than incomplete examples, or having to copy and paste snippets together to recreate an example file.

Maybe the best is both. Sadly, I don't remember how I wrote my first pdfbox application - probably from both the pdfbox website and stackoverflow. But I don't remember any pain. One thing to improve in the documentation might be to mention that one could just download the app, instead of downloading pdfbox, fontbox and jempbox each time.

Looking at the examples we have currently on SVN the coding conventions used are starting to look a bit dated, certainly far behind more recently written code.

Well, I thought I respected the conventions :-)

Tilman
Re: Regression Testing
On 04.07.2014 at 19:39, John Hewson wrote:

Hi Tilman

Thanks for your thoughts, I think that your concerns are already covered by my original proposal, I’ll try to explain why and how:

Of course I agree with the need for regression tests, however it isn't easy: besides the problems of the different JDKs (I use JDK7 Windows 64 bit), there is the problem that some enhancements create slight changes in rendering that are not errors, i.e. both the before and the after files look OK by themselves. This has happened when we changed the text rendering recently, and has happened again when the clipping was improved. The cause is probably slight changes in color or in boundaries.

If a rendering has changed then the regression test should fail. When a failure occurs the developer needs to manually inspect the differences (we could generate a visual diff which highlights what changed to make this easier) and if ok then they can replace the known-good PNG with the ones just rendered. Indeed this will be the basic workflow for working with regression tests.

That's exactly what I do now: I generate a visual diff and I decide whether it is relevant or not. If I think not, then I replace the PNG.

Copyright is a problem: I'm testing mostly with JIRA attachments that I've downloaded over the years. While uploading such files to JIRA might count as fair use, I doubt that this would still be true if they are included in a distribution. Instead, they should be stored somewhere on Apache servers where only committers and build software (Travis, Jenkins, ...) can access them. The public PDFs that Maruan mentions possibly don't have all the problem cases that we solved before. However I have started working with these files and there are at least 5 recent issues that deal with them.

The PDFs won’t be in a distribution.
They will just happen to be stored in an SVN repo but not our source code repo, in the same way that the website is stored in the “cmssite” branch of SVN or indeed, are on JIRA. The law doesn’t distinguish between JIRA and SVN, both are publicly available via HTTP, so using SVN will simply be a continuation of what we’re already doing with JIRA. The crucial factor is that we’re only storing publicly available PDFs, because we have the right to do so, just like Google’s cache, and like we currently do with JIRA. Yes but many PDFs we got aren't really public. If this svn repo is only accessible to committers, and if the publicly available build scripts won't break because of this, then it is OK. Note that even if something is publicly available, it may still be copyrighted. Other risks can be that some people upload PDFs that include personal data. One really good test PDF was apparently a loan application. I remember that the user insisted that 1. it was test data, and 2. that it be removed. Tilman Additionally, the PDFs need to be version controlled otherwise we won’t be able to reliably recreate previous builds, so storing the files on a web server won’t be practical. Also committers will frequently be updating the renderings as bugs are fixed and we’ll need to version-control the rendered PNG files for the same reason. Finally, having committers-only files doesn’t fit well with the Apache goal of open development and would be unnecessary anyway given that all the PDFs are to be taken from public sources only. In summary, I’m proposing that we just keep doing what we’re currently doing with JIRA but we move it into its own SVN repo along with some pre-rendered PNGs. Re preflight: the default mode should be to have the Isartor tests on. Individuals could still disable them locally, but the central build software should always use them. Yes - does anybody know why this isn’t the default? -- John
Re: Regression Testing
>>> Copyright is a problem: I'm testing mostly with JIRA attachments that I've downloaded over the years. While uploading such files to JIRA might count as fair use, I doubt that this would still be true if they are included in a distribution. Instead, they should be stored somewhere on Apache servers where only committers and build software (Travis, Jenkins, ...) can access them. The public PDFs that Maruan mentions possibly don't have all the problem cases that we solved before. However I have started working with these files and there are at least 5 recent issues that deal with them.
>>
>> The PDFs won’t be in a distribution. They will just happen to be stored in an SVN repo, but not our source code repo, in the same way that the website is stored in the “cmssite” branch of SVN or, indeed, as they already are on JIRA. The law doesn’t distinguish between JIRA and SVN; both are publicly available via HTTP, so using SVN will simply be a continuation of what we’re already doing with JIRA. The crucial factor is that we’re only storing publicly available PDFs, because we have the right to do so, just like Google’s cache, and like we currently do with JIRA.
>
> Yes, but many PDFs we got aren't really public. If this SVN repo is only accessible to committers, and if the publicly available build scripts won't break because of this, then it is OK.

Any non-public PDFs will not be permitted in our test suite, just as they shouldn't be on JIRA.

> Note that even if something is publicly available, it may still be copyrighted. Another risk is that some people upload PDFs that include personal data. One really good test PDF was apparently a loan application. I remember that the user insisted that 1. it was test data, and 2. that it be removed.

All Apache development should be in the open; this is a key ASF principle, and having a committers-only test suite is basically a no-no.

It's important to understand that fair use allows us to use copyrighted works - this is expressly permitted, and it's the same legal principle as Google’s cache. There is no need to seek permission. This is what we’ve been doing with JIRA for years already, so it’s fine.

Naturally, if anybody objects to their PDF being in our test suite, we can always remove it, but the suite shouldn’t include anything which isn’t already on the public web.

-- John
Re: Regression Testing
On 05.07.2014 at 22:12, John Hewson wrote:
>>>> Copyright is a problem: I'm testing mostly with JIRA attachments that I've downloaded over the years. While uploading such files to JIRA might count as fair use, I doubt that this would still be true if they are included in a distribution. Instead, they should be stored somewhere on Apache servers where only committers and build software (Travis, Jenkins, ...) can access them. The public PDFs that Maruan mentions possibly don't have all the problem cases that we solved before. However I have started working with these files and there are at least 5 recent issues that deal with them.
>>>
>>> The PDFs won’t be in a distribution. They will just happen to be stored in an SVN repo, but not our source code repo, in the same way that the website is stored in the “cmssite” branch of SVN or, indeed, as they already are on JIRA. The law doesn’t distinguish between JIRA and SVN; both are publicly available via HTTP, so using SVN will simply be a continuation of what we’re already doing with JIRA. The crucial factor is that we’re only storing publicly available PDFs, because we have the right to do so, just like Google’s cache, and like we currently do with JIRA.
>>
>> Yes, but many PDFs we got aren't really public. If this SVN repo is only accessible to committers, and if the publicly available build scripts won't break because of this, then it is OK.
>
> Any non-public PDFs will not be permitted in our test suite, just as they shouldn't be on JIRA.
>
>> Note that even if something is publicly available, it may still be copyrighted. Another risk is that some people upload PDFs that include personal data. One really good test PDF was apparently a loan application. I remember that the user insisted that 1. it was test data, and 2. that it be removed.
>
> All Apache development should be in the open; this is a key ASF principle, and having a committers-only test suite is basically a no-no.
>
> It's important to understand that fair use allows us to use copyrighted works - this is expressly permitted, and it's the same legal principle as Google’s cache. There is no need to seek permission. This is what we’ve been doing with JIRA for years already, so it’s fine.

The problem is that this has all happened before. A few years ago, many files were deleted, see PDFBOX-391.

Tilman

> Naturally, if anybody objects to their PDF being in our test suite, we can always remove it, but the suite shouldn’t include anything which isn’t already on the public web.
>
> -- John
Re: Regression Testing
On 5 Jul 2014, at 13:47, Tilman Hausherr thaush...@t-online.de wrote:
> On 05.07.2014 at 22:12, John Hewson wrote:
>>>>> Copyright is a problem: I'm testing mostly with JIRA attachments that I've downloaded over the years. While uploading such files to JIRA might count as fair use, I doubt that this would still be true if they are included in a distribution. Instead, they should be stored somewhere on Apache servers where only committers and build software (Travis, Jenkins, ...) can access them. The public PDFs that Maruan mentions possibly don't have all the problem cases that we solved before. However I have started working with these files and there are at least 5 recent issues that deal with them.
>>>>
>>>> The PDFs won’t be in a distribution. They will just happen to be stored in an SVN repo, but not our source code repo, in the same way that the website is stored in the “cmssite” branch of SVN or, indeed, as they already are on JIRA. The law doesn’t distinguish between JIRA and SVN; both are publicly available via HTTP, so using SVN will simply be a continuation of what we’re already doing with JIRA. The crucial factor is that we’re only storing publicly available PDFs, because we have the right to do so, just like Google’s cache, and like we currently do with JIRA.
>>>
>>> Yes, but many PDFs we got aren't really public. If this SVN repo is only accessible to committers, and if the publicly available build scripts won't break because of this, then it is OK.
>>
>> Any non-public PDFs will not be permitted in our test suite, just as they shouldn't be on JIRA.
>>
>>> Note that even if something is publicly available, it may still be copyrighted. Another risk is that some people upload PDFs that include personal data. One really good test PDF was apparently a loan application. I remember that the user insisted that 1. it was test data, and 2. that it be removed.
>>
>> All Apache development should be in the open; this is a key ASF principle, and having a committers-only test suite is basically a no-no.
>>
>> It's important to understand that fair use allows us to use copyrighted works - this is expressly permitted, and it's the same legal principle as Google’s cache. There is no need to seek permission. This is what we’ve been doing with JIRA for years already, so it’s fine.
>
> The problem is that this has all happened before. A few years ago, many files were deleted, see PDFBOX-391.

That issue is about including files in the source code repo as part of the PDFBox distribution, where there is a need to put files under an Apache 2.0 compatible license. What I’m advocating is keeping a separate public repository of test files which are not a part of the PDFBox source, like we currently have on JIRA.

-- John