[jira] [Commented] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable
    [ https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977879#comment-13977879 ]

Andrei Solntsev commented on PDFBOX-2039:
-----------------------------------------

No problems, the interface java.io.Closeable is available since Java 1.5

Class PDDocument should implement java.io.Closeable
---------------------------------------------------

                Key: PDFBOX-2039
                URL: https://issues.apache.org/jira/browse/PDFBOX-2039
            Project: PDFBox
         Issue Type: Improvement
   Affects Versions: 2.0.0
           Reporter: Andrei Solntsev
           Priority: Minor
  Original Estimate: 1h
 Remaining Estimate: 1h

It would make it possible to use Java 7 try-with-resources feature:

    try (PDDocument doc = PDDocument.load(outputFile)) {
        // bla-bla
        // no need to call doc.close(); explicitly
    }

P.S. Actually all org.apache.pdfbox.* classes with method close() could implement java.io.Closeable

--
This message was sent by Atlassian JIRA
(v6.2#6252)
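The try-with-resources behaviour requested in this issue can be sketched without PDFBox on the classpath. The class SketchDocument below is a hypothetical stand-in, not PDDocument; the only point it illustrates is that any type implementing java.io.Closeable can be declared in the try header and is closed automatically when the block ends (Java 7 or later).

```java
import java.io.Closeable;

// Hypothetical stand-in for a document class; this is NOT PDFBox's PDDocument.
class SketchDocument implements Closeable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    // Closeable.close() may declare IOException; this stand-in never throws.
    public void close() {
        closed = true;
    }
}

class TryWithResourcesSketch {
    // Opens a stand-in document, leaves the block, and returns the document
    // so the caller can check that it was closed.
    static SketchDocument useAndRelease() {
        SketchDocument seen = null;
        try (SketchDocument doc = new SketchDocument()) {
            seen = doc; // work with the document here
        } // doc.close() is called automatically at this point
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("closed after block: " + useAndRelease().isClosed());
    }
}
```

With a class that only has a plain close() method and does not implement Closeable (or AutoCloseable), the same try header would not compile, which is why the interface matters for callers on Java 7.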
[jira] [Updated] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable
    [ https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2039:
------------------------------------

    Affects Version/s: 1.8.5
                       1.8.4
[jira] [Comment Edited] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable
    [ https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977504#comment-13977504 ]

Tilman Hausherr edited comment on PDFBOX-2039 at 4/23/14 6:12 AM:
------------------------------------------------------------------

-The 1.8 version must support JDK5, so it is not possible.-

The 2.0 version has COSDocument and PDDocument that are Closeable.

      was (Author: tilman):
The 1.8 version must support JDK5, so it is not possible.

The 2.0 version has COSDocument and PDDocument that are Closeable.
[jira] [Updated] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable
    [ https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2039:
------------------------------------

    Fix Version/s: 2.0.0
                   1.8.5
[jira] [Commented] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable
    [ https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977890#comment-13977890 ]

Tilman Hausherr commented on PDFBOX-2039:
-----------------------------------------

For a start, I added it for PDDocument and COSDocument in 1.8 in rev 1589346, because these are the classes where it is implemented in 2.0. I'll have a look at the other classes later.

You can get it immediately with svn if you want to test the improved code, or here in a few hours:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.5-SNAPSHOT/
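Callers that still compile for JDK5 cannot use try-with-resources, but once a class implements java.io.Closeable it can at least be passed to a generic cleanup helper from a finally block. The helper below is a minimal sketch under that assumption; closeQuietly is a hypothetical name chosen for illustration and is not claimed to be a PDFBox method.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseQuietlySketch {
    // Closes any Closeable, tolerating null and swallowing IOException.
    // Returns true only if close() was called and completed normally.
    static boolean closeQuietly(Closeable resource) {
        if (resource == null) {
            return false;
        }
        try {
            resource.close();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in resource; in real code this would be the opened document.
        Closeable doc = new Closeable() {
            public void close() {
                System.out.println("closed");
            }
        };
        try {
            System.out.println("working with the document");
        } finally {
            closeQuietly(doc); // JDK5-style cleanup, no try-with-resources needed
        }
    }
}
```

Swallowing the exception is a deliberate simplification here; code that must report close failures would log or rethrow instead.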
[jira] [Updated] (PDFBOX-2038) Method VisualSignatureParser#parse does not close COSDocument
    [ https://issues.apache.org/jira/browse/PDFBOX-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2038:
------------------------------------

    Affects Version/s: 2.0.0
                       1.8.5

Method VisualSignatureParser#parse does not close COSDocument
-------------------------------------------------------------

                Key: PDFBOX-2038
                URL: https://issues.apache.org/jira/browse/PDFBOX-2038
            Project: PDFBox
         Issue Type: Bug
   Affects Versions: 1.8.4, 1.8.5, 2.0.0
           Reporter: Andrei Solntsev
           Priority: Minor
  Original Estimate: 1h
 Remaining Estimate: 1h

I am adding a visual signature to my PDF.

    SignatureOptions options = new SignatureOptions();
    options.setVisualSignature( new FileInputStream("my.jpg") );

After a while I am getting the following warning in the logs:

    Warning: COSDocument: You did not close a PDF Document

The cause is probably the method org.apache.pdfbox.pdfparser.VisualSignatureParser#parse, which creates an instance of COSDocument but does not close it.
[jira] [Commented] (PDFBOX-2038) Method VisualSignatureParser#parse does not close COSDocument
    [ https://issues.apache.org/jira/browse/PDFBOX-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977904#comment-13977904 ]

Tilman Hausherr commented on PDFBOX-2038:
-----------------------------------------

setVisualSignature does create a COSDocument, which contains the visual signature that is to be used later. So it can't be closed immediately. Currently, all you can do is to call options.getVisualSignature() and close that object.

An improvement might be to add a close() method to SignatureOptions, but I'm not one of the signature people here, so I'd rather wait for their opinion.
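The improvement floated in this comment, an options object that closes the document it owns, can be sketched as follows. Everything here is hypothetical: OptionsSketch is a stand-in written for this example and is not the PDFBox SignatureOptions class, and the owned document is modelled as a plain Closeable rather than a COSDocument.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical holder; NOT the PDFBox SignatureOptions class.
class OptionsSketch implements Closeable {
    private Closeable visualSignature; // stands in for the parsed COSDocument

    void setVisualSignature(Closeable parsed) {
        this.visualSignature = parsed;
    }

    Closeable getVisualSignature() {
        return visualSignature;
    }

    // Releases the owned document; safe to call more than once.
    public void close() throws IOException {
        if (visualSignature != null) {
            visualSignature.close();
            visualSignature = null;
        }
    }
}

class OptionsSketchDemo {
    // Returns true when closing the holder also closed the owned document.
    static boolean closesOwnedDocument() {
        final boolean[] closed = { false };
        OptionsSketch options = new OptionsSketch();
        options.setVisualSignature(new Closeable() {
            public void close() {
                closed[0] = true;
            }
        });
        // ... the signing step would use options.getVisualSignature() here ...
        try {
            options.close();
        } catch (IOException e) {
            return false;
        }
        return closed[0];
    }

    public static void main(String[] args) {
        System.out.println("owned document closed: " + closesOwnedDocument());
    }
}
```

The design choice is simply that the object which creates the document is also the one responsible for releasing it, so the caller has a single close() to remember after signing is finished.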
[jira] [Created] (PDFBOX-2041) Convert PDF to Image (Strange Color)
ahfei created PDFBOX-2041:
-----------------------------

             Summary: Convert PDF to Image (Strange Color)
                 Key: PDFBOX-2041
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2041
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 1.8.4
         Environment: Java(1.7.0_45), OS (Ubuntu)
            Reporter: ahfei

Using PDFBox, I tried to convert a PDF to an image file (case1.pdf, case1.jpg).

Below is the code I'm using:

    BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);
    ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 200);

After the conversion, the image doesn't look like the PDF. Half of the page turns blue and black.

The images and the PDF are attached.
[jira] [Updated] (PDFBOX-2041) Convert PDF to Image (Strange Color)
    [ https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ahfei updated PDFBOX-2041:
--------------------------

    Description:
Using PDFBox, I tried to convert a PDF to an image file (case1.pdf, case1.jpg).

Below is the code I'm using:

    BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);
    ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 200);

After the conversion, the image doesn't look like the PDF. Half of the page turns blue and black.

The images and the PDF are attached here: https://www.dropbox.com/sh/jevegc8bh09km1o/5XkVwPUxri
[jira] [Commented] (PDFBOX-1241) Better handle of missing offset at the end of a file
    [ https://issues.apache.org/jira/browse/PDFBOX-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978008#comment-13978008 ]

Manuel Mahringer commented on PDFBOX-1241:
------------------------------------------

With the trunk version from 22.04.2014 the issue isn't reproducible anymore.

Better handle of missing offset at the end of a file
----------------------------------------------------

                Key: PDFBOX-1241
                URL: https://issues.apache.org/jira/browse/PDFBOX-1241
            Project: PDFBox
         Issue Type: Improvement
         Components: Parsing, Text extraction
   Affects Versions: 1.6.0
        Environment: All platforms affected
           Reporter: Ernst Eibensteiner
        Attachments: On the Insert tab.pdf

We came across PDF files that do not have an offset at the end of the file. This leads to the following exception:

c:\tmp>java -jar pdfbox-app-1.6.0.jar ExtractText -endPage 1 "On the Insert tab.pdf"
ExtractText failed with the following exception:
java.io.IOException: Error: Expected an integer type, actual=''
        at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
        at org.apache.pdfbox.pdfparser.PDFParser.parseStartXref(PDFParser.java:663)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:464)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:978)
        at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:196)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:76)
        at org.apache.pdfbox.PDFBox.main(PDFBox.java:42)

While these PDFs are non-conforming, it'd be an improvement to allow them to be read and processed.
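The stack trace shows the parser failing because no integer follows the startxref keyword. One lenient way to treat that situation is sketched below: report the offset as missing instead of throwing, and let the caller decide how to recover. This is an illustration only and is not claimed to be how PDFBox fixed the issue; StartXrefSketch, readStartXref and MISSING are hypothetical names, and the input is assumed to be the tail of the file already decoded as text.

```java
class StartXrefSketch {
    // Sentinel returned when no usable offset is present.
    static final long MISSING = -1L;

    // Returns the number that follows the last "startxref" keyword in the
    // given file tail, or MISSING if the keyword or the number is absent.
    static long readStartXref(String tail) {
        String keyword = "startxref";
        int keywordAt = tail.lastIndexOf(keyword);
        if (keywordAt < 0) {
            return MISSING;
        }
        int pos = keywordAt + keyword.length();
        // Skip the whitespace between the keyword and the offset.
        while (pos < tail.length() && Character.isWhitespace(tail.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < tail.length() && tail.charAt(pos) >= '0' && tail.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == start) {
            return MISSING; // keyword present, but no digits follow it
        }
        try {
            return Long.parseLong(tail.substring(start, pos));
        } catch (NumberFormatException e) {
            return MISSING; // digit run too long to be a valid offset
        }
    }

    public static void main(String[] args) {
        System.out.println(readStartXref("startxref\n116\n%%EOF"));
        System.out.println(readStartXref("startxref\n%%EOF"));
    }
}
```

What happens after MISSING is returned (for example, scanning the file for objects) is outside the scope of this sketch.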
[jira] [Created] (PDFBOX-2042) ColorSpace without Range
Juraj Lonc created PDFBOX-2042:
----------------------------------

             Summary: ColorSpace without Range
                 Key: PDFBOX-2042
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 2.0.0
            Reporter: Juraj Lonc

I have a PDF document in which I am modifying the PDPage content stream. The saved document is invalid (Adobe Reader complains about it). I have narrowed it down to the ColorSpace.

The original document has this colorspace:

    /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 ]

The modified document has this colorspace:

    /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 /Range [] ]

When I manually remove /Range [] from the PDF, Adobe Reader opens it without an error.

Obviously that range is added by calling PDICCBased.getRangeArray(0) somewhere.
[jira] [Updated] (PDFBOX-2042) ColorSpace without Range
    [ https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juraj Lonc updated PDFBOX-2042:
-------------------------------

    Attachment: pdfbox18.pdf

Original (working) file.
[jira] [Updated] (PDFBOX-2042) ColorSpace without Range
    [ https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juraj Lonc updated PDFBOX-2042:
-------------------------------

    Attachment: pdfbox20.pdf

Modified file in pdfbox 2.0.0 (error in Adobe Reader)
community bonding period
Although I'm only mentoring Shaola, maybe some of it is useful for Dimuthu as well:

From the mentors list:
===
We now are in the community bonding period [1] which lasts until May 19. During this period students should learn about your project, your release processes, the Apache Way, how we do things around here, interact with the community and close any knowledge gaps they might have.

[1] http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html
===

Here's a FAQ about Apache:
https://www.apache.org/foundation/faq.html
IMHO the most important parts are "What is Apache about?" and "What is Apache not about?". (My personal addendum to that is "Apache is not like Wikipedia". If you've ever edited in Wikipedia, you'll notice the difference after a few days.)

https://www.apache.org/foundation/how-it-works.html
The roles are simpler than in that text: all committers here are PMC members, and the PMC chair (Andreas) is also an ASF member.

Only committers and above have write access to the official PDFBOX repository. So the best would be to set up a copy on an open source repository.
https://en.wikipedia.org/wiki/Comparison_of_open-source_software_hosting_facilities

We're trying to be transparent. So stuff that deals with the implementation of the project should probably be in the ticket. To see what I mean, have a look at
https://issues.apache.org/jira/browse/PDFBOX-615
and the related issues. PDFBOX-615 started with "I will be trying to add this functionality this week" but it became a huge effort by several people that ended 4 years later :-) See also John's remarks about my code. It annoyed me somewhat at the beginning, but in the end it resulted in much better code.

Note that you can edit in JIRA. See an example here
https://issues.apache.org/jira/browse/PDFBOX-2039
i.e. you can modify previous posts.

Stuff that deals with PDFBOX in general is best in this (publicly readable) mailing list. The advantage is that others might answer you (if they want) when I'm working, sleeping, or not on the internet for whatever reason.

Stuff that deals with java, svn and maven - e-mail me if you don't get the answer within a few minutes from google or from stackoverflow, i.e. don't waste time searching.

Using other libraries: this is OK as long as they have an Apache license or a compatible license (GPL is not). However we don't use many libraries, everything is already big, so if you want, ask first. (Sorry if you already mentioned a library, will reread your proposal again later.) Of course it is always OK to temporarily use whatever you want to just test a theory / strategy / algorithm.

Using other code: the code should rather be your own, but you can use small excerpts from stackoverflow.com etc. as long as you indicate it in your code with a link. Always comment in the code if you were inspired by other people's code or algorithms or research papers; just look at the existing shading code for how I did it.

Don't forget the Apache header in new modules.

Your code should work on JDK5, so that we can use it in the 1.8 version too. So don't use diamond operators, lambda expressions or even String.isEmpty().

IDE: I recommend netbeans but you're free to use your own. Just make sure that svn (and whatever the hosting service will use) and maven are integrated in it; this will make your life easier.

A personal recommendation from my student days in the 80s: don't work all night. Such code was usually found to be poor/worthless after I had the much needed sleep.

Andreas: correct me if I forgot something.

Tilman
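The JDK5 constraint in the message above rules out three specific constructs. The sketch below shows one JDK5-compatible replacement for each; the class and method names are made up for this illustration and do not come from the PDFBox code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class Jdk5StyleSketch {
    // String.isEmpty() only exists since Java 6; length() == 0 works on Java 5.
    static boolean isEmpty(String s) {
        return s.length() == 0;
    }

    // Returns a copy of the input sorted by string length, shortest first.
    static List<String> sortedByLength(List<String> input) {
        // Explicit type arguments instead of the Java 7 diamond operator.
        List<String> copy = new ArrayList<String>(input);
        // Anonymous class instead of a Java 8 lambda expression.
        // No @Override here: Java 5 rejects it on interface method implementations.
        Collections.sort(copy, new Comparator<String>() {
            public int compare(String a, String b) {
                return a.length() - b.length();
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        words.add("ccc");
        words.add("a");
        words.add("bb");
        System.out.println(sortedByLength(words));
        System.out.println(isEmpty(""));
    }
}
```

The code also compiles on newer JDKs, so writing in this style costs nothing when the same source has to serve both the 1.8 branch and the trunk.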
Re: community bonding period
Great advice!

-- John

On 23 Apr 2014, at 15:49, Tilman Hausherr thaush...@t-online.de wrote:

[...]
[jira] [Closed] (PDFBOX-1241) Better handle of missing offset at the end of a file
    [ https://issues.apache.org/jira/browse/PDFBOX-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-1241.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.8.3

I tested with old versions; it failed in versions before 1.8.3. Since 1.8.3 has already been released, I assume I should close it and not just set it to resolved.
[jira] [Resolved] (PDFBOX-2039) Class PDDocument should implement java.io.Closeable
    [ https://issues.apache.org/jira/browse/PDFBOX-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved PDFBOX-2039.
-------------------------------------

    Resolution: Fixed
      Assignee: Tilman Hausherr

Done in rev 1589459 and 1589467 in the trunk and rev 1589465 in the 1.8 branch.

Thanks for pointing me to this!
Re: community bonding period
Hi Tilman,

Thanks for the information. That helped me a lot. I'll work accordingly.

On Wed, Apr 23, 2014 at 9:14 PM, John Hewson j...@jahewson.com wrote:

[...]

--
Regards

W.Dimuthu Upeksha
Undergraduate
Department of Computer Science And Engineering
University of Moratuwa, Sri Lanka
[jira] [Comment Edited] (PDFBOX-2041) Convert PDF to Image (Strange Color)
    [ https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978953#comment-13978953 ]

Tilman Hausherr edited comment on PDFBOX-2041 at 4/23/14 9:49 PM:
------------------------------------------------------------------

1. The PDF file is corrupt. A look at it with Notepad++ shows %%EOF and then trash characters. Deleting everything after that marker makes the file much smaller, 518 KB instead of 4.85 MB. How did you get that file?!
2. I am able to render it. Your jpg file looks like it was cut off at some point.
3. The 2.0 version isn't able to open it with the non sequential parser; the sequential parser can open it.
4. The 1.8 version renders it fine; the 2.0 version has many glyphs missing, maybe a duplicate of PDFBOX-2037. I was able to render it with a modified 2.0 version that I use for myself.

Convert PDF to Image (Strange Color)
------------------------------------

                Key: PDFBOX-2041
                URL: https://issues.apache.org/jira/browse/PDFBOX-2041
        Attachments: PDFBOX-2041.pdf, PDFBOX-2041.pdf-1-bad.tif, pdfbox-2041.pdf-1-good.png
[jira] [Commented] (PDFBOX-2042) ColorSpace without Range
    [ https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979011#comment-13979011 ]

Tilman Hausherr commented on PDFBOX-2042:
-----------------------------------------

It would be helpful if you provide the code that modifies the content stream. I couldn't reproduce the problem by just opening the document and saving it. But your theory about PDICCBased.getRangeArray(0) makes sense, that one would return an empty range array for the 0 parameter.
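Why an empty /Range array is a problem can be shown with a little arithmetic. In the PDF specification, the /Range entry of an ICCBased stream holds 2 x N numbers (a minimum and a maximum per colour component) and defaults to [0 1 0 1 ...] when the key is absent, so for /N 3 a reader expects either six numbers or no key at all. The sketch below illustrates that rule; IccRangeSketch and its methods are hypothetical helpers written for this example and are not part of PDICCBased.

```java
class IccRangeSketch {
    // Default range for N components: min 0 and max 1 for each component.
    static float[] defaultRange(int numberOfComponents) {
        float[] range = new float[2 * numberOfComponents];
        for (int i = 0; i < numberOfComponents; i++) {
            range[2 * i] = 0f;     // minimum of component i
            range[2 * i + 1] = 1f; // maximum of component i
        }
        return range;
    }

    // A range is usable only if it has exactly two numbers per component.
    static boolean isValidRange(float[] range, int numberOfComponents) {
        return range != null && range.length == 2 * numberOfComponents;
    }

    // Returns the array to write, or null meaning "omit the /Range key".
    static float[] rangeToWrite(float[] stored, int numberOfComponents) {
        return isValidRange(stored, numberOfComponents) ? stored : null;
    }

    public static void main(String[] args) {
        System.out.println(rangeToWrite(new float[0], 3) == null); // empty array is omitted
        System.out.println(defaultRange(3).length);
    }
}
```

Under this rule a getter that creates and stores an empty array as a side effect would produce exactly the /Range [] seen in the modified file, which matches the reporter's theory but has not been confirmed in this thread.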
[jira] [Commented] (PDFBOX-2041) Convert PDF to Image (Strange Color)
    [ https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979239#comment-13979239 ]

Tilman Hausherr commented on PDFBOX-2041:
-----------------------------------------

I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full? I don't have Ubuntu, so someone else will have to answer that.
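The manual repair described in this thread, keeping %%EOF and dropping everything after it, can be sketched on a byte array as follows. This is an illustration rather than a PDFBox utility: EofTrimSketch is a hypothetical name, the sketch keeps the last %%EOF it finds, and it would pick the wrong position if the trailing trash itself happened to contain that byte sequence.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EofTrimSketch {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Index of the last occurrence of the marker, or -1 if it is not present.
    static int lastMarkerIndex(byte[] data) {
        for (int i = data.length - MARKER.length; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    // Keeps everything up to and including the last %%EOF; drops the rest.
    // Data without a marker is returned unchanged.
    static byte[] trimAfterEof(byte[] data) {
        int index = lastMarkerIndex(data);
        if (index < 0) {
            return data;
        }
        return Arrays.copyOf(data, index + MARKER.length);
    }

    public static void main(String[] args) {
        byte[] damaged = "%PDF-1.4\n%%EOFjunk".getBytes(StandardCharsets.US_ASCII);
        System.out.println(trimAfterEof(damaged).length);
    }
}
```

Searching from the end matters because a PDF with incremental updates legitimately contains several %%EOF markers, and only the last one terminates the file.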