Re: [DISCUSS] move documentation and examples to git
Dear Santosh,
you can unsubscribe using the link below.
https://pdfbox.apache.org/mailinglists.html
With kind regards
Maruan
On 17.09.2014 at 03:00, Santosh Arakeri santosh.arak...@gmail.com wrote:
Please don't send me mail.
On 16 Sep 2014 13:52, Maruan Sahyoun sahy...@fileaffairs.de wrote:
Hi there,
in order to make it easier for people to contribute to the documentation and examples, I thought about the potential benefits of moving these to a git-based repository instead of svn. The main idea behind that is to allow people to contribute via GitHub, opening another channel of communication and making it easier to contribute. Proposed names are pdfbox-docs and pdfbox-examples. Take a look at https://github.com/apache/cordova-docs for an example of that. I haven't thought through all the potential implications and necessary changes yet, but wanted to get some first feedback on support for that idea before putting more effort into it. WDYT?
Maruan
Re: [DISCUSS] move documentation and examples to git
Hi,
On 16 September 2014 at 18:03, Tilman Hausherr thaush...@t-online.de wrote:
-1, I don't like the idea to have different repository types.
Hmmm, is this just an "I don't like it, but I can live with it", or is it a clear veto?
In case of a veto, how about starting by moving parts of the docs to a new git repo?
IMO sooner or later the project will move from svn to git, and that would be a good opportunity to get used to the general usage of git and, of course, to the special processes used here at the ASF, so that we are not thrown in at the deep end after the migration.
Tilman
BR
Andreas
On 16.09.2014 at 10:21, Maruan Sahyoun wrote:
Hi there,
in order to make it easier for people to contribute to the documentation and examples, I thought about the potential benefits of moving these to a git-based repository instead of svn. The main idea behind that is to allow people to contribute via GitHub, opening another channel of communication and making it easier to contribute. Proposed names are pdfbox-docs and pdfbox-examples. Take a look at https://github.com/apache/cordova-docs for an example of that. I haven't thought through all the potential implications and necessary changes yet, but wanted to get some first feedback on support for that idea before putting more effort into it. WDYT?
Maruan
Re: test failures trunk
Hi,
OK. My issue was that I could not build, while the PDFBox Jenkins has successful builds, so I was wondering why. But I ran the tests in Eclipse, which ignores the test exclusions in the pom.xml. Now it is clear to me.
Cornelis Hoeflake
2014-09-15 20:44 GMT+02:00 John Hewson j...@jahewson.com:
Yep, this test was failing back in January when I first encountered it. It uses some external certificate files, which could be the problem, but I really don't know anything about it.
-- John
On 13 Sep 2014, at 08:32, Tilman Hausherr thaush...@t-online.de wrote:
I had a look:
- yes, it fails in 2.0 but not in 1.8
- it still fails after the correction I mentioned is applied (can't match the recipient)
- the test was removed long ago, which is why the builds don't fail
- that part was written by one Benoit Guillon, who has moved on
I don't know why the test was removed. John mentioned this before in PDFBOX-1825. I don't know enough about bc to understand what's wrong.
Tilman
On 12.09.2014 at 10:49, Cornelis Hoeflake wrote:
Hi,
I get exceptions when running the org.apache.pdfbox.encryption.TestPublicKeyEncryption tests. They say that the document is already closed when the save method is called in reload. I also get failures when running org.apache.pdfbox.util.TestPDFToImage. Is there something wrong with my test setup, or are other developers seeing these errors/failures too?
Kind regards,
Cornelis Hoeflake
[jira] [Created] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
Cornelis Hoeflake created PDFBOX-2357:
Summary: PDTrueTypeFont has no method to load font from stream
Key: PDFBOX-2357
URL: https://issues.apache.org/jira/browse/PDFBOX-2357
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
PDTrueTypeFont formerly had a static method to load a font from a stream. Now that method is gone, as far as I can see without a reason. It was probably removed by mistake. Could that method be restored?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136873#comment-14136873 ]
Cornelis Hoeflake commented on PDFBOX-2357:
The method:
/**
 * Loads a TTF to be embedded into a document.
 *
 * @param doc The PDF document that will hold the embedded font.
 * @param is an input stream providing the TTF data.
 * @return a PDTrueTypeFont instance.
 * @throws IOException If there is an error loading the data.
 */
public static PDTrueTypeFont loadTTF(PDDocument doc, InputStream is) throws IOException
{
    return new PDTrueTypeFont(doc, is);
}
[jira] [Comment Edited] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136873#comment-14136873 ]
Cornelis Hoeflake edited comment on PDFBOX-2357 at 9/17/14 7:02 AM:
The method:
{code:title=PDTrueTypeFont.java|borderStyle=solid}
/**
 * Loads a TTF to be embedded into a document.
 *
 * @param doc The PDF document that will hold the embedded font.
 * @param is an input stream providing the TTF data.
 * @return a PDTrueTypeFont instance.
 * @throws IOException If there is an error loading the data.
 */
public static PDTrueTypeFont loadTTF(PDDocument doc, InputStream is) throws IOException
{
    return new PDTrueTypeFont(doc, is);
}
{code}
was (Author: c.hoeflake): the same method, posted without {code} markup.
[jira] [Created] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
Cornelis Hoeflake created PDFBOX-2358:
Summary: ExternalFonts uses classloader of class in font-box
Key: PDFBOX-2358
URL: https://issues.apache.org/jira/browse/PDFBOX-2358
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
ExternalFonts loads some default fonts via org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own classloader (ResourceLoader.class.getClassLoader()) to load the given resource. The problem is that the resource is in the PDFBox project while the ResourceLoader is in FontBox. In an OSGi environment this is a problem, because each bundle typically has its own classloader and the FontBox classloader cannot see resources inside the PDFBox bundle.
[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136940#comment-14136940 ]
Cornelis Hoeflake commented on PDFBOX-2358:
One solution is to pass a classloader to the ResourceLoader, but passing classloaders around can cause issues. Another solution is to load the file directly without the ResourceLoader; as far as I can see, the ResourceLoader does nothing extra in this case. The same applies to CMapParser, which also uses the ResourceLoader. The next step would be to remove ResourceLoader, or to make it very clear that it cannot be used outside FontBox. My personal opinion is that the ResourceLoader is a black-box class that does some magic tricks (for example, it returns a FileInputStream if the resource could not be found via the classloader, which is somewhat obscure and could be dangerous).
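A minimal sketch of the "load it directly" option, for illustration only. AnchoredResources is a hypothetical helper and not part of FontBox or PDFBox; the assumption is that the caller passes a class from the bundle that actually contains the resource, so the lookup goes through that bundle's classloader. Unlike the behaviour criticised above, it fails with an exception instead of falling back to the file system.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; not part of FontBox or PDFBox. */
final class AnchoredResources {
    private AnchoredResources() {}

    /**
     * Opens a classpath resource through the classloader of the given anchor
     * class, instead of through the classloader of this helper.
     *
     * @param anchor       a class that lives next to the resource (same jar/bundle)
     * @param resourceName slash-separated name without a leading slash
     */
    static InputStream open(Class<?> anchor, String resourceName) throws IOException {
        ClassLoader loader = anchor.getClassLoader();
        InputStream in = (loader != null)
                ? loader.getResourceAsStream(resourceName)
                // a null loader means the anchor was loaded by the bootstrap loader
                : ClassLoader.getSystemResourceAsStream(resourceName);
        if (in == null) {
            // Fail loudly; no silent fallback to the file system.
            throw new IOException("Resource '" + resourceName
                    + "' not visible to the classloader of " + anchor.getName());
        }
        return in;
    }
}
```

A caller inside PDFBox would pass one of its own classes as the anchor, which keeps the classloader choice explicit at the call site rather than hidden inside a shared utility.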
[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.
[ https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137026#comment-14137026 ]
Andreas Lehmkühler commented on PDFBOX-2301:
[~jojelino] PDFBox 1.8.x has Java 1.5 and 2.0.x has Java 1.6 as its minimum requirement, so your patch isn't valid. However, we have already started a refactoring in 2.0 and we don't need cloning anymore. I guess we won't fix this in the 1.8 branch.
RandomAccessBuffer consumes too much memory.
Key: PDFBOX-2301
URL: https://issues.apache.org/jira/browse/PDFBOX-2301
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 1.8.6, 2.0.0
Reporter: gee
Assignee: Andreas Lehmkühler
Fix For: 2.0.0
Attachments: clone.diff, clone2.diff, clone3.diff
RandomAccessBuffer holds the uncompressed image during operation, because that is exactly what PDFBox ExtractImages does. But holding the uncompressed image in memory instead of the compressed one consumes too much memory, not to mention the many PDF XObjects that can use a filter to compress themselves. It would be good if PDFBox provided an option to revert a COSObject to the state just before the RandomAccess object was created (the state in which the PDF XObject stream has been parsed and the COSDictionary objects haven't been created yet because the user hasn't requested them using the get() method). This is a crucial feature so that PDFBox can analyze huge PDF files (100MB). In the current source, one must close the COSStream unless it is required (and I know a closed stream cannot be reopened).
Class Name                                                             | Shallow Heap | Retained Heap
-----------------------------------------------------------------------------------------------------
org.apache.pdfbox.cos.COSObject @ 0x5ad4940                            |           24 |     8,187,264
|- class class org.apache.pdfbox.cos.COSObject @ 0x58c4020             |            0 |             0
|- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080       |           24 |            24
|- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0              |           32 |     8,187,216
| |- class class org.apache.pdfbox.cos.COSStream @ 0x58c3e00           |            8 |             8
| |- items java.util.LinkedHashMap @ 0x5b2a0f0                         |           56 |           552
| |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128          |           48 |     8,186,528
| | |- class class org.apache.pdfbox.io.RandomAccessBuffer @ 0x5ad2b00
[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.
[ https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137038#comment-14137038 ]
Andreas Lehmkühler commented on PDFBOX-2301:
[~tboehme] I'm still thinking about the options concerning the scratch file issue. It's quite easy to switch back to using one scratch file, and I'm going to do that soon, at least as a workaround. But the root issue is that the parser creates all streams at the beginning and thus allocates a lot of memory and creates a lot of scratch files. IMO we have to refactor the whole parser to change that behaviour, which brings us to the point of implementing on-demand parsing.
[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137047#comment-14137047 ]
Andreas Lehmkühler commented on PDFBOX-2357:
Have a look at org.apache.pdfbox.pdmodel.font.PDTrueTypeFontEmbedder, it should do the trick.
[jira] [Issue Comment Deleted] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-2357:
Comment: was deleted (was: Have a look at org.apache.pdfbox.pdmodel.font.PDTrueTypeFontEmbedder, it should do the trick)
[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137078#comment-14137078 ]
Cornelis Hoeflake commented on PDFBOX-2357:
That class is not a public class.
[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137097#comment-14137097 ]
Andreas Lehmkühler commented on PDFBOX-2357:
I know, that's why I deleted my former comment; obviously I wasn't fast enough ;-)
[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.
[ https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137140#comment-14137140 ]
Timo Boehme commented on PDFBOX-2301:
[~lehmi] The NonSeqParser is by design an on-demand parser. It initializes/parses all objects in the init procedure (see the parseMinimalCatalog variable) only as a workaround, because other parts of PDFBox require data that has already been parsed. So COSObject and its subclasses should only be stubs in the beginning, and any use (any method call) should trigger parsing of the object by the parser (NonSequentialPDFParser.parseObjectDynamically). Thus COSDocument needs to have a reference to the parser. For the scratch file workaround I'm still in favor of a split in-memory/file usage, so that only large PDFs need to write to a file.
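The stub idea from the comment above (an object that is parsed only when it is first used) can be sketched roughly as follows. This is an illustration, not PDFBox code: LazyObject and its Supplier callback are hypothetical names standing in for a COSObject that would call back into the parser on first access.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an indirect object that is parsed on first use. */
final class LazyObject<T> {
    private final Supplier<T> parser; // callback that does the actual parsing
    private T value;
    private boolean loaded;

    LazyObject(Supplier<T> parser) {
        this.parser = parser;
    }

    /** Parses on the first call only; later calls reuse the parsed value. */
    synchronized T get() {
        if (!loaded) {
            value = parser.get();
            loaded = true;
        }
        return value;
    }

    /** True once the underlying object has been parsed. */
    synchronized boolean isLoaded() {
        return loaded;
    }
}
```

With this shape, opening a document would only create cheap stubs; memory for a large stream is allocated when some caller actually asks for it.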
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137490#comment-14137490 ]
Tilman Hausherr commented on PDFBOX-2356:
Are you building from source? If yes, please try this:
{code}
private static void adjustTimeZoneNicely(GregorianCalendar cal, TimeZone tz)
{
    cal.setTimeZone(tz);
    // total offset from UTC (zone + DST) in minutes
    int offset = (cal.get(Calendar.ZONE_OFFSET) + cal.get(Calendar.DST_OFFSET)) / MILLIS_PER_MINUTE;
    cal.add(Calendar.MINUTE, -offset);
}
{code}
If no, please post minimal code or the command line that you used to check your file (I never use preflight) and I'll test it.
Error Validating PDF Archive Document
Key: PDFBOX-2356
URL: https://issues.apache.org/jira/browse/PDFBOX-2356
Project: PDFBox
Issue Type: Bug
Components: Preflight
Affects Versions: 1.8.4, 1.8.5, 1.8.6
Reporter: Cetra Free
Attachments: pdfafile.pdf
When trying to validate a PDF archive file (attached to this ticket) we get the following error:
{code}
7.2 - Error on MetaData, ModificationDate present in the document catalog dictionary doesn't match with XMP information
{code}
This is because the Modification Date in the dictionary is parsed differently from the XMP metadata. The XMP metadata is correct, but the date from the dictionary has an extra 30 minutes added. The following is the raw COSObject from the PDF file:
{code}
COSString{D:20140917122850+09'30'}
{code}
The Long value should be *141092273*. The *org.apache.pdfbox.util.DateConverter* *parseDate* method returns the Date with Long *141092453*, which is 30 minutes ahead. The XMP Modification Date is parsed differently and returns the correct date. This means that validation will fail for PDF Archives. My suggestion would be to refactor the parseDate function to use the standard Java library.
Here's an example class which will be compatible with the PDF Specification:
{code}
// Needs java.text.ParseException, java.text.SimpleDateFormat and
// java.util.{Arrays, Calendar, Date, HashMap, Map}.
static class DateParser
{
    // Maps the length of each pattern prefix (4, 6, 8, 10, 12, 14, 15) to its format
    private Map<Integer, SimpleDateFormat> formats = new HashMap<Integer, SimpleDateFormat>();
    public DateParser()
    {
        String expr = "";
        for (String part : Arrays.asList("yyyy", "MM", "dd", "HH", "mm", "ss", "Z"))
        {
            expr = expr + part;
            formats.put(expr.length(), new SimpleDateFormat(expr));
        }
    }
    public Calendar parseDate(String expr)
    {
        try
        {
            // Strip the "D:" prefix and the apostrophes; treat a bare "Z" (UTC) as +0000
            expr = expr.replace("D:", "").replace("'", "").replace("Z", "+0000");
            Date date = formats.get(Math.min(expr.length(), 15)).parse(expr);
            Calendar calendar = Calendar.getInstance();
            calendar.setTime(date);
            return calendar;
        }
        catch (ParseException e)
        {
            return null;
        }
    }
}
{code}
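The half-hour offset in the reported value can be cross-checked independently. The following is a minimal sketch, not the PDFBox implementation: it assumes Java 8's java.time is available (PDFBox at that time targeted older Java versions), the class and method names are hypothetical, and it only handles the full form D:YYYYMMDDHHmmSS+HH'mm'.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for illustration only; not part of PDFBox. */
final class PdfDateSketch {
    // "xx" matches a zone offset written without a colon, e.g. +0930
    private static final DateTimeFormatter FULL_FORM =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssxx");

    private PdfDateSketch() {}

    /** Converts a full-form PDF date string to seconds since the Unix epoch. */
    static long toEpochSeconds(String pdfDate) {
        String normalized = pdfDate.startsWith("D:") ? pdfDate.substring(2) : pdfDate;
        normalized = normalized.replace("'", ""); // +09'30' becomes +0930
        return OffsetDateTime.parse(normalized, FULL_FORM).toEpochSecond();
    }
}
```

For the string from the report, `PdfDateSketch.toEpochSeconds("D:20140917122850+09'30'")` evaluates to 1410922730, which is 02:58:50 UTC on 2014-09-17. A result 1800 seconds larger would correspond to the extra 30 minutes described above.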
Re: [DISCUSS] move documentation and examples to git
It is an "I don't like it, but I can live with it", but I think it might be a pain. A soft -1.
Tilman
On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:
Hi,
On 16 September 2014 at 18:03, Tilman Hausherr thaush...@t-online.de wrote:
-1, I don't like the idea to have different repository types.
Hmmm, is this just an "I don't like it, but I can live with it", or is it a clear veto?
In case of a veto, how about starting by moving parts of the docs to a new git repo?
IMO sooner or later the project will move from svn to git, and that would be a good opportunity to get used to the general usage of git and, of course, to the special processes used here at the ASF, so that we are not thrown in at the deep end after the migration.
Tilman
BR
Andreas
On 16.09.2014 at 10:21, Maruan Sahyoun wrote:
Hi there,
in order to make it easier for people to contribute to the documentation and examples, I thought about the potential benefits of moving these to a git-based repository instead of svn. The main idea behind that is to allow people to contribute via GitHub, opening another channel of communication and making it easier to contribute. Proposed names are pdfbox-docs and pdfbox-examples. Take a look at https://github.com/apache/cordova-docs for an example of that. I haven't thought through all the potential implications and necessary changes yet, but wanted to get some first feedback on support for that idea before putting more effort into it. WDYT?
Maruan
Re: [DISCUSS] move documentation and examples to git
Is that because of the examples, the docs or both?
BR
Maruan
On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote:
It is an "I don't like it, but I can live with it", but I think it might be a pain. A soft -1.
Tilman
On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:
Hi,
On 16 September 2014 at 18:03, Tilman Hausherr thaush...@t-online.de wrote:
-1, I don't like the idea to have different repository types.
Hmmm, is this just an "I don't like it, but I can live with it", or is it a clear veto?
In case of a veto, how about starting by moving parts of the docs to a new git repo?
IMO sooner or later the project will move from svn to git, and that would be a good opportunity to get used to the general usage of git and, of course, to the special processes used here at the ASF, so that we are not thrown in at the deep end after the migration.
Tilman
BR
Andreas
On 16.09.2014 at 10:21, Maruan Sahyoun wrote:
Hi there,
in order to make it easier for people to contribute to the documentation and examples, I thought about the potential benefits of moving these to a git-based repository instead of svn. The main idea behind that is to allow people to contribute via GitHub, opening another channel of communication and making it easier to contribute. Proposed names are pdfbox-docs and pdfbox-examples. Take a look at https://github.com/apache/cordova-docs for an example of that. I haven't thought through all the potential implications and necessary changes yet, but wanted to get some first feedback on support for that idea before putting more effort into it. WDYT?
Maruan
[jira] [Commented] (PDFBOX-2299) Isartor tests don't work anymore
[ https://issues.apache.org/jira/browse/PDFBOX-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137574#comment-14137574 ]
John Hewson commented on PDFBOX-2299:
That's ok, I know where to find it. P.S. You can attach files with More > Attach Files.
Isartor tests don't work anymore
Key: PDFBOX-2299
URL: https://issues.apache.org/jira/browse/PDFBOX-2299
Project: PDFBox
Issue Type: Bug
Components: Preflight
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: John Hewson
Priority: Critical
Labels: Isartor, regression
Sorry, I hadn't thought about this when testing the no-awt version, but the Isartor tests don't work anymore (I have them enabled for my own version since PDFBOX-2179).
{code}
---
Test set: org.apache.pdfbox.preflight.TestIsartor
---
Tests run: 204, Failures: 35, Errors: 0, Skipped: 0, Time elapsed: 29.485 sec FAILURE! - in org.apache.pdfbox.preflight.TestIsartor
validate[target\pdfs\Isartor testsuite\PDFA-1b\6.2 Graphics\6.2.3 Colour spaces\6.2.3.3 Uncalibrated colour spaces\isartor-6-2-3-3-t02-fail-h.pdf](org.apache.pdfbox.preflight.TestIsartor) Time elapsed: 0.092 sec FAILURE!
java.lang.AssertionError: isartor-6-2-3-3-t02-fail-h.pdf : IllegalArgumentException raised , message=Built-in Encoding required for symbolic font
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
validate[target\pdfs\Isartor testsuite\PDFA-1b\6.2 Graphics\6.2.3 Colour spaces\6.2.3.3 Uncalibrated colour spaces\isartor-6-2-3-3-t02-fail-i.pdf](org.apache.pdfbox.preflight.TestIsartor) Time elapsed: 0.006 sec FAILURE!
java.lang.AssertionError: isartor-6-2-3-3-t02-fail-i.pdf : IllegalArgumentException raised , message=Built-in Encoding required for symbolic font
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.2 Font types\isartor-6-3-2-t01-fail-b.pdf](org.apache.pdfbox.preflight.TestIsartor) Time elapsed: 1.837 sec FAILURE!
java.lang.AssertionError: isartor-6-3-2-t01-fail-b.pdf : NullPointerException raised , message=null
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.1 General\isartor-6-3-3-1-t01-fail-a.pdf](org.apache.pdfbox.preflight.TestIsartor) Time elapsed: 0.191 sec FAILURE!
java.lang.AssertionError: isartor-6-3-3-1-t01-fail-a.pdf : NullPointerException raised , message=null
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.1 General\isartor-6-3-3-1-t01-fail-b.pdf](org.apache.pdfbox.preflight.TestIsartor) Time elapsed: 0.051 sec FAILURE!
java.lang.AssertionError: isartor-6-3-3-1-t01-fail-b.pdf : NullPointerException raised , message=null
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.2 CIDFonts\isartor-6-3-3-2-t01-fail-a.pdf](org.apache.pdfbox.preflight.TestIsartor) Time elapsed: 0.051 sec FAILURE!
java.lang.AssertionError: isartor-6-3-3-2-t01-fail-a.pdf : NullPointerException raised , message=null
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.3 CMaps\isartor-6-3-3-3-t01-fail-a.pdf](org.apache.pdfbox.preflight.TestIsartor) Time elapsed: 0.199 sec FAILURE!
java.lang.AssertionError: isartor-6-3-3-3-t01-fail-a.pdf : NullPointerException raised , message=null
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.3 CMaps\isartor-6-3-3-3-t02-fail-a.pdf](org.apache.pdfbox.preflight.TestIsartor) Time elapsed: 0.042 sec FAILURE!
java.lang.AssertionError: isartor-6-3-3-3-t02-fail-a.pdf : NullPointerException raised , message=null
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
[jira] [Commented] (PDFBOX-2350) Type1 Parser hangs indefinitely
[ https://issues.apache.org/jira/browse/PDFBOX-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137583#comment-14137583 ]
John Hewson commented on PDFBOX-2350:
Can you post a stack trace for the point where the application hangs? (You might have to write one down by hand using your debugger.) I'd be interested to see where exactly Type1Parser is being called from in this specific case. I suspect the problem is that bad data is being passed to Type1Parser.
Type1 Parser hangs indefinitely
Key: PDFBOX-2350
URL: https://issues.apache.org/jira/browse/PDFBOX-2350
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.0
Environment: Windows 7, JDK 1.7.0_51-b13
Reporter: Daniel Scheibe
When rendering the first page of my PDF document, the Type1Parser (org.apache.fontbox.type1.Type1Parser) hangs in a loop in {{parseBinary(byte[] bytes) throws IOException}} and kills our rendering pipeline. Please find the loop that hangs below:
// find /Private dict
while (!lexer.peekToken().getText().equals("Private"))
{
    lexer.nextToken();
}
There is never a token named Private in the list of returned tokens (they are empty all the time). Furthermore, going deeper into the source code, it seems that the class reading the tokens (Type1Lexer) never advances the buffer position and always returns an empty name token from the readToken(Token prevToken) method. Looking at the decrypted buffer, I cannot get anything useful out of it based on my current understanding. Unfortunately I cannot provide the PDF in question as it contains confidential data. Acrobat Reader XI Version 11.0.08 renders the document just fine. In addition, it seems the PDF was encrypted (40-bit RC4) with an empty password, and it says it's PDF version 1.5. Does this provide enough information, or can I do anything else to help nail this one down?
I guess this might be a pdf document structure/feature that is not yet supported completely but at least pdfbox should throw an exception instead of failing silently... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
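The hang reported above comes from a scan loop that has no end-of-input check. As a rough illustration only, here is a minimal sketch of a guarded scan that throws instead of spinning. It assumes a plain token iterator; `PrivateDictScan` and `skipToPrivate` are hypothetical names and this is not FontBox's actual Type1Lexer API.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token scan; not FontBox's real Type1Lexer API.
public class PrivateDictScan {

    // Advances through the tokens until "Private" is found and returns how many
    // tokens were skipped before it. Throws when the input runs out or yields
    // an empty token, which is the situation described in the report.
    public static int skipToPrivate(Iterator<String> tokens) throws IOException {
        int skipped = 0;
        while (tokens.hasNext()) {
            String text = tokens.next();
            if (text == null || text.isEmpty()) {
                throw new IOException("Empty token after " + skipped
                        + " tokens; font data looks corrupt");
            }
            if (text.equals("Private")) {
                return skipped;
            }
            skipped++;
        }
        throw new IOException("Reached end of input without finding /Private");
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("dup", "Private", "dict");
        List<String> bad = Arrays.asList("dup", "");
        try {
            System.out.println("skipped " + skipToPrivate(good.iterator()));
            skipToPrivate(bad.iterator());
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this the caller gets an exception it can log or handle, rather than a rendering thread that never returns.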
Re: [DISCUSS] move documentation and examples to git
On 17.09.2014 at 19:01, Maruan Sahyoun wrote: Is that because of the examples, the docs or both? The examples could be tricky as they depend on the source code in the svn repo. BR Maruan BR Andreas On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote: It is an "I don't like it, but I can live with it", but I think it might be a pain. A soft -1. Tilman On 17.09.2014 at 08:40, Andreas Lehmkühler wrote: Hi, On 16 September 2014 at 18:03, Tilman Hausherr thaush...@t-online.de wrote: -1, I don't like the idea of having different repository types. Hmmm, is this just an "I don't like it, but I can live with it" or is it a clear veto? In case of a veto, how about starting with moving parts of the docs to a new git repo? IMO sooner or later the project will move from svn to git, and that would be a good opportunity to get used to the general usage of git and of course to the special processes used here at the ASF, so that we are not thrown in at the deep end after the migration. Tilman BR Andreas On 16.09.2014 at 10:21, Maruan Sahyoun wrote: Hi there, in order to make it easier for people to contribute to the documentation and examples, I thought about the potential benefits of moving these to a git-based repository instead of svn. The main idea behind that is to allow people to contribute via GitHub, opening another channel of communication and making it easier to contribute. Proposed names are pdfbox-docs and pdfbox-examples. Take a look at https://github.com/apache/cordova-docs for an example of that. I haven’t thought about all potential implications and changes necessary yet but wanted to get first feedback about support for that idea before putting more effort into that. WDYT? Maruan
Re: [DISCUSS] move documentation and examples to git
Hi Maruan, The examples only. With the docs I assume you mean the website. I've never touched it (although I might in the future), it isn't part of the project, so I don't mind. Tilman On 17.09.2014 at 19:01, Maruan Sahyoun wrote: Is that because of the examples, the docs or both? BR Maruan On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote: It is an "I don't like it, but I can live with it", but I think it might be a pain. A soft -1. Tilman On 17.09.2014 at 08:40, Andreas Lehmkühler wrote: Hi, On 16 September 2014 at 18:03, Tilman Hausherr thaush...@t-online.de wrote: -1, I don't like the idea of having different repository types. Hmmm, is this just an "I don't like it, but I can live with it" or is it a clear veto? In case of a veto, how about starting with moving parts of the docs to a new git repo? IMO sooner or later the project will move from svn to git, and that would be a good opportunity to get used to the general usage of git and of course to the special processes used here at the ASF, so that we are not thrown in at the deep end after the migration. Tilman BR Andreas On 16.09.2014 at 10:21, Maruan Sahyoun wrote: Hi there, in order to make it easier for people to contribute to the documentation and examples, I thought about the potential benefits of moving these to a git-based repository instead of svn. The main idea behind that is to allow people to contribute via GitHub, opening another channel of communication and making it easier to contribute. Proposed names are pdfbox-docs and pdfbox-examples. Take a look at https://github.com/apache/cordova-docs for an example of that. I haven’t thought about all potential implications and changes necessary yet but wanted to get first feedback about support for that idea before putting more effort into that. WDYT? Maruan
[jira] [Commented] (PDFBOX-2355) newDocuments is private in Splitter
[ https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137641#comment-14137641 ] ASF subversion and git services commented on PDFBOX-2355: - Commit 1625717 from [~jahewson] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1625717 ] PDFBOX-2355: Refactor Splitter protected members newDocuments is private in Splitter --- Key: PDFBOX-2355 URL: https://issues.apache.org/jira/browse/PDFBOX-2355 Project: PDFBox Issue Type: Bug Components: Utilities Affects Versions: 1.8.6 Environment: Ubuntu 14.04, Java 8_20 Reporter: G. Ralph Kuntz Priority: Critical Labels: pdfbox The method `createNewDocument` in `Splitter` is protected, so it can be overridden, but one of the things it needs to do with the new document is add it to the `newDocuments` list, which is private. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2355) newDocuments is private in Splitter
[ https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137649#comment-14137649 ] John Hewson commented on PDFBOX-2355: - The Splitter class has not had any attention for a while. Clearly you're the first person to try to use the subclassing functionality it provides, and it isn't actually possible. I've refactored the internals of Splitter and reduced its protected API to just 5 methods. The createNewDocument() method can now simply call getSourceDocument() to access the source document, and can now return a PDDocument instance which will be added to newDocuments (now destinationDocuments) internally. As nobody else has tried using the API in this manner, please let me know how well it works for you - we can incorporate any useful changes needed. newDocuments is private in Splitter --- Key: PDFBOX-2355 URL: https://issues.apache.org/jira/browse/PDFBOX-2355 Project: PDFBox Issue Type: Bug Components: Utilities Affects Versions: 1.8.6 Environment: Ubuntu 14.04, Java 8_20 Reporter: G. Ralph Kuntz Priority: Critical Labels: pdfbox Fix For: 2.0.0 The method `createNewDocument` in `Splitter` is protected, so it can be overridden, but one of the things it needs to do with the new document is add it to the `newDocuments` list, which is private. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
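The refactoring described above follows a common pattern: the overridable method only creates and returns the new document, while the base class keeps the private list and does the bookkeeping itself. A minimal sketch of that pattern follows, using hypothetical miniature classes (`SplitterSketch`, `Doc`, `BaseSplitter`, `TitledSplitter`). Only the method names `createNewDocument` and `getSourceDocument` come from the comment; nothing here is PDFBox's actual Splitter code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical miniature of the pattern; these are not PDFBox's Splitter classes.
public class SplitterSketch {

    // Stand-in for a document: just a title and a list of page labels.
    public static class Doc {
        public final String title;
        public final List<String> pages = new ArrayList<>();

        public Doc(String title) {
            this.title = title;
        }
    }

    public static class BaseSplitter {
        private Doc source;
        // Stays private: subclasses never need to touch the list directly.
        private final List<Doc> destinationDocuments = new ArrayList<>();

        protected final Doc getSourceDocument() {
            return source;
        }

        // Extension point: subclasses return a new document, nothing more.
        protected Doc createNewDocument() {
            return new Doc(getSourceDocument().title);
        }

        // Splits one page per output document and records each result.
        public List<Doc> split(Doc document) {
            source = document;
            destinationDocuments.clear();
            for (String page : document.pages) {
                Doc target = createNewDocument();
                target.pages.add(page);
                destinationDocuments.add(target); // bookkeeping done by the base class
            }
            return Collections.unmodifiableList(new ArrayList<>(destinationDocuments));
        }
    }

    // A subclass customises creation without any access to the private list.
    public static class TitledSplitter extends BaseSplitter {
        @Override
        protected Doc createNewDocument() {
            return new Doc(getSourceDocument().title + " (part)");
        }
    }

    public static void main(String[] args) {
        Doc in = new Doc("report");
        in.pages.add("p1");
        in.pages.add("p2");
        for (Doc out : new TitledSplitter().split(in)) {
            System.out.println(out.title + " " + out.pages);
        }
    }
}
```

Because the subclass only returns a value, the list can stay private and can be renamed without breaking anyone.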
[jira] [Assigned] (PDFBOX-2355) newDocuments is private in Splitter
[ https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson reassigned PDFBOX-2355: --- Assignee: John Hewson newDocuments is private in Splitter --- Key: PDFBOX-2355 URL: https://issues.apache.org/jira/browse/PDFBOX-2355 Project: PDFBox Issue Type: Bug Components: Utilities Affects Versions: 1.8.6 Environment: Ubuntu 14.04, Java 8_20 Reporter: G. Ralph Kuntz Assignee: John Hewson Priority: Critical Labels: pdfbox Fix For: 2.0.0 The method `createNewDocument` in `Splitter` is protected, so it can be overridden, but one of the things it needs to do with the new document is add it to the `newDocuments` list, which is private. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PDFBOX-2355) newDocuments is private in Splitter
[ https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson resolved PDFBOX-2355. - Resolution: Fixed Fix Version/s: 2.0.0 newDocuments is private in Splitter --- Key: PDFBOX-2355 URL: https://issues.apache.org/jira/browse/PDFBOX-2355 Project: PDFBox Issue Type: Bug Components: Utilities Affects Versions: 1.8.6 Environment: Ubuntu 14.04, Java 8_20 Reporter: G. Ralph Kuntz Assignee: John Hewson Priority: Critical Labels: pdfbox Fix For: 2.0.0 The method `createNewDocument` in `Splitter` is protected, so it can be overridden, but one of the things it needs to do with the new document is add it to the `newDocuments` list, which is private. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PDFBOX-2355) newDocuments is private in Splitter
[ https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-2355: Priority: Major (was: Critical) newDocuments is private in Splitter --- Key: PDFBOX-2355 URL: https://issues.apache.org/jira/browse/PDFBOX-2355 Project: PDFBox Issue Type: Bug Components: Utilities Affects Versions: 1.8.6 Environment: Ubuntu 14.04, Java 8_20 Reporter: G. Ralph Kuntz Assignee: John Hewson Labels: pdfbox Fix For: 2.0.0 The method `createNewDocument` in `Splitter` is protected, so it can be overridden, but one of the things it needs to do with the new document is add it to the `newDocuments` list, which is private. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2355) newDocuments is private in Splitter
[ https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137658#comment-14137658 ] G. Ralph Kuntz commented on PDFBOX-2355: "Clearly you're the first person to try to use the subclassing functionality it provides" I kind of figured :-) newDocuments is private in Splitter --- Key: PDFBOX-2355 URL: https://issues.apache.org/jira/browse/PDFBOX-2355 Project: PDFBox Issue Type: Bug Components: Utilities Affects Versions: 1.8.6 Environment: Ubuntu 14.04, Java 8_20 Reporter: G. Ralph Kuntz Assignee: John Hewson Labels: pdfbox Fix For: 2.0.0 The method `createNewDocument` in `Splitter` is protected, so it can be overridden, but one of the things it needs to do with the new document is add it to the `newDocuments` list, which is private. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PDFBOX-2355) newDocuments is private in Splitter
[ https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137658#comment-14137658 ] G. Ralph Kuntz edited comment on PDFBOX-2355 at 9/17/14 6:04 PM: - "Clearly you're the first person to try to use the subclassing functionality it provides" I kind of figured :-) I used reflection to access the private field. I'll rewrite it properly once the new code is released. was (Author: grkuntzmd): "Clearly you're the first person to try to use the subclassing functionality it provides" I kind of figured :-) newDocuments is private in Splitter --- Key: PDFBOX-2355 URL: https://issues.apache.org/jira/browse/PDFBOX-2355 Project: PDFBox Issue Type: Bug Components: Utilities Affects Versions: 1.8.6 Environment: Ubuntu 14.04, Java 8_20 Reporter: G. Ralph Kuntz Assignee: John Hewson Labels: pdfbox Fix For: 2.0.0 The method `createNewDocument` in `Splitter` is protected, so it can be overridden, but one of the things it needs to do with the new document is add it to the `newDocuments` list, which is private. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
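For readers wondering what the reflection workaround mentioned above might look like, here is a minimal sketch. `LegacySplitter` and `privateList` are hypothetical, and the real `newDocuments` field in PDFBox 1.8 holds documents rather than strings, so treat this as an illustration of the technique rather than the reporter's code.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes; the field name mirrors the issue title but this is not PDFBox code.
public class ReflectionWorkaround {

    public static class LegacySplitter {
        private final List<String> newDocuments = new ArrayList<>();

        public int count() {
            return newDocuments.size();
        }
    }

    // Reads a private List<String> field by name. Fragile: the compiler cannot
    // check the name, so a rename of the field only fails at run time.
    @SuppressWarnings("unchecked")
    public static List<String> privateList(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return (List<String>) field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Field not accessible: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        LegacySplitter splitter = new LegacySplitter();
        privateList(splitter, "newDocuments").add("doc-1");
        System.out.println(splitter.count());
    }
}
```

This works as a stopgap, but it ties the caller to an internal name; the field was in fact renamed to destinationDocuments in the refactoring, which is exactly the kind of change that breaks such code.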
Jenkins build became unstable: PDFBox-trunk #1279
See https://builds.apache.org/job/PDFBox-trunk/1279/changes
Jenkins build became unstable: PDFBox-trunk » Apache PDFBox #1279
See https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox/1279/changes
[jira] [Assigned] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson reassigned PDFBOX-2358: --- Assignee: John Hewson ExternalFonts uses classloader of class in font-box --- Key: PDFBOX-2358 URL: https://issues.apache.org/jira/browse/PDFBOX-2358 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Cornelis Hoeflake Assignee: John Hewson ExternalFonts loads some default fonts via the org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own classloader (ResourceLoader.class.getClassLoader()) for loading the given resource. The problem is that the resource is in the PDFBox project and the ResourceLoader is in FontBox. In an OSGI environment this is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PDFBOX-2340) Overhaul PDFBox Documentation
[ https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137665#comment-14137665 ] Tilman Hausherr edited comment on PDFBOX-2340 at 9/17/14 6:08 PM: --

While the mockups look very nice, I'm more a content guy who doesn't care about looks (this only applies to software :-) ). There are two things that are missing in the documentation: one is sample code for rendering, the other is an improved text for people opening issues. Here's a text that could be merged with the existing text at https://pdfbox.apache.org/support.html

We want to help you. We don't respond by clicking on boilerplate texts. Solving your issues is what makes PDFBox better and better!

Do's:
- Attach the PDF that causes trouble by using More, Attach files.
- If your file is too large, upload it to a sharehoster, or use the PDFSplit application to isolate the troublesome page.
- Mention the PDFBox version you are using.
- Attach the shortest possible code that reproduces the problem. Insert java code between {code}...{code}. Or try to reproduce the problem with the command line applications.
- Mention what you were doing, what the expected behaviour was, and what happened instead.
- Provide a stack trace of an exception if there is one.
- Try using the non-sequential parser (loadNonSeq() instead of load(), and -nonSeq with the command line applications).
- Search JIRA to see whether your problem has been mentioned before.
- Be patient: all the people here are unpaid volunteers who work for you in their free time.

Don'ts:
- Upload files to a hoster that requires registration to read the file.
- Create an issue in JIRA and then go on vacation so you won't respond to our questions / suggestions.
- Ask "how to" questions. Ask such questions on the mailing lists, on stackoverflow.com, and look at the sample and the test code in the sources.
- Attach PDF files with confidential and/or personal data (name, DoB, bank data, health data, SSN) without getting permission from the client and/or the people mentioned in the PDF.
- Create issues about obsolete PDFBox versions.

We can sometimes solve problems without having the PDF, but it is difficult.

was (Author: tilman):

While it looks very nice, I'm more a content guy who doesn't care about looks (this only applies to software :-) ). There are two things that are missing in the documentation: one is sample code for rendering, the other is an improved text for people opening issues. Here's a text that could be merged with the existing text at https://pdfbox.apache.org/support.html

We want to help you. We don't respond by clicking on boilerplate texts. Solving your issues is what makes PDFBox better and better!

Do's:
- Attach the PDF that causes trouble by using More, Attach files.
- If your file is too large, upload it to a sharehoster, or use the PDFSplit application to isolate the troublesome page.
- Mention the PDFBox version you are using.
- Attach the shortest possible code that reproduces the problem. Insert java code between {code}...{code}. Or try to reproduce the problem with the command line applications.
- Mention what you were doing, what the expected behaviour was, and what happened instead.
- Provide a stack trace of an exception if there is one.
- Try using the non-sequential parser (loadNonSeq() instead of load(), and -nonSeq with the command line applications).
- Search JIRA to see whether your problem has been mentioned before.
- Be patient: all the people here are unpaid volunteers who work for you in their free time.

Don'ts:
- Upload files to a hoster that requires registration to read the file.
- Create an issue in JIRA and then go on vacation so you won't respond to our questions / suggestions.
- Ask "how to" questions. Ask such questions on the mailing lists, on stackoverflow.com, and look at the sample and the test code in the sources.
- Attach PDF files with confidential and/or personal data (name, DoB, bank data, health data, SSN) without getting permission from the client and/or the people mentioned in the PDF.
- Create issues about obsolete PDFBox versions.

We can sometimes solve problems without having the PDF, but it is difficult.

Overhaul PDFBox Documentation - Key: PDFBOX-2340 URL: https://issues.apache.org/jira/browse/PDFBOX-2340 Project: PDFBox Issue Type: Improvement Components: Documentation Reporter: Maruan Sahyoun Attachments: Mockup-20140912.png, Mockup_Documentation.png In order to make it easier for users of PDFBox to work with the library there shall be an enhanced documentation consisting of an introduction, API references and more well documented examples and code snippets (Cookbook). In order to make it easier to contribute the Cookbook shall be built automatically from
[jira] [Commented] (PDFBOX-2340) Overhaul PDFBox Documentation
[ https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137665#comment-14137665 ] Tilman Hausherr commented on PDFBOX-2340: -

While it looks very nice, I'm more a content guy who doesn't care about looks (this only applies to software :-) ). There are two things that are missing in the documentation: one is sample code for rendering, the other is an improved text for people opening issues. Here's a text that could be merged with the existing text at https://pdfbox.apache.org/support.html

We want to help you. We don't respond by clicking on boilerplate texts. Solving your issues is what makes PDFBox better and better!

Do's:
- Attach the PDF that causes trouble by using More, Attach files.
- If your file is too large, upload it to a sharehoster, or use the PDFSplit application to isolate the troublesome page.
- Mention the PDFBox version you are using.
- Attach the shortest possible code that reproduces the problem. Insert java code between {code}...{code}. Or try to reproduce the problem with the command line applications.
- Mention what you were doing, what the expected behaviour was, and what happened instead.
- Provide a stack trace of an exception if there is one.
- Try using the non-sequential parser (loadNonSeq() instead of load(), and -nonSeq with the command line applications).
- Search JIRA to see whether your problem has been mentioned before.
- Be patient: all the people here are unpaid volunteers who work for you in their free time.

Don'ts:
- Upload files to a hoster that requires registration to read the file.
- Create an issue in JIRA and then go on vacation so you won't respond to our questions / suggestions.
- Ask "how to" questions. Ask such questions on the mailing lists, on stackoverflow.com, and look at the sample and the test code in the sources.
- Attach PDF files with confidential and/or personal data (name, DoB, bank data, health data, SSN) without getting permission from the client and/or the people mentioned in the PDF.
- Create issues about obsolete PDFBox versions.

We can sometimes solve problems without having the PDF, but it is difficult.

Overhaul PDFBox Documentation - Key: PDFBOX-2340 URL: https://issues.apache.org/jira/browse/PDFBOX-2340 Project: PDFBox Issue Type: Improvement Components: Documentation Reporter: Maruan Sahyoun Attachments: Mockup-20140912.png, Mockup_Documentation.png In order to make it easier for users of PDFBox to work with the library there shall be an enhanced documentation consisting of an introduction, API references and more well documented examples and code snippets (Cookbook). In order to make it easier to contribute the Cookbook shall be built automatically from the examples/snippet 'repository'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137721#comment-14137721 ] John Hewson edited comment on PDFBOX-2358 at 9/17/14 6:40 PM: -- The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete accident. PDFBox has its own ResourceLoader class which should have been used. The way that CMapParser uses the FontBox ResourceLoader is more troubling. As you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. The solution would seem to be to move the cmap resource files to FontBox, because they are not PDF-specific. was (Author: jahewson): The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete accident. PDFBox has its own ResourceLoader class which should have been used. The way that CMapParser uses the FontBox ResourceLoader is more troubling, passing resource paths from PDFBox to FontBox is not OSGI friendly. The solution would seem to be to move the cmap resource files to FontBox, because they are not PDF-specific. ExternalFonts uses classloader of class in font-box --- Key: PDFBOX-2358 URL: https://issues.apache.org/jira/browse/PDFBOX-2358 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Cornelis Hoeflake Assignee: John Hewson ExternalFonts loads some default fonts via the org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own classloader (ResourceLoader.class.getClassLoader()) for loading the given resource. The problem is that the resource is in the PDFBox project and the ResourceLoader is in FontBox. In an OSGI environment this is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137721#comment-14137721 ] John Hewson commented on PDFBOX-2358: - The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete accident. PDFBox has its own ResourceLoader class which should have been used. The way that CMapParser uses the FontBox ResourceLoader is more troubling, passing resource paths from PDFBox to FontBox is not OSGI friendly. The solution would seem to be to move the cmap resource files to FontBox, because they are not PDF-specific. ExternalFonts uses classloader of class in font-box --- Key: PDFBOX-2358 URL: https://issues.apache.org/jira/browse/PDFBOX-2358 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Cornelis Hoeflake Assignee: John Hewson ExternalFonts loads some default fonts via the org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own classloader (ResourceLoader.class.getClassLoader()) for loading the given resource. The problem is that the resource is in the PDFBox project and the ResourceLoader is in FontBox. In an OSGI environment this is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137721#comment-14137721 ] John Hewson edited comment on PDFBOX-2358 at 9/17/14 6:40 PM: -- The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete accident. PDFBox has its own ResourceLoader class which should have been used. The way that CMapParser uses the FontBox ResourceLoader is more troubling. As you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. The solution would seem to be to move the cmap resource files from PDFBox to FontBox, because they are not PDF-specific. was (Author: jahewson): The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete accident. PDFBox has its own ResourceLoader class which should have been used. The way that CMapParser uses the FontBox ResourceLoader is more troubling. As you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. The solution would seem to be to move the cmap resource files to FontBox, because they are not PDF-specific. ExternalFonts uses classloader of class in font-box --- Key: PDFBOX-2358 URL: https://issues.apache.org/jira/browse/PDFBOX-2358 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Cornelis Hoeflake Assignee: John Hewson ExternalFonts loads some default fonts via the org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own classloader (ResourceLoader.class.getClassLoader()) for loading the given resource. The problem is that the resource is in the PDFBox project and the ResourceLoader is in FontBox. In an OSGI environment this is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137721#comment-14137721 ] John Hewson edited comment on PDFBOX-2358 at 9/17/14 6:41 PM: -- The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete accident. PDFBox has its own ResourceLoader class which should have been used. The way that CMapParser uses the FontBox ResourceLoader is more troubling. As you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. The solution would seem to be to move the cmap resource files from PDFBox to FontBox, because they are not PDF-specific; they are more properly part of Adobe's CIDFont system. was (Author: jahewson): The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete accident. PDFBox has its own ResourceLoader class which should have been used. The way that CMapParser uses the FontBox ResourceLoader is more troubling. As you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. The solution would seem to be to move the cmap resource files from PDFBox to FontBox, because they are not PDF-specific. ExternalFonts uses classloader of class in font-box --- Key: PDFBOX-2358 URL: https://issues.apache.org/jira/browse/PDFBOX-2358 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Cornelis Hoeflake Assignee: John Hewson ExternalFonts loads some default fonts via the org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own classloader (ResourceLoader.class.getClassLoader()) for loading the given resource. The problem is that the resource is in the PDFBox project and the ResourceLoader is in FontBox. In an OSGI environment this is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
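The classloader problem discussed in this thread can be shown in a few lines: a helper that resolves resources through its own class's loader only sees what its own bundle can see, whereas one that takes an anchor class from the caller looks in the caller's bundle. The sketch below uses a hypothetical `ResourceLookup` helper; it is not the FontBox or PDFBox ResourceLoader, and the fix discussed in the thread (moving the cmap files) is a different route to the same goal.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not the FontBox or PDFBox ResourceLoader.
public class ResourceLookup {

    // Resolves the resource through the class loader of the caller-supplied
    // anchor class instead of the helper's own class loader, so the lookup
    // happens in the module or bundle that actually ships the resource.
    public static InputStream open(Class<?> anchor, String name) {
        ClassLoader loader = anchor.getClassLoader();
        if (loader == null) {
            // Classes loaded by the bootstrap loader report null here.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResourceAsStream(name); // null when the resource is absent
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources tolerates a null resource, so no extra guard is needed.
        try (InputStream in = open(ResourceLookup.class, "no/such/resource.txt")) {
            System.out.println(in == null ? "not found" : "found");
        }
    }
}
```

In an OSGi container each bundle has its own class loader, so which class is used as the anchor decides which bundle's resources are visible.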
[jira] [Commented] (PDFBOX-2306) Error reading stream, expected='endstream' actual='endobj'
[ https://issues.apache.org/jira/browse/PDFBOX-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137724#comment-14137724 ] ASF subversion and git services commented on PDFBOX-2306: - Commit 1625736 from [~tilman] in branch 'pdfbox/branches/1.8' [ https://svn.apache.org/r1625736 ] PDFBOX-2306: be lenient, allow stream to end with endobj Error reading stream, expected='endstream' actual='endobj' -- Key: PDFBOX-2306 URL: https://issues.apache.org/jira/browse/PDFBOX-2306 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 2.0.0 Reporter: Tilman Hausherr Assignee: Tilman Hausherr Fix For: 2.0.0 I get this exception with the file of PDFBOX-269: {code} java.io.IOException: Error reading stream, expected='endstream' actual='endobj' at offset 183468 at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1578) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1249) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1176) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:1152) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:487) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:755) at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1155) at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1138) {code} The cause is that a stream ends with endobj instead of endstream. This is accepted in the non sequential parser in readUntilEndStream() but later it isn't. It is a problem that was fixed in the old parser many years ago. My fix is for the sequential parser. I also changed a misleading error message nearby. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PDFBOX-2306) Error reading stream, expected='endstream' actual='endobj'
[ https://issues.apache.org/jira/browse/PDFBOX-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2306: Affects Version/s: 1.8.8 1.8.7 1.8.6 Error reading stream, expected='endstream' actual='endobj' -- Key: PDFBOX-2306 URL: https://issues.apache.org/jira/browse/PDFBOX-2306 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: Tilman Hausherr Assignee: Tilman Hausherr Fix For: 1.8.8, 2.0.0 I get this exception with the file of PDFBOX-269: {code} java.io.IOException: Error reading stream, expected='endstream' actual='endobj' at offset 183468 at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1578) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1249) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1176) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:1152) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:487) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:755) at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1155) at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1138) {code} The cause is that a stream ends with endobj instead of endstream. This is accepted in the non sequential parser in readUntilEndStream() but later it isn't. It is a problem that was fixed in the old parser many years ago. My fix is for the sequential parser. I also changed a misleading error message nearby. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PDFBOX-2306) Error reading stream, expected='endstream' actual='endobj'
[ https://issues.apache.org/jira/browse/PDFBOX-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-2306. - Resolution: Fixed Error reading stream, expected='endstream' actual='endobj' -- Key: PDFBOX-2306 URL: https://issues.apache.org/jira/browse/PDFBOX-2306 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: Tilman Hausherr Assignee: Tilman Hausherr Fix For: 1.8.8, 2.0.0 I get this exception with the file of PDFBOX-269: {code} java.io.IOException: Error reading stream, expected='endstream' actual='endobj' at offset 183468 at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1578) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1249) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1176) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:1152) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:487) at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:755) at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1155) at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1138) {code} The cause is that a stream ends with endobj instead of endstream. This is accepted in the non sequential parser in readUntilEndStream() but later it isn't. It is a problem that was fixed in the old parser many years ago. My fix is for the sequential parser. I also changed a misleading error message nearby. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
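The lenient behaviour described in the commit message ("allow stream to end with endobj") amounts to accepting one specific wrong keyword while still rejecting everything else. A minimal sketch of such a check follows. `StreamEndCheck` and `checkStreamEnd` are hypothetical names and the actual NonSequentialPDFParser code is not reproduced here; only the error message format is taken from the report.

```java
import java.io.IOException;

// Hypothetical check; NonSequentialPDFParser's real code is not reproduced here.
public class StreamEndCheck {

    // Returns true when the stream was closed properly with "endstream" and
    // false when the malformed but tolerated "endobj" was found instead.
    // Any other keyword is still reported as an error.
    public static boolean checkStreamEnd(String keyword, long offset) throws IOException {
        if ("endstream".equals(keyword)) {
            return true;
        }
        if ("endobj".equals(keyword)) {
            // Lenient path: the object's closing keyword has already been read,
            // so the caller must not expect another "endobj" after it.
            return false;
        }
        throw new IOException("Error reading stream, expected='endstream' actual='"
                + keyword + "' at offset " + offset);
    }

    public static void main(String[] args) {
        try {
            System.out.println(checkStreamEnd("endstream", 100L));
            System.out.println(checkStreamEnd("endobj", 183468L));
            checkStreamEnd("xref", 200L);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Returning a flag rather than silently accepting both keywords lets the caller log the malformation and adjust what it reads next.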
[jira] [Updated] (PDFBOX-2296) Wrong stream length used for truetype font
[ https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2296: Affects Version/s: 1.8.8 1.8.7 1.8.6 Wrong stream length used for truetype font -- Key: PDFBOX-2296 URL: https://issues.apache.org/jira/browse/PDFBOX-2296 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: Tilman Hausherr Assignee: Tilman Hausherr Fix For: 1.8.8, 2.0.0 The file of PDFBOX-2048 has a wrong encoded font length, it is 4412 in the PDF but it is really about 27350. This wrong length is used to read the encoded font stream and this results in further trouble (EOF). The problem is that the wrong length is passed to createFilteredStream() instead of just calling it without parameters. In cosStream.doDecode() unFilteredStream = filteredStream (there is a FIXME there!!!), and in doDecode(COSName filterName, int filterIndex) unFilteredStream.getLength() is used, which returns the expectedLength. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2296) Wrong stream length used for truetype font
[ https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137734#comment-14137734 ]

ASF subversion and git services commented on PDFBOX-2296:
---------------------------------------------------------

Commit 1625743 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625743 ]

PDFBOX-2296: don't call createFilteredStream() with an expected length if we know that length is wrong

Wrong stream length used for truetype font
------------------------------------------

Key: PDFBOX-2296
URL: https://issues.apache.org/jira/browse/PDFBOX-2296
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Fix For: 1.8.8, 2.0.0

The file of PDFBOX-2048 has a wrong encoded font length, it is 4412 in the PDF but it is really about 27350. This wrong length is used to read the encoded font stream and this results in further trouble (EOF).

The problem is that the wrong length is passed to createFilteredStream() instead of just calling it without parameters. In cosStream.doDecode() unFilteredStream = filteredStream (there is a FIXME there!!!), and in doDecode(COSName filterName, int filterIndex) unFilteredStream.getLength() is used, which returns the expectedLength.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PDFBOX-2296) Wrong stream length used for truetype font
[ https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2296: Fix Version/s: 2.0.0 1.8.8 Wrong stream length used for truetype font -- Key: PDFBOX-2296 URL: https://issues.apache.org/jira/browse/PDFBOX-2296 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: Tilman Hausherr Assignee: Tilman Hausherr Fix For: 1.8.8, 2.0.0 The file of PDFBOX-2048 has a wrong encoded font length, it is 4412 in the PDF but it is really about 27350. This wrong length is used to read the encoded font stream and this results in further trouble (EOF). The problem is that the wrong length is passed to createFilteredStream() instead of just calling it without parameters. In cosStream.doDecode() unFilteredStream = filteredStream (there is a FIXME there!!!), and in doDecode(COSName filterName, int filterIndex) unFilteredStream.getLength() is used, which returns the expectedLength. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PDFBOX-2296) Wrong stream length used for truetype font
[ https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2296: Assignee: (was: Tilman Hausherr) Wrong stream length used for truetype font -- Key: PDFBOX-2296 URL: https://issues.apache.org/jira/browse/PDFBOX-2296 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: Tilman Hausherr Fix For: 1.8.8, 2.0.0 The file of PDFBOX-2048 has a wrong encoded font length, it is 4412 in the PDF but it is really about 27350. This wrong length is used to read the encoded font stream and this results in further trouble (EOF). The problem is that the wrong length is passed to createFilteredStream() instead of just calling it without parameters. In cosStream.doDecode() unFilteredStream = filteredStream (there is a FIXME there!!!), and in doDecode(COSName filterName, int filterIndex) unFilteredStream.getLength() is used, which returns the expectedLength. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
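The idea behind PDFBOX-2296, not trusting a declared stream length that is known to be wrong, can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `StreamLengthSketch` and `effectiveLength` are hypothetical names, the whole file is assumed to be available as a byte array, and the fallback simply scans for the first "endstream" keyword.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: picks a usable length for stream data. */
final class StreamLengthSketch {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    private StreamLengthSketch() {
    }

    /**
     * Returns the declared length if "endstream" (optionally preceded by CR/LF)
     * really follows it; otherwise scans for the keyword and derives the length.
     * Returns -1 if no "endstream" keyword is found at all.
     */
    static int effectiveLength(byte[] file, int dataStart, int declaredLength) {
        if (declaredLength >= 0 && keywordFollows(file, (long) dataStart + declaredLength)) {
            return declaredLength;
        }
        int keywordPos = indexOf(file, ENDSTREAM, dataStart);
        if (keywordPos < 0) {
            return -1;
        }
        int end = keywordPos;
        // drop one end-of-line marker in front of the keyword: CRLF, LF or CR
        if (end > dataStart && file[end - 1] == '\n') {
            end--;
        }
        if (end > dataStart && file[end - 1] == '\r') {
            end--;
        }
        return end - dataStart;
    }

    private static boolean keywordFollows(byte[] file, long pos) {
        if (pos > file.length) {
            return false;
        }
        int p = (int) pos;
        // skip an optional end-of-line marker after the data
        if (p < file.length && file[p] == '\r') {
            p++;
        }
        if (p < file.length && file[p] == '\n') {
            p++;
        }
        return matchesAt(file, ENDSTREAM, p);
    }

    private static boolean matchesAt(byte[] data, byte[] pattern, int pos) {
        if (pos < 0 || pos + pattern.length > data.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (data[pos + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    private static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i + pattern.length <= data.length; i++) {
            if (matchesAt(data, pattern, i)) {
                return i;
            }
        }
        return -1;
    }
}
```

A scan like this can be fooled by binary data that happens to contain the bytes "endstream", so it is only a fallback for files whose declared length is demonstrably wrong.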
[jira] [Commented] (PDFBOX-2320) IOException: Could not read embedded TTF for font TimesNewRoman
[ https://issues.apache.org/jira/browse/PDFBOX-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137790#comment-14137790 ]

ASF subversion and git services commented on PDFBOX-2320:
---------------------------------------------------------

Commit 1625776 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625776 ]

PDFBOX-2320: use readUntilEndStream from BaseParser, remove the method from NonSequentialParser; better log output

IOException: Could not read embedded TTF for font TimesNewRoman
---------------------------------------------------------------

Key: PDFBOX-2320
URL: https://issues.apache.org/jira/browse/PDFBOX-2320
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
Attachments: Stream-1410081173856.txt

http://svn.apache.org/viewvc/incubator/pdfbox/trunk/test/input/TEST_SetCharSpacing_Error.pdf?revision=682412&view=co&pathrev=793348

java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq TEST_SetCharSpacing_Error.pdf
{code}
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength
SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength
SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Exception in thread "main" java.io.IOException: Could not read embedded TTF for font TimesNewRoman
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.init(PDTrueTypeFont.java:116)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:73)
    at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:171)
    at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:543)
    at org.apache.pdfbox.util.operator.text.SetTextFont.process(SetTextFont.java:48)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:510)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:275)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:240)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:194)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:176)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:228)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:160)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:109)
    at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:265)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid block type
    at
[jira] [Updated] (PDFBOX-2320) IOException: Could not read embedded TTF for font TimesNewRoman
[ https://issues.apache.org/jira/browse/PDFBOX-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2320: Affects Version/s: 1.8.8 1.8.7 1.8.6 IOException: Could not read embedded TTF for font TimesNewRoman --- Key: PDFBOX-2320 URL: https://issues.apache.org/jira/browse/PDFBOX-2320 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Attachments: Stream-1410081173856.txt http://svn.apache.org/viewvc/incubator/pdfbox/trunk/test/input/TEST_SetCharSpacing_Error.pdf?revision=682412view=copathrev=793348 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq TEST_SetCharSpacing_Error.pdf {code} Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading 
corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Exception in thread main java.io.IOException: Could not read embedded TTF for font TimesNewRoman at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.init(PDTrueTypeFont.java:116) at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:73) at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:171) at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:543) at org.apache.pdfbox.util.operator.text.SetTextFont.process(SetTextFont.java:48) at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:510) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:275) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:240) at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:194) at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:176) at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:228) at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:160) at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:109) at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:265) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid block type at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83) at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:365) at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:286) at
[jira] [Commented] (PDFBOX-2320) IOException: Could not read embedded TTF for font TimesNewRoman
[ https://issues.apache.org/jira/browse/PDFBOX-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137793#comment-14137793 ] ASF subversion and git services commented on PDFBOX-2320: - Commit 1625777 from [~tilman] in branch 'pdfbox/branches/1.8' [ https://svn.apache.org/r1625777 ] PDFBOX-2320: set readUntilEndStream from BaseParser to protected to allow access from nonseq parser IOException: Could not read embedded TTF for font TimesNewRoman --- Key: PDFBOX-2320 URL: https://issues.apache.org/jira/browse/PDFBOX-2320 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Attachments: Stream-1410081173856.txt http://svn.apache.org/viewvc/incubator/pdfbox/trunk/test/input/TEST_SetCharSpacing_Error.pdf?revision=682412view=copathrev=793348 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq TEST_SetCharSpacing_Error.pdf {code} Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 
10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Exception in thread main java.io.IOException: Could not read embedded TTF for font TimesNewRoman at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.init(PDTrueTypeFont.java:116) at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:73) at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:171) at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:543) at org.apache.pdfbox.util.operator.text.SetTextFont.process(SetTextFont.java:48) at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:510) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:275) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:240) at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:194) at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:176) at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:228) at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:160) at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:109) at 
org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:265) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid block type at
[jira] [Resolved] (PDFBOX-2320) IOException: Could not read embedded TTF for font TimesNewRoman
[ https://issues.apache.org/jira/browse/PDFBOX-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-2320. - Resolution: Fixed Fix Version/s: 2.0.0 1.8.8 IOException: Could not read embedded TTF for font TimesNewRoman --- Key: PDFBOX-2320 URL: https://issues.apache.org/jira/browse/PDFBOX-2320 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Fix For: 1.8.8, 2.0.0 Attachments: Stream-1410081173856.txt http://svn.apache.org/viewvc/incubator/pdfbox/trunk/test/input/TEST_SetCharSpacing_Error.pdf?revision=682412view=copathrev=793348 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq TEST_SetCharSpacing_Error.pdf {code} Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode 
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException Exception in thread main java.io.IOException: Could not read embedded TTF for font TimesNewRoman at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.init(PDTrueTypeFont.java:116) at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:73) at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:171) at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:543) at org.apache.pdfbox.util.operator.text.SetTextFont.process(SetTextFont.java:48) at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:510) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:275) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:240) at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:194) at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:176) at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:228) at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:160) at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:109) at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:265) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid block type at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83) at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:365) at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:286) at
[jira] [Updated] (PDFBOX-2332) Error reading stream, expected='endstream' actual='endstream8' at offset 1993
[ https://issues.apache.org/jira/browse/PDFBOX-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2332: Affects Version/s: 1.8.8 1.8.7 1.8.6 Error reading stream, expected='endstream' actual='endstream8' at offset 1993 - Key: PDFBOX-2332 URL: https://issues.apache.org/jira/browse/PDFBOX-2332 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Fix For: 2.0.0 PDF from PDFBOX-195 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq test_bad.pdf Exception in thread main java.io.IOException: Error reading stream, expected='endstream' actual='endstream8' at offset 1993 at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1576) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PDFBOX-2332) Error reading stream, expected='endstream' actual='endstream8' at offset 1993
[ https://issues.apache.org/jira/browse/PDFBOX-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-2332. - Resolution: Fixed Fix Version/s: 1.8.8 Error reading stream, expected='endstream' actual='endstream8' at offset 1993 - Key: PDFBOX-2332 URL: https://issues.apache.org/jira/browse/PDFBOX-2332 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Fix For: 1.8.8, 2.0.0 PDF from PDFBOX-195 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq test_bad.pdf Exception in thread main java.io.IOException: Error reading stream, expected='endstream' actual='endstream8' at offset 1993 at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1576) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2332) Error reading stream, expected='endstream' actual='endstream8' at offset 1993
[ https://issues.apache.org/jira/browse/PDFBOX-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137798#comment-14137798 ]

ASF subversion and git services commented on PDFBOX-2332:
---------------------------------------------------------

Commit 1625778 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625778 ]

PDFBOX-2332: allow missing space characters after endstream in non sequential parser, e.g. entstream8 0 obj

Error reading stream, expected='endstream' actual='endstream8' at offset 1993
-----------------------------------------------------------------------------

Key: PDFBOX-2332
URL: https://issues.apache.org/jira/browse/PDFBOX-2332
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
Fix For: 1.8.8, 2.0.0

PDF from PDFBOX-195

java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq test_bad.pdf

Exception in thread "main" java.io.IOException: Error reading stream, expected='endstream' actual='endstream8' at offset 1993
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1576)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
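The case from PDFBOX-2332, where the whitespace after endstream is missing so that the next token reads "endstream8", could be handled along these lines. This is a hedged sketch rather than the committed change: `EndstreamPrefixSketch` and `extraCharsAfterEndstream` are hypothetical names, and the token is assumed to have been read up to the next whitespace already.

```java
/** Hypothetical helper, not part of PDFBox: tolerates text glued to "endstream". */
final class EndstreamPrefixSketch {
    private static final String ENDSTREAM = "endstream";

    private EndstreamPrefixSketch() {
    }

    /**
     * Returns how many characters of the token belong to whatever follows
     * "endstream" (0 for a clean keyword), or -1 if the token does not
     * start with "endstream" at all.
     */
    static int extraCharsAfterEndstream(String token) {
        if (!token.startsWith(ENDSTREAM)) {
            return -1;
        }
        return token.length() - ENDSTREAM.length();
    }
}
```

A caller would push the reported number of characters back into its input, so that for "endstream8 0 obj" the following object header "8 0 obj" is still parsed in full.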
[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly
[ https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137838#comment-14137838 ]

ASF subversion and git services commented on PDFBOX-2342:
---------------------------------------------------------

Commit 1625779 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625779 ]

PDFBOX-2342: decrypt COSArray too, not just COSString

WriteDecodedDoc cant decrypt pdf form correctly
-----------------------------------------------

Key: PDFBOX-2342
URL: https://issues.apache.org/jira/browse/PDFBOX-2342
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
Attachments: test.pdf

java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar WriteDecodedDoc -nonSeq test.pdf

country selection is wrong

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly
[ https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137841#comment-14137841 ]

ASF subversion and git services commented on PDFBOX-2342:
---------------------------------------------------------

Commit 1625780 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625780 ]

PDFBOX-2342: allow public access to decryptArray

WriteDecodedDoc cant decrypt pdf form correctly
-----------------------------------------------

Key: PDFBOX-2342
URL: https://issues.apache.org/jira/browse/PDFBOX-2342
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
Attachments: test.pdf

java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar WriteDecodedDoc -nonSeq test.pdf

country selection is wrong

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly
[ https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2342: Affects Version/s: 1.8.8 1.8.7 1.8.6 WriteDecodedDoc cant decrypt pdf form correctly --- Key: PDFBOX-2342 URL: https://issues.apache.org/jira/browse/PDFBOX-2342 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: simon steiner Attachments: test.pdf java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar WriteDecodedDoc -nonSeq test.pdf country selection is wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly
[ https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned PDFBOX-2342: --- Assignee: Tilman Hausherr WriteDecodedDoc cant decrypt pdf form correctly --- Key: PDFBOX-2342 URL: https://issues.apache.org/jira/browse/PDFBOX-2342 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Attachments: test.pdf java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar WriteDecodedDoc -nonSeq test.pdf country selection is wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly
[ https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-2342. - Resolution: Fixed Fix Version/s: 2.0.0 1.8.8 WriteDecodedDoc cant decrypt pdf form correctly --- Key: PDFBOX-2342 URL: https://issues.apache.org/jira/browse/PDFBOX-2342 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Fix For: 1.8.8, 2.0.0 Attachments: test.pdf java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar WriteDecodedDoc -nonSeq test.pdf country selection is wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
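The commit messages for PDFBOX-2342 say that arrays are now decrypted as well, not only strings. The general shape of such a recursive walk might look like the sketch below. It deliberately avoids the PDFBox types: plain `String` stands in for COSString, `List` stands in for COSArray, the "cipher" is an arbitrary function, and the class name `NestedDecryptSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in model, not the PDFBox implementation. */
final class NestedDecryptSketch {
    private NestedDecryptSketch() {
    }

    /**
     * Applies the cipher to every string, descending into nested lists.
     * Values of any other type (numbers, booleans, ...) are returned unchanged.
     */
    static Object decrypt(Object value, UnaryOperator<String> cipher) {
        if (value instanceof String) {
            return cipher.apply((String) value);
        }
        if (value instanceof List) {
            List<Object> result = new ArrayList<>();
            for (Object element : (List<?>) value) {
                // recursion handles arrays inside arrays
                result.add(decrypt(element, cipher));
            }
            return result;
        }
        return value;
    }
}
```

If only the top-level strings were handled, strings stored inside an array (for example the option list of a form field) would stay encrypted, which would be consistent with the wrong country selection reported in the issue.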
[jira] [Comment Edited] (PDFBOX-2350) Type1 Parser hangs indefinitely
[ https://issues.apache.org/jira/browse/PDFBOX-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135708#comment-14135708 ]

Tilman Hausherr edited comment on PDFBOX-2350 at 9/17/14 7:47 PM:
------------------------------------------------------------------

Please also try
{code}
PDDocument.loadNonSeq(new File(pdfFilename), "");
{code}
which does the decryption if needed.

Also, the correct way to decrypt with the old parser is
{code}
if (document.isEncrypted())
{
    try
    {
        StandardDecryptionMaterial sdm = new StandardDecryptionMaterial("");
        document.openProtection(sdm);
    }
    catch (InvalidPasswordException e)
    {
        System.err.println("Error: The document is encrypted.");
    }
}
{code}
I'm not saying that this will solve your problems but it is worth a try.

If it still doesn't work, please change the code so that parseBinary() saves the decrypted byte array into a file (below line 448), and attach that file here.
{code}
FileOutputStream fos = new FileOutputStream(new File("font-" + System.currentTimeMillis() + ".txt"));
fos.write(decrypted);
fos.close();
{code}
Unless that font is also confidential :-(

Other things to try if you are using Windows: download and use qpdf to uncompress the file and see if there are any error messages.

qpdf --stream-data=uncompress file.pdf fileU.pdf

was (Author: tilman):
Please also try
{code}
PDDocument.loadNonSeq(new File(pdfFilename), "");
{code}
which does the decryption if needed.

Also, the correct way to decrypt with the old parser is
{code}
if (document.isEncrypted())
{
    try
    {
        StandardDecryptionMaterial sdm = new StandardDecryptionMaterial("");
        document.openProtection(sdm);
    }
    catch (InvalidPasswordException e)
    {
        System.err.println("Error: The document is encrypted.");
    }
}
{code}
I'm not saying that this will solve your problems but it is worth a try.

If it still doesn't work, please save the decrypted byte array (in the parseBinary method) in a file and post it here.
Type1 Parser hangs indefinitely
---

Key: PDFBOX-2350
URL: https://issues.apache.org/jira/browse/PDFBOX-2350
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.0
Environment: Windows 7, JDK 1.7.0_51-b13
Reporter: Daniel Scheibe

When rendering the first page of my PDF document, the Type1Parser (org.apache.fontbox.type1.Type1Parser) hangs in a loop in {{parseBinary(byte[] bytes) throws IOException}} and kills our rendering pipeline. Please find the loop that hangs below:

    // find /Private dict
    while (!lexer.peekToken().getText().equals("Private"))
    {
        lexer.nextToken();
    }

There is never a token named Private in the list of returned tokens (they are empty all the time). Furthermore, going deeper into the source code, it seems the class reading the tokens (Type1Lexer) never advances the buffer position and always returns an empty name token in the readToken(Token prevToken) method. Looking at the decrypted buffer, I cannot get anything useful out of it based on my current understanding.

Unfortunately I cannot provide the PDF in question, as it contains confidential data. Acrobat Reader XI Version 11.0.08 renders the document just fine. In addition, it seems the PDF was encrypted (40-bit RC4) with an empty password, and it reports PDF version 1.5.

Does this provide enough information, or can I do anything else to help nail this one down? I guess this might be a PDF document structure/feature that is not yet completely supported, but at the very least PDFBox should throw an exception instead of failing silently...

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
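The reporter's request, an exception instead of a silent hang, can be illustrated with a bounded search loop. This is only a minimal sketch under stated assumptions: `TokenScan`, `skipUntil` and the `maxTokens` limit are hypothetical names, and a plain `Iterator<String>` stands in for the lexer; it is not FontBox's Type1Parser or Type1Lexer code.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical stand-in for a lexer-driven search; not FontBox's Type1Parser.
class TokenScan {
    /**
     * Consumes tokens until one equals `wanted` and returns how many tokens
     * were skipped before it. Gives up with an IOException when the input
     * ends or when `maxTokens` tokens have been examined, so a lexer that
     * keeps returning empty tokens cannot stall the caller forever.
     */
    static int skipUntil(Iterator<String> tokens, String wanted, int maxTokens) throws IOException {
        for (int skipped = 0; skipped < maxTokens && tokens.hasNext(); skipped++) {
            if (wanted.equals(tokens.next())) {
                return skipped;
            }
        }
        throw new IOException("token '" + wanted + "' not found within " + maxTokens + " tokens");
    }

    public static void main(String[] args) {
        try {
            // Well-formed input: two tokens precede "Private".
            System.out.println(skipUntil(Arrays.asList("dup", "begin", "Private").iterator(), "Private", 100));
            // Stuck lexer simulated by an endless supply of empty tokens.
            skipUntil(Stream.generate(() -> "").iterator(), "Private", 100);
        } catch (IOException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

A natural upper bound for such a limit would be the length of the decrypted buffer, since a lexer that always advances cannot produce more tokens than there are bytes; that choice is an assumption of this sketch, not something stated in the issue.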
Build failed in Jenkins: PDFBox 1.8.x (JDK7) #82
See https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/82/changes Changes: [tilman] PDFBOX-2342: allow public access to decryptArray [tilman] PDFBOX-2342: decrypt COSArray too, not just COSString [tilman] PDFBOX-2332: allow missing space characters after endstream in non sequential parser, e.g. entstream8 0 obj [tilman] PDFBOX-2320: set readUntilEndStream from BaseParser to protected to allow access from nonseq parser [tilman] PDFBOX-2320: use readUntilEndStream from BaseParser, remove the method from NonSequentialParser; better log output -- [...truncated 295 lines...] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.03 sec Running org.apache.xmpbox.type.TestResourceEventType Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.083 sec Running org.apache.xmpbox.type.TestSimpleMetadataProperties Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.032 sec Running org.apache.xmpbox.type.TestVersionType Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.017 sec Running org.apache.xmpbox.type.AttributeTest Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec Running org.apache.xmpbox.schema.PDFAIdentificationTest Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.014 sec Running org.apache.xmpbox.schema.XMPBasicTest Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.069 sec Running org.apache.xmpbox.schema.PDFAIdentificationOthersTest Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec Running org.apache.xmpbox.schema.DublinCoreTest Tests run: 60, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.198 sec Running org.apache.xmpbox.schema.XMPMediaManagementTest Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.075 sec Running org.apache.xmpbox.schema.XMPSchemaTest Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.028 sec Running org.apache.xmpbox.schema.AdobePDFErrorsTest Tests run: 3, Failures: 0, 
Errors: 0, Skipped: 0, Time elapsed: 0.012 sec Running org.apache.xmpbox.schema.BasicJobTicketSchemaTest Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.062 sec Running org.apache.xmpbox.schema.PhotoshopSchemaTest Tests run: 90, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.262 sec Running org.apache.xmpbox.schema.XmpRightsSchemaTest Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.025 sec Running org.apache.xmpbox.schema.AdobePDFTest Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.027 sec Running org.apache.xmpbox.TestXMPWithDefinedSchemas Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.056 sec Running org.apache.xmpbox.parser.DeserializationTest Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.095 sec Running org.apache.xmpbox.DoubleSameTypeSchemaTest Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec Running org.apache.xmpbox.SaveMetadataHelperTest Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec Results : Tests run: 424, Failures: 0, Errors: 0, Skipped: 0 [JENKINS] Recording test results [INFO] [INFO] --- maven-bundle-plugin:2.3.7:bundle (default-bundle) @ xmpbox --- [INFO] [INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ xmpbox --- [INFO] [INFO] --- apache-rat-plugin:0.6:check (default) @ xmpbox --- [INFO] Exclude: release.properties [INFO] [INFO] --- maven-install-plugin:2.3.1:install (default-install) @ xmpbox --- [INFO] Installing https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/ws/1.8/xmpbox/target/xmpbox-1.8.8-SNAPSHOT.jar to /home/jenkins/jenkins-slave/maven-repositories/0/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-SNAPSHOT.jar [INFO] Installing https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/ws/1.8/xmpbox/pom.xml to /home/jenkins/jenkins-slave/maven-repositories/0/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-SNAPSHOT.pom [INFO] [INFO] --- 
maven-bundle-plugin:2.3.7:install (default-install) @ xmpbox --- [INFO] Installing org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-SNAPSHOT.jar [INFO] Writing OBR metadata [INFO] [INFO] --- maven-deploy-plugin:2.6:deploy (default-deploy) @ xmpbox --- Downloading: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/maven-metadata.xml Downloaded: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/maven-metadata.xml (773 B at 3.3 KB/sec) Uploading: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-20140917.195426-5.jar Uploaded: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-20140917.195426-5.jar (112 KB at 405.6 KB/sec) Uploading:
[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly
[ https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137862#comment-14137862 ] ASF subversion and git services commented on PDFBOX-2342: - Commit 1625791 from [~tilman] in branch 'pdfbox/branches/1.8' [ https://svn.apache.org/r1625791 ] PDFBOX-2342: catch CryptographyException like it is done elsewhere WriteDecodedDoc cant decrypt pdf form correctly --- Key: PDFBOX-2342 URL: https://issues.apache.org/jira/browse/PDFBOX-2342 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Fix For: 1.8.8, 2.0.0 Attachments: test.pdf java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar WriteDecodedDoc -nonSeq test.pdf country selection is wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2296) Wrong stream length used for truetype font
[ https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137956#comment-14137956 ] ASF subversion and git services commented on PDFBOX-2296: - Commit 1625817 from [~tilman] in branch 'pdfbox/branches/1.8' [ https://svn.apache.org/r1625817 ] PDFBOX-2296: fix documentation Wrong stream length used for truetype font -- Key: PDFBOX-2296 URL: https://issues.apache.org/jira/browse/PDFBOX-2296 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0 Reporter: Tilman Hausherr Fix For: 1.8.8, 2.0.0 The file of PDFBOX-2048 has a wrong encoded font length, it is 4412 in the PDF but it is really about 27350. This wrong length is used to read the encoded font stream and this results in further trouble (EOF). The problem is that the wrong length is passed to createFilteredStream() instead of just calling it without parameters. In cosStream.doDecode() unFilteredStream = filteredStream (there is a FIXME there!!!), and in doDecode(COSName filterName, int filterIndex) unFilteredStream.getLength() is used, which returns the expectedLength. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Jenkins build is back to normal : PDFBox 1.8.x (JDK7) » Apache PDFBox #83
See https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/org.apache.pdfbox$pdfbox/83/changes
Jenkins build is back to normal : PDFBox 1.8.x (JDK7) #83
See https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/83/changes
Re: [DISCUSS] move documentation and examples to git
The docs are part of the website.
Currently I mean the cookbook, how to build the project, architecture, ...

Maruan

On 17.09.2014 at 19:26, Tilman Hausherr thaush...@t-online.de wrote:

Hi Maruan,

The examples only. With the docs I assume you mean the website. I've never touched it (although I might in the future), it isn't part of the project, so I don't mind.

Tilman

On 17.09.2014 at 19:01, Maruan Sahyoun wrote:

Is that because of the examples, the docs or both?

BR
Maruan

On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote:

It is an "I don't like it, but I can live with it", but I think it might be a pain. A soft -1.

Tilman

On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:

Hi,

On 16 September 2014 at 18:03, Tilman Hausherr thaush...@t-online.de wrote:

-1, I don't like the idea to have different repository types.

Hmmm, is this just an "I don't like it, but I can live with it", or is it a clear veto? In case of a veto, how about starting with moving parts of the docs to a new git repo? IMO sooner or later the project will move from svn to git, and that would be a good opportunity to get used to the general usage of git and of course to the special processes used here at the ASF, so that we are not thrown in at the deep end after the migration.

Tilman

BR
Andreas

On 16.09.2014 at 10:21, Maruan Sahyoun wrote:

Hi there,

in order to make it easier for people to contribute to the documentation and examples, I thought about the potential benefits of moving these to a git based repository instead of svn. The main idea behind that is to allow people to contribute via github, opening another channel of communication and making it easier to contribute. Proposed names are pdfbox-docs and pdfbox-examples.

Take a look at https://github.com/apache/cordova-docs for an example of that.

I haven’t thought about all potential implications and changes necessary yet, but wanted to get first feedback about support for that idea before putting more effort into that.

WDYT?

Maruan
[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138034#comment-14138034 ] ASF subversion and git services commented on PDFBOX-2358: - Commit 1625834 from [~jahewson] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1625834 ] PDFBOX-2358: Move CMaps from PDFBox to FontBox and remove ResourceLoader ExternalFonts uses classloader of class in font-box --- Key: PDFBOX-2358 URL: https://issues.apache.org/jira/browse/PDFBOX-2358 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Cornelis Hoeflake Assignee: John Hewson ExternalFonts loads some default fonts via the org.apache.fontbox.util.ResourceLoader. That resourceloader uses it's own classloader (ResourceLoader.class.getClassLoader()) for loading the given resource. The problem is that the resource is in the PDFBox project and the ResourceLoader in the FontBox. In an OSGI environment this is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
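The class loader problem described in this issue comes down to which class is asked for the resource. As a rough sketch (the `OwnedResources` helper and its `open` method are hypothetical, not PDFBox or FontBox API), a lookup can be anchored at a class that ships alongside the resource, so the class loader that loaded that class is the one consulted:

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative helper; class and method names are hypothetical, not PDFBox API.
class OwnedResources {
    /**
     * Opens `name` relative to `owner`, going through the class loader that
     * loaded `owner`. Passing a class from the bundle that actually contains
     * the resource avoids relying on some other bundle's class loader.
     */
    static InputStream open(Class<?> owner, String name) throws IOException {
        InputStream in = owner.getResourceAsStream(name);
        if (in == null) {
            throw new IOException("resource " + name + " is not visible to " + owner.getName());
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        // Object.class and the resource "Object.class" live in the same place,
        // so the lookup succeeds wherever this sketch itself was loaded from.
        try (InputStream in = open(Object.class, "Object.class")) {
            System.out.println("first byte: 0x" + Integer.toHexString(in.read()));
        }
    }
}
```

In an OSGi container each bundle has its own class loader, which is why a helper class in one bundle asking its own loader for a file packaged in another bundle comes back empty-handed.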
Build failed in Jenkins: PDFBox-trunk » PDFBox parent #1280
See https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-parent/1280/ -- [...truncated 824 lines...] Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-sink-api/1.4/doxia-sink-api-1.4.jar (11 KB at 1215.7 KB/sec) Downloading: http://repo.maven.apache.org/maven2/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-logging-api/1.4/doxia-logging-api-1.4.jar (12 KB at 1380.7 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpclient/4.0.2/httpclient-4.0.2.jar Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-core/1.4/doxia-core-1.4.jar (162 KB at 13419.2 KB/sec) Downloading: http://repo.maven.apache.org/maven2/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar Downloaded: http://repo.maven.apache.org/maven2/org/sonatype/sisu/sisu-guice/2.1.7/sisu-guice-2.1.7-noaop.jar (461 KB at 20029.6 KB/sec) Downloading: http://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.3/commons-codec-1.3.jar Downloaded: http://repo.maven.apache.org/maven2/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar (190 KB at 8627.2 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpcore/4.0.1/httpcore-4.0.1.jar Downloaded: http://repo.maven.apache.org/maven2/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar (60 KB at 4233.1 KB/sec) Downloaded: http://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpclient/4.0.2/httpclient-4.0.2.jar (287 KB at 13001.2 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-apt/1.4/doxia-module-apt-1.4.jar Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-xhtml/1.4/doxia-module-xhtml-1.4.jar Downloaded: http://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.3/commons-codec-1.3.jar (46 KB at 2074.1 KB/sec) Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-xdoc/1.4/doxia-module-xdoc-1.4.jar Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-apt/1.4/doxia-module-apt-1.4.jar (51 KB at 3635.0 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-fml/1.4/doxia-module-fml-1.4.jar Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-xhtml/1.4/doxia-module-xhtml-1.4.jar (16 KB at 1001.8 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-markdown/1.4/doxia-module-markdown-1.4.jar Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-xdoc/1.4/doxia-module-xdoc-1.4.jar (36 KB at 3580.4 KB/sec) Downloaded: http://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpcore/4.0.1/httpcore-4.0.1.jar (169 KB at 7340.7 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/pegdown/pegdown/1.2.1/pegdown-1.2.1.jar Downloading: http://repo.maven.apache.org/maven2/org/parboiled/parboiled-java/1.1.4/parboiled-java-1.1.4.jar Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-markdown/1.4/doxia-module-markdown-1.4.jar (12 KB at 1626.3 KB/sec) Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-fml/1.4/doxia-module-fml-1.4.jar (37 KB at 4617.3 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/parboiled/parboiled-core/1.1.4/parboiled-core-1.1.4.jar Downloading: http://repo.maven.apache.org/maven2/org/ow2/asm/asm/4.1/asm-4.1.jar Downloaded: http://repo.maven.apache.org/maven2/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar (1201 KB at 23083.0 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/ow2/asm/asm-tree/4.1/asm-tree-4.1.jar Downloaded: http://repo.maven.apache.org/maven2/org/pegdown/pegdown/1.2.1/pegdown-1.2.1.jar (58 KB at 7154.4 KB/sec) Downloading: 
http://repo.maven.apache.org/maven2/org/ow2/asm/asm-analysis/4.1/asm-analysis-4.1.jar Downloaded: http://repo.maven.apache.org/maven2/org/parboiled/parboiled-java/1.1.4/parboiled-java-1.1.4.jar (72 KB at 7940.5 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/ow2/asm/asm-util/4.1/asm-util-4.1.jar Downloaded: http://repo.maven.apache.org/maven2/org/ow2/asm/asm/4.1/asm-4.1.jar (47 KB at 5138.8 KB/sec) Downloading: http://repo.maven.apache.org/maven2/javax/servlet/servlet-api/2.5/servlet-api-2.5.jar Downloaded: http://repo.maven.apache.org/maven2/org/ow2/asm/asm-tree/4.1/asm-tree-4.1.jar (22 KB at 2705.3 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-decoration-model/1.4/doxia-decoration-model-1.4.jar Downloaded: http://repo.maven.apache.org/maven2/org/parboiled/parboiled-core/1.1.4/parboiled-core-1.1.4.jar (181 KB at 13893.0 KB/sec) Downloading:
Re: Custom TextStripper / PDGraphicsState Not Reading Color
Hi All

Just to follow up on this thread, I haven’t yet removed the .properties functionality. However, it has now become not just desirable but necessary, as PDFBOX-2358 has shown that PDFBox is not handling resource loading in an OSGI compatible manner. Basically, a class shouldn’t load resources from other packages, which means that the mechanism for overloading operators via .properties isn’t safe in subclasses.

The PrintImageLocations in the “examples” package breaks this rule with the following code:

    public PrintImageLocations() throws IOException
    {
        super( ResourceLoader.loadProperties( "org/apache/pdfbox/resources/PDFTextStripper.properties", true ) );
    }

Because it’s loading resources in the “pdfbox” package. What this also means is that nobody can subclass the built-in PDFBox classes and safely load the built-in PDFBox .properties.

I’m going to migrate the .properties to the existing registerOperatorProcessor() mechanism.

-- John

On 30 Jul 2014, at 10:10, John Hewson j...@jahewson.com wrote:

On 29 Jul 2014, at 23:12, Maruan Sahyoun sahy...@fileaffairs.de wrote:

+1 for removing the .properties file if the new mechanism is easier to understand and handle. The discussion doesn’t provide that proof or some information about that. What would a replacement look like?

Basically like registerOperatorProcessor(), as used in PreflightStreamEngine.

OTOH if it’s a documentation issue we could also add some more information to the javadocs to explain the dependencies. We could add a register/unregister method to allow adding/removing custom operator handling, or provide a service discovery mechanism. This way we still have the old flexibility.

As Andreas notes, there’s a registerOperatorProcessor method which does this, so the mechanism is already in place. The problem is not that we don’t have the mechanism, it’s that we’re using .properties files at all.
The list of operator’s can’t be controlled from both code and from .properties lists, one source has to be authoritative - otherwise we’d end up with a situation where we have an operator disabled in a .properties file and then re-enabled in code. Currently we have a situation where that could happen. Therefore, removing the .properties is the only workable solution. It’s important to note that it’s very, very unlikely that anybody is using the .properties files in a use-case where they are not also making some code changes, so the supposed benefit of “not having to recompile” never existed. Adding an operator would always require compile-time changes to PDFBox so that the PDFStreamEngine subclasses actually does something with the new operator. -- John BR Maruan Am 29.07.2014 um 21:48 schrieb John Hewson j...@jahewson.com: Right but we need to address the confusion and complexity that has been caused by .properties files which made PDFBOX-2246 so tricky to figure out. Lets remove this wart! -- John On 29 Jul 2014, at 10:44, Tilman Hausherr thaush...@t-online.de wrote: Hi, At this time, the problem I see and wanted to solve (PDFBOX-2246) exists regardless whether we use a properties file or initialize directly in the code. Tilman Am 29.07.2014 19:41, schrieb John Hewson: On 29 Jul 2014, at 03:44, Andreas Lehmkühler andr...@lehmi.de wrote: Hi, it's not a black and white issue (comments inline) John Hewson j...@jahewson.com hat am 29. Juli 2014 um 07:44 geschrieben: Yes, really I should have said subclasses of PDFStreamEngine - that's where the .properties file originates. I'd propose replacing the properties mechanism with a simple method containing the mapping which can be overridden in subclasses. Ultimately, users expect to be able to subclass the behaviour of a class by just subclassing the class. PDFStreamEngine doesn't configure any operator set itself. The subclasses are supposed to configure their own set of operators depending on the particular usecase. 
E.g. to extend the text extraction one has to subclass PDFTextStripper and so on. It’s PDFStreamEngine which implements the .property mechanism though, via the PDFStreamEngine(Properties properties) constructor. E.g. to extend the text extraction one has to subclass PDFTextStripper and so on. That’s true, but it’s only half the story, don’t forget that the .properties files need to be copied and pasted elsewhere and modified along with overriding which .property file is passed in the constructor if you want to truly override the class’ behaviour. We've seen a number of incidents of confusion on the mailing list due to the current design. IMHO, most of the confusion is based on the lack of knowledge of the pdf spec. One can't understand how pdfbox works under the hood by simply looking at the code. One has to understand the pdf spec as well, at least the base concepts. I’m specifically talking about
[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138134#comment-14138134 ] ASF subversion and git services commented on PDFBOX-2358: - Commit 1625840 from [~jahewson] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1625840 ] PDFBOX-2358: Remove ResourceLoader usage in PDFBox ExternalFonts uses classloader of class in font-box --- Key: PDFBOX-2358 URL: https://issues.apache.org/jira/browse/PDFBOX-2358 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Cornelis Hoeflake Assignee: John Hewson ExternalFonts loads some default fonts via the org.apache.fontbox.util.ResourceLoader. That resourceloader uses it's own classloader (ResourceLoader.class.getClassLoader()) for loading the given resource. The problem is that the resource is in the PDFBox project and the ResourceLoader in the FontBox. In an OSGI environment this is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138135#comment-14138135 ]

John Hewson commented on PDFBOX-2358:
-

I've also removed as many uses of PDFBox's ResourceLoader as possible, because as you point out it is obscure and dangerous. It is now only used by PDFStreamEngine and its subclasses for loading .properties files. However, this usage is itself not OSGI compatible, as it is designed to be subclassed by classes in a different module, where loading resources from the pdfbox module is not OSGI friendly.

This looks like the final straw for the PDFStreamEngine .properties mechanism, which was already a candidate for removal.

ExternalFonts uses classloader of class in font-box
---

Key: PDFBOX-2358
URL: https://issues.apache.org/jira/browse/PDFBOX-2358
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

ExternalFonts loads some default fonts via the org.apache.fontbox.util.ResourceLoader. That resource loader uses its own classloader (ResourceLoader.class.getClassLoader()) for loading the given resource. The problem is that the resource is in the PDFBox project and the ResourceLoader is in FontBox. In an OSGI environment this is a problem.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
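For readers following the .properties discussion, the code-based alternative can be pictured with a toy engine. This is a minimal sketch and not PDFBox's PDFStreamEngine: `MiniStreamEngine`, `MiniTextStripper` and the nested `OperatorProcessor` interface are hypothetical, and only the method name registerOperatorProcessor is borrowed from the thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of a stream engine; it only mirrors the idea of
// registerOperatorProcessor() and is not PDFBox's PDFStreamEngine.
class MiniStreamEngine {
    interface OperatorProcessor {
        void process(List<String> operands);
    }

    // The single authoritative operator table, filled from code only.
    private final Map<String, OperatorProcessor> operators = new LinkedHashMap<>();

    void registerOperatorProcessor(String operator, OperatorProcessor processor) {
        operators.put(operator, processor);
    }

    /** Returns false when no processor is registered for the operator. */
    boolean processOperator(String operator, List<String> operands) {
        OperatorProcessor processor = operators.get(operator);
        if (processor == null) {
            return false;
        }
        processor.process(operands);
        return true;
    }
}

// A subclass picks its operator set in its constructor; no resource file is involved.
class MiniTextStripper extends MiniStreamEngine {
    final StringBuilder text = new StringBuilder();

    MiniTextStripper() {
        registerOperatorProcessor("Tj", operands -> text.append(operands.get(0)));
    }

    public static void main(String[] args) {
        MiniTextStripper stripper = new MiniTextStripper();
        stripper.processOperator("Tj", Arrays.asList("Hello"));
        stripper.processOperator("re", Arrays.asList("0", "0", "10", "10")); // unregistered, ignored
        System.out.println(stripper.text);
    }
}
```

Because the table is built in the constructor, a subclass in another module never has to read a resource out of the parent's module, which is the OSGI concern raised in the comment above.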
[jira] [Comment Edited] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box
[ https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138135#comment-14138135 ]

John Hewson edited comment on PDFBOX-2358 at 9/17/14 10:39 PM:
---

I've also removed as many uses of PDFBox's ResourceLoader as possible, because as you point out it is obscure and dangerous. It is now only used by PDFStreamEngine and its subclasses for loading .properties files. However, this usage is itself not OSGI compatible, as it is designed to be used by subclasses in a different module, where loading resources from the pdfbox module is not OSGI friendly.

This looks like the final straw for the PDFStreamEngine .properties mechanism, which was already a candidate for removal.

was (Author: jahewson):
I've also removed as many uses of PDFBox's ResourceLoader as possible, because as you point out it is obscure and dangerous. It is now only used by PDFStreamEngine and its subclasses for loading .properties files. However, this usage is itself not OSGI compatible, as it is designed to be subclassed by classes in a different module, where loading resources from the pdfbox module is not OSGI friendly.

This looks like the final straw for the PDFStreamEngine .properties mechanism, which was already a candidate for removal.

ExternalFonts uses classloader of class in font-box
---

Key: PDFBOX-2358
URL: https://issues.apache.org/jira/browse/PDFBOX-2358
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

ExternalFonts loads some default fonts via the org.apache.fontbox.util.ResourceLoader. That resource loader uses its own classloader (ResourceLoader.class.getClassLoader()) for loading the given resource. The problem is that the resource is in the PDFBox project and the ResourceLoader is in FontBox. In an OSGI environment this is a problem.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Build failed in Jenkins: PDFBox-trunk #1281
See https://builds.apache.org/job/PDFBox-trunk/1281/changes Changes: [jahewson] PDFBOX-2358: Remove ResourceLoader usage in PDFBox -- [...truncated 1756 lines...] [TASKS] Scanning folder 'https://builds.apache.org/job/PDFBox-trunk/ws/trunk/parent' for files matching the pattern '**/*.java' - excludes: [TASKS] Found 0 files to scan for tasks Found 0 open tasks. [TASKS] Computing warning deltas based on reference build #1279 [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ pdfbox-parent --- Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.4/maven-common-artifact-filters-1.4.pom Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.4/maven-common-artifact-filters-1.4.pom (4 KB at 39.9 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-container-default/1.5.5/plexus-container-default-1.5.5.pom Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-container-default/1.5.5/plexus-container-default-1.5.5.pom (3 KB at 76.9 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-classworlds/2.2.2/plexus-classworlds-2.2.2.pom Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-classworlds/2.2.2/plexus-classworlds-2.2.2.pom (4 KB at 112.5 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/xbean/xbean-reflect/3.4/xbean-reflect-3.4.pom Downloaded: http://repo.maven.apache.org/maven2/org/apache/xbean/xbean-reflect/3.4/xbean-reflect-3.4.pom (3 KB at 78.4 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/xbean/xbean/3.4/xbean-3.4.pom Downloaded: http://repo.maven.apache.org/maven2/org/apache/xbean/xbean/3.4/xbean-3.4.pom (19 KB at 377.0 KB/sec) Downloading: http://repo.maven.apache.org/maven2/commons-logging/commons-logging-api/1.1/commons-logging-api-1.1.pom Downloaded: 
http://repo.maven.apache.org/maven2/commons-logging/commons-logging-api/1.1/commons-logging-api-1.1.pom (6 KB at 149.2 KB/sec) Downloading: http://repo.maven.apache.org/maven2/com/google/collections/google-collections/1.0/google-collections-1.0.pom Downloaded: http://repo.maven.apache.org/maven2/com/google/collections/google-collections/1.0/google-collections-1.0.pom (3 KB at 67.2 KB/sec) Downloading: http://repo.maven.apache.org/maven2/com/google/google/1/google-1.pom Downloaded: http://repo.maven.apache.org/maven2/com/google/google/1/google-1.pom (2 KB at 43.4 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-resources/1.0-alpha-7/plexus-resources-1.0-alpha-7.pom Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-resources/1.0-alpha-7/plexus-resources-1.0-alpha-7.pom (2 KB at 36.9 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.4/maven-common-artifact-filters-1.4.jar Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-resources/1.0-alpha-7/plexus-resources-1.0-alpha-7.jar Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/1.5.15/plexus-utils-1.5.15.jar Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.4/maven-common-artifact-filters-1.4.jar (31 KB at 629.7 KB/sec) Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-resources/1.0-alpha-7/plexus-resources-1.0-alpha-7.jar (23 KB at 360.5 KB/sec) Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/1.5.15/plexus-utils-1.5.15.jar (223 KB at 1638.0 KB/sec) [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ pdfbox-parent --- Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-exec/1.1/maven-reporting-exec-1.1.pom Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-exec/1.1/maven-reporting-exec-1.1.pom (11 KB at 289.2 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.3/maven-shared-utils-0.3.pom Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.3/maven-shared-utils-0.3.pom (4 KB at 104.0 KB/sec) Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-components/18/maven-shared-components-18.pom Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-components/18/maven-shared-components-18.pom (5 KB at 137.7 KB/sec) Downloading: http://repo.maven.apache.org/maven2/com/google/code/findbugs/jsr305/2.0.1/jsr305-2.0.1.pom Downloaded: http://repo.maven.apache.org/maven2/com/google/code/findbugs/jsr305/2.0.1/jsr305-2.0.1.pom (965 B at 27.7 KB/sec) Downloading:
[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.
[ https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138167#comment-14138167 ]

John Hewson commented on PDFBOX-2301:
-

Why use scratch files at all for parsing? Couldn't we use a java.io.RandomAccessFile to read the PDF (or perhaps one per COS stream), which would avoid the extra round-trip to disk:

With Java's RandomAccessFile:
- COSStream uses RandomAccessFile to read the data from disk when it is needed

With scratch files:
- Read the entire COS stream from the PDF
- Write it to disk
- Re-read the stream from disk when COSStream is read

RandomAccessBuffer consumes too much memory.


Key: PDFBOX-2301
URL: https://issues.apache.org/jira/browse/PDFBOX-2301
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 1.8.6, 2.0.0
Reporter: gee
Assignee: Andreas Lehmkühler
Fix For: 2.0.0
Attachments: clone.diff, clone2.diff, clone3.diff

RandomAccessBuffer holds the uncompressed image during operation, because that is exactly what PDFBox's ExtractImages does. But holding the uncompressed image in memory instead of the compressed one consumes too much memory, not to mention the many PDF XObjects that can use a filter to compress themselves.

It would be good if PDFBox provided an option that reverts a COSObject to the state it had just before the RandomAccess object was created (the state in which the PDF XObject stream has been parsed and the COSDictionary objects have not yet been created, because the user has not requested them via the get() method).

This is a crucial feature so that PDFBox can analyze huge PDF files (100 MB). In the current source, one must close a COSStream unless it is required (and I know a closed stream cannot be reopened again).
Class Name | Shallow Heap | Retained Heap
--
org.apache.pdfbox.cos.COSObject @ 0x5ad4940 | 24 | 8,187,264
|- class class org.apache.pdfbox.cos.COSObject @ 0x58c4020 | 0 | 0
|- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080 | 24 | 24
|- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0 | 32 | 8,187,216
| |- class class org.apache.pdfbox.cos.COSStream @ 0x58c3e00 | 8 | 8
| |- items java.util.LinkedHashMap @ 0x5b2a0f0 | 56 | 552
| |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128 | 48 |
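The "read from disk only when needed" idea in John Hewson's comment on this issue could be sketched roughly as follows. This is only an illustration using the standard java.io.RandomAccessFile API; the class name LazyRangeReaderSketch and its fields are hypothetical and are not part of PDFBox. It reads raw bytes only and does not attempt filter decoding or decryption.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical sketch (not PDFBox code): remembers where a stream's raw
 * bytes live inside a file and reads them only on demand, so nothing is
 * held in memory or copied to a scratch file beforehand.
 */
public class LazyRangeReaderSketch {
    private final File file;    // the source file (for example, a PDF)
    private final long offset;  // byte offset where the raw data starts
    private final int length;   // number of raw bytes to read

    public LazyRangeReaderSketch(File file, long offset, int length) {
        this.file = file;
        this.offset = offset;
        this.length = length;
    }

    /** Opens the file, seeks to the offset, and reads exactly 'length' bytes. */
    public byte[] read() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            byte[] data = new byte[length];
            raf.readFully(data); // throws EOFException if the range is too short
            return data;
        }
    }
}
```

A caller would construct the reader with the offset and length recorded while parsing and call read() only when the data is actually requested; the returned array can be released as soon as it has been consumed.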
[jira] [Resolved] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson resolved PDFBOX-2357.

Resolution: Fixed
Fix Version/s: 2.0.0

PDTrueTypeFont has no method to load font from stream

Key: PDFBOX-2357
URL: https://issues.apache.org/jira/browse/PDFBOX-2357
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson
Fix For: 2.0.0

PDTrueTypeFont formerly had a static method to load a font from a stream. Now that method is gone, as far as I can see without a reason. It was probably removed by mistake. Could that method be restored?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson reassigned PDFBOX-2357:

Assignee: John Hewson

PDTrueTypeFont has no method to load font from stream

Key: PDFBOX-2357
URL: https://issues.apache.org/jira/browse/PDFBOX-2357
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson
Fix For: 2.0.0

PDTrueTypeFont formerly had a static method to load a font from a stream. Now that method is gone, as far as I can see without a reason. It was probably removed by mistake. Could that method be restored?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138170#comment-14138170 ]

ASF subversion and git services commented on PDFBOX-2357:

Commit 1625849 from [~jahewson] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1625849 ]

PDFBOX-2357: Add PDTrueTypeFont constructor with InputStream

PDTrueTypeFont has no method to load font from stream

Key: PDFBOX-2357
URL: https://issues.apache.org/jira/browse/PDFBOX-2357
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson
Fix For: 2.0.0

PDTrueTypeFont formerly had a static method to load a font from a stream. Now that method is gone, as far as I can see without a reason. It was probably removed by mistake. Could that method be restored?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream
[ https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138169#comment-14138169 ]

John Hewson commented on PDFBOX-2357:

Yep, it was a mistake, I will add this method back.

PDTrueTypeFont has no method to load font from stream

Key: PDFBOX-2357
URL: https://issues.apache.org/jira/browse/PDFBOX-2357
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Fix For: 2.0.0

PDTrueTypeFont formerly had a static method to load a font from a stream. Now that method is gone, as far as I can see without a reason. It was probably removed by mistake. Could that method be restored?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2355) newDocuments is private in Splitter
[ https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138183#comment-14138183 ]

John Hewson commented on PDFBOX-2355:

{quote}
I'll rewrite it properly once the new code is released.
{quote}

If you wait until then, we won't be able to incorporate any feedback into 2.0 before the API is made stable.

newDocuments is private in Splitter

Key: PDFBOX-2355
URL: https://issues.apache.org/jira/browse/PDFBOX-2355
Project: PDFBox
Issue Type: Bug
Components: Utilities
Affects Versions: 1.8.6
Environment: Ubuntu 14.04, Java 8_20
Reporter: G. Ralph Kuntz
Assignee: John Hewson
Labels: pdfbox
Fix For: 2.0.0

The method `createNewDocument` in `Splitter` is protected, so it can be overridden, but one of the things it needs to do with the new document is add it to the `newDocuments` list, which is private.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
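The visibility problem described in this issue can be illustrated with a small stand-alone sketch. The classes below (SplitterSketch, BaseSplitter, CustomSplitter, Doc) are hypothetical and are not the PDFBox Splitter; they only show one possible way out, namely letting the base class do the bookkeeping on its private list so that an overriding factory method never needs access to it. Whether PDFBox resolved the issue this way is not stated in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of an overridable factory plus a private list. */
public class SplitterSketch {

    /** Stand-in for a document type. */
    static class Doc {
        final String label;
        Doc(String label) { this.label = label; }
    }

    static class BaseSplitter {
        // Private, as in the issue: subclasses cannot touch this list.
        private final List<Doc> newDocuments = new ArrayList<>();

        /** The base class registers every created document itself. */
        public final List<Doc> split(int parts) {
            for (int i = 0; i < parts; i++) {
                newDocuments.add(createNewDocument(i));
            }
            return Collections.unmodifiableList(newDocuments);
        }

        /** Overridable factory: only creates, never registers. */
        protected Doc createNewDocument(int index) {
            return new Doc("part-" + index);
        }
    }

    static class CustomSplitter extends BaseSplitter {
        @Override
        protected Doc createNewDocument(int index) {
            return new Doc("custom-" + index);
        }
    }

    public static void main(String[] args) {
        for (Doc d : new CustomSplitter().split(2)) {
            System.out.println(d.label);
        }
    }
}
```

An alternative with the same effect would be a protected accessor for the list; the sketch keeps the list private because that leaves the invariant "every created document is registered" in one place.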
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138197#comment-14138197 ]

Cetra Free commented on PDFBOX-2356:

I'm just using the code from here: http://pdfbox.apache.org/cookbook/pdfavalidation.html

{code}
ValidationResult result = null;
FileDataSource fd = new FileDataSource(args[0]);
PreflightParser parser = new PreflightParser(fd);
try
{
    /* Parse the PDF file with PreflightParser that inherits from the NonSequentialParser.
     * Some additional controls are present to check a set of PDF/A requirements.
     * (Stream length consistency, EOL after some Keyword...)
     */
    parser.parse();

    /* Once the syntax validation is done,
     * the parser can provide a PreflightDocument
     * (that inherits from PDDocument)
     * This document processes the end of PDF/A validation.
     */
    PreflightDocument document = parser.getPreflightDocument();
    document.validate();

    // Get validation result
    result = document.getResult();
    document.close();
}
catch (SyntaxValidationException e)
{
    /* the parse method can throw a SyntaxValidationException
     * if the PDF file can't be parsed.
     * In this case, the exception contains an instance of ValidationResult
     */
    result = e.getResult();
}

// display validation result
if (result.isValid())
{
    System.out.println("The file " + args[0] + " is a valid PDF/A-1b file");
}
else
{
    System.out.println("The file " + args[0] + " is not valid, error(s) :");
    for (ValidationError error : result.getErrorsList())
    {
        System.out.println(error.getErrorCode() + " : " + error.getDetails());
    }
}
{code}

Error Validating PDF Archive Document

Key: PDFBOX-2356
URL: https://issues.apache.org/jira/browse/PDFBOX-2356
Project: PDFBox
Issue Type: Bug
Components: Preflight
Affects Versions: 1.8.4, 1.8.5, 1.8.6
Reporter: Cetra Free
Attachments: pdfafile.pdf

When trying to validate a PDF archive file (attached to this ticket) we get the following error:

{code}
7.2 - Error on MetaData, ModificationDate present in the document catalog dictionary doesn't match with XMP information
{code}

This is because the Modification Date in the Dictionary is parsed differently from the XMP Metadata. The XMP Metadata is correct, but the Date from the Dictionary appends an extra 30 minutes. The following is the raw COSObject from the PDF File:

{code}
COSString{D:20140917122850+09'30'}
{code}

The Long value should be *1410922730000*.

The *org.apache.pdfbox.util.DateConverter* *parseDate* method returns the Date with Long *1410924530000*, which is 30 minutes ahead. The XMP Modification Date is parsed differently and returns the correct date. This means that validation will fail for PDF Archives.

My suggestion would be to refactor the parseDate function to use the standard Java library. Here's an example class which will be compatible with the PDF Specification:

{code}
static class DateParser {

    private Map<Integer, SimpleDateFormat> formats = new HashMap<Integer, SimpleDateFormat>();

    public DateParser() {
        String expr = "";
        for (String part : Arrays.asList("yyyy", "MM", "dd", "HH", "mm", "ss", "Z")) {
            expr = expr + part;
            formats.put(expr.length(), new SimpleDateFormat(expr));
        }
    }

    public Calendar parseDate(String expr) {
        try {
            expr = expr.replace("D:", "").replace("'", "").replace("Z", "+0000");
            Date date = formats.get(Math.min(expr.length(), 15)).parse(expr);
            Calendar calendar = Calendar.getInstance();
            calendar.setTime(date);
            return calendar;
        } catch (ParseException e) {
            return null;
        }
    }
}
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
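As a cross-check of the expected value for the date string quoted in this issue, the following stand-alone sketch parses a full-precision PDF date (D:YYYYMMDDHHmmSS followed by Z or a signed HH'mm' offset) with java.time. It is an illustration only: the class name PdfDateSketch and the regular expression are assumptions made here, partial dates are deliberately not handled, and it is neither the PDFBox DateConverter nor the reporter's DateParser.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parses only full-precision PDF date strings. */
public class PdfDateSketch {

    // Groups 1-6: date and time fields, 7: sign or Z, 8/9: offset hours/minutes.
    private static final Pattern FULL_DATE = Pattern.compile(
        "D:(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})"
        + "([+\\-Z])(?:(\\d{2})'(?:(\\d{2})'?)?)?");

    public static OffsetDateTime parse(String text) {
        Matcher m = FULL_DATE.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported PDF date: " + text);
        }
        LocalDateTime local = LocalDateTime.of(
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
            Integer.parseInt(m.group(5)), Integer.parseInt(m.group(6)));
        int hours = m.group(8) == null ? 0 : Integer.parseInt(m.group(8));
        int minutes = m.group(9) == null ? 0 : Integer.parseInt(m.group(9));
        int offsetSeconds = hours * 3600 + minutes * 60;
        if ("-".equals(m.group(7))) {
            offsetSeconds = -offsetSeconds;
        }
        return OffsetDateTime.of(local, ZoneOffset.ofTotalSeconds(offsetSeconds));
    }

    public static void main(String[] args) {
        OffsetDateTime parsed = parse("D:20140917122850+09'30'");
        // 12:28:50 at +09:30 is 02:58:50 UTC on 2014-09-17.
        System.out.println(parsed.toInstant().toEpochMilli()); // prints 1410922730000
    }
}
```

The offset minutes are applied once, as part of the zone offset, so the instant lands on 02:58:50 UTC rather than 30 minutes later.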
Build failed in Jenkins: PDFBox-trunk » Apache FontBox #1282
See https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/1282/

--
[INFO]
[INFO]
[INFO] Building Apache FontBox 2.0.0-SNAPSHOT
[INFO]
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ fontbox ---
[TASKS] Scanning folder 'https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/ws/' for files matching the pattern '**/*.java' - excludes:
[TASKS] Found 93 files to scan for tasks
Found 17 open tasks.
[TASKS] Computing warning deltas based on reference build #1279
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ fontbox ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ fontbox ---
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] Copying 89 resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ fontbox ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 88 source files to https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/ws/target/classes
[WARNING] Note: Some input files use unchecked or unsafe operations.
[WARNING] Note: Recompile with -Xlint:unchecked for details.
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ fontbox ---
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ fontbox ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 5 source files to https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/ws/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ fontbox ---
[INFO] Surefire report directory: https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/ws/target/surefire-reports

--- T E S T S ---
Running org.apache.fontbox.cff.Type1FontUtilTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.027 sec - in org.apache.fontbox.cff.Type1FontUtilTest
Running org.apache.fontbox.cmap.TestCMap
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec - in org.apache.fontbox.cmap.TestCMap
Running org.apache.fontbox.cmap.TestCMapParser
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec - in org.apache.fontbox.cmap.TestCMapParser
Running org.apache.fontbox.ttf.TestTTFParser
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.044 sec - in org.apache.fontbox.ttf.TestTTFParser
Running org.apache.fontbox.ttf.TestMemoryTTFDataStream
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in org.apache.fontbox.ttf.TestMemoryTTFDataStream

Results :

Tests run: 7, Failures: 0, Errors: 0, Skipped: 0

[JENKINS] Recording test results
[INFO]
[INFO] --- maven-bundle-plugin:2.4.0:bundle (default-bundle) @ fontbox ---
[INFO]
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ fontbox ---
[INFO]
[INFO] --- apache-rat-plugin:0.10:check (default) @ fontbox ---
[INFO] 51 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 202 resources included (use -debug for more details)
[INFO] Rat check: Summary of files. Unapproved: 89 unknown: 89 generated: 0 approved: 107 licence.
Re: [DISCUSS] move documentation and examples to git
I agree with Tilman on this point: the examples need to stay in the trunk where they can be built along with it. It’s very common to modify an example to take into account API changes. They’re also currently distributed along with the main PDFBox source bundle, which is a good thing.

I’d be surprised if anybody outside of the project wanted to contribute to the documentation; almost nobody seems to like writing it. Perhaps we could do this as a trial - see if it really increases contributions or not? It would be great if it did.

It’s worth adding that I’m (reluctantly) against moving PDFBox trunk over to GitHub because GitHub Issues is not powerful enough for our needs (e.g. no file attachments), which is really a shame.

-- John

On 17 Sep 2014, at 10:26, Tilman Hausherr thaush...@t-online.de wrote:

Hi Maruan,

The examples only. With the docs I assume you mean the website. I've never touched it (although I might in the future), it isn't part of the project, so I don't mind.

Tilman

On 17.09.2014 at 19:01, Maruan Sahyoun wrote:

Is that because of the examples, the docs or both?

BR
Maruan

On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote:

It is an "I don't like it, but I can live with it", but I think it might be a pain. A soft -1.

Tilman

On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:

Hi,

On 16 September 2014 at 18:03, Tilman Hausherr thaush...@t-online.de wrote:

-1, I don't like the idea to have different repository types.

Hmmm, is this just an "I don't like it, but I can live with it" or is it a clear veto? In case of a veto, how about starting with moving parts of the docs to a new git repo?

IMO sooner or later the project will move from svn to git, and that would be a good opportunity to get used to the general usage of git and of course to the special processes used here at the ASF, so that we are not thrown in at the deep end after the migration.

Tilman

BR
Andreas

On 16.09.2014 at 10:21, Maruan Sahyoun wrote:

Hi there,

in order to make it easier for people to contribute to the documentation and examples I thought about the potential benefits of moving these to a git based repository instead of svn. The main idea behind that is to allow people to contribute via github, opening another channel of communication and making it easier to contribute. Proposed names are pdfbox-docs and pdfbox-examples. Take a look at https://github.com/apache/cordova-docs for an example of that.

I haven’t thought about all potential implications and changes necessary yet but wanted to get a first feedback about support for that idea before putting more effort into that.

WDYT?

Maruan
[jira] [Comment Edited] (PDFBOX-2340) Overhaul PDFBox Documentation
[ https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138255#comment-14138255 ]

John Hewson edited comment on PDFBOX-2340 at 9/18/14 12:14 AM:

The first mockup of the documentation was better: as a user I want content, not empty green space, or - heaven forbid - anything which resembles a PDF :). Perhaps we're getting ahead of ourselves here, the content should come first.

was (Author: jahewson):
The first mockup of the documentation was better: as a user I want content, not empty green space, or - heaven forbid - anything which resembles a PDF :). Perhaps we're getting ahead of ourselves here, after all content should come first.

Overhaul PDFBox Documentation

Key: PDFBOX-2340
URL: https://issues.apache.org/jira/browse/PDFBOX-2340
Project: PDFBox
Issue Type: Improvement
Components: Documentation
Reporter: Maruan Sahyoun
Attachments: Mockup-20140912.png, Mockup_Documentation.png

In order to make it easier for users of PDFBox to work with the library, there shall be enhanced documentation consisting of an introduction, API references, and more well-documented examples and code snippets (Cookbook). In order to make it easier to contribute, the Cookbook shall be built automatically from the examples/snippet 'repository'.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2340) Overhaul PDFBox Documentation
[ https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138255#comment-14138255 ]

John Hewson commented on PDFBOX-2340:

The first mockup of the documentation was better: as a user I want content, not empty green space, or - heaven forbid - anything which resembles a PDF :). Perhaps we're getting ahead of ourselves here, after all content should come first.

Overhaul PDFBox Documentation

Key: PDFBOX-2340
URL: https://issues.apache.org/jira/browse/PDFBOX-2340
Project: PDFBox
Issue Type: Improvement
Components: Documentation
Reporter: Maruan Sahyoun
Attachments: Mockup-20140912.png, Mockup_Documentation.png

In order to make it easier for users of PDFBox to work with the library, there shall be enhanced documentation consisting of an introduction, API references, and more well-documented examples and code snippets (Cookbook). In order to make it easier to contribute, the Cookbook shall be built automatically from the examples/snippet 'repository'.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
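One way the "Cookbook built automatically from the examples/snippet repository" idea in this issue might work is to mark regions in the example sources and extract them during the site build. The sketch below is purely illustrative: the marker comments (`// snippet:begin <name>` and `// snippet:end`) and the class SnippetExtractorSketch are inventions for this example, not an existing PDFBox or Apache CMS convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: pulls named snippets out of example source lines. */
public class SnippetExtractorSketch {

    // Assumed marker syntax, chosen only for this illustration.
    static final String BEGIN = "// snippet:begin ";
    static final String END = "// snippet:end";

    /** Returns snippet name -> snippet text, in order of appearance. */
    public static Map<String, String> extract(List<String> sourceLines) {
        Map<String, String> snippets = new LinkedHashMap<>();
        String current = null;                 // name of the open snippet, if any
        List<String> buffer = new ArrayList<>();
        for (String line : sourceLines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(BEGIN)) {
                current = trimmed.substring(BEGIN.length()).trim();
                buffer.clear();
            } else if (trimmed.equals(END) && current != null) {
                snippets.put(current, String.join("\n", buffer));
                current = null;
            } else if (current != null) {
                buffer.add(line);              // keep original indentation
            }
        }
        return snippets;
    }
}
```

Because the snippets would be cut from examples that compile with the trunk, a Cookbook page generated this way could not silently drift from the API; that benefit depends on the examples staying buildable, which is the point raised elsewhere in this thread.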
Re: [DISCUSS] move documentation and examples to git
Maruan Sahyoun

On 18.09.2014 at 02:03, John Hewson j...@jahewson.com wrote:

I agree with Tilman on this point: the examples need to stay in the trunk where they can be built along with it. It’s very common to modify an example to take into account API changes. They’re also currently distributed along with the main PDFBox source bundle, which is a good thing.

I’d be surprised if anybody outside of the project wanted to contribute to the documentation; almost nobody seems to like writing it. Perhaps we could do this as a trial - see if it really increases contributions or not? It would be great if it did.

OK, so let's try with the docs. To mention it for completeness: the build process for the web site and the documentation contained within will still be done by the Apache CMS.

It’s worth adding that I’m (reluctantly) against moving PDFBox trunk over to GitHub because GitHub Issues is not powerful enough for our needs (e.g. no file attachments), which is really a shame.

Issue tracking would still be done using Jira, same as for most other Apache projects.

-- John
[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.
[ https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138545#comment-14138545 ]

Tilman Hausherr commented on PDFBOX-2301:

I thought about it too, but there are some bizarre situations where the objects are themselves compressed in a stream, or where streams are encrypted. So in these cases we would have to have temp files in memory or on disk.

RandomAccessBuffer consumes too much memory.

Key: PDFBOX-2301
URL: https://issues.apache.org/jira/browse/PDFBOX-2301
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 1.8.6, 2.0.0
Reporter: gee
Assignee: Andreas Lehmkühler
Fix For: 2.0.0
Attachments: clone.diff, clone2.diff, clone3.diff

RandomAccessBuffer holds the uncompressed image during operation, because that is exactly what the PDFBox ExtractImages tool does. But holding the uncompressed image in memory instead of the compressed one consumes too much memory, and the same applies to the many PDF XObjects that can use a filter to compress themselves. It would be good if PDFBox provided an option to revert a COSObject to the state just before the RandomAccess object was created (the state in which the PDF XObject stream has been parsed and the COSDictionary objects have not yet been created, because the user has not requested them via the get() method). This is a crucial feature for PDFBox to be able to analyze huge PDF files (100MB). In the current source, one must close the COSStream unless it is required (and I know a closed stream cannot be reopened again).
Class Name | Shallow Heap | Retained Heap
--
org.apache.pdfbox.cos.COSObject @ 0x5ad4940 | 24 | 8,187,264
|- class class org.apache.pdfbox.cos.COSObject @ 0x58c4020 | 0 | 0
|- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080 | 24 | 24
|- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0 | 32 | 8,187,216
| |- class class org.apache.pdfbox.cos.COSStream @ 0x58c3e00 | 8 | 8
| |- items java.util.LinkedHashMap @ 0x5b2a0f0 | 56 | 552
| |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128 | 48 | 8,186,528
| | |- class class org.apache.pdfbox.io.RandomAccessBuffer @ 0x5ad2b00