Re: Problem With MergeUtility
Hi,

not a direct answer to your question, but could you try PDDocument.loadNonSeq instead?

BR
Maruan Sahyoun

On 13.03.2014 at 16:16, Alin Mazilu impet...@gmail.com wrote:

> Hello guys,
>
> Has anyone had this problem? Any idea why it happens? What would be a good value for pushBackSize so that this does not happen?
>
> Thanks!
>
> Partial stack trace:
> org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 72940 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
>     at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
>     at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:186)
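On the pushBackSize question: the exception names the system property org.apache.pdfbox.baseParser.pushBackSize and reports that 72940 bytes could not be pushed back, so presumably any value has to be at least that large. The sketch below assumes the property is read when the parser is created, so it must be set before the first document is loaded. The class name PushBackSizeConfig, the helper suggestedSize and the power-of-two rounding are illustrative choices, not PDFBox behaviour.

```java
public final class PushBackSizeConfig {

    // Property name copied verbatim from the exception message.
    static final String PUSHBACK_PROPERTY = "org.apache.pdfbox.baseParser.pushBackSize";

    // Hypothetical helper: round the number of bytes that could not be pushed back
    // up to the nearest power of two (a power of two is returned unchanged).
    static int suggestedSize(int failedBytes) {
        int highest = Integer.highestOneBit(failedBytes);
        return highest == failedBytes ? highest : highest << 1;
    }

    // Assumption: must run before PDFBox starts parsing any document.
    static void configure(int failedBytes) {
        System.setProperty(PUSHBACK_PROPERTY, Integer.toString(suggestedSize(failedBytes)));
    }

    public static void main(String[] args) {
        configure(72940);
        System.out.println(System.getProperty(PUSHBACK_PROPERTY)); // prints 131072
    }
}
```

The same value can be passed on the JVM command line as -Dorg.apache.pdfbox.baseParser.pushBackSize=131072. Note that the required size depends on the stream the parser has to re-read, so a different PDF may still need more.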
Re: Problem With MergeUtility
Where would I use it? Here's the code that causes the error:

    import java.io.File;
    import org.apache.pdfbox.util.PDFMergerUtility;

    PDFMergerUtility util = new PDFMergerUtility();
    for (File file : set) {
        try {
            if (file.exists()) {
                util.addSource(file);
            }
        } catch (Exception e) {
            // log e
        }
    }
    util.setDestinationFileName(...);
    util.mergeDocuments();
Re: Problem With MergeUtility
Hi,

as far as I remember, PDFMergerUtility is one of the last utilities that currently do not support loadNonSeq. As a workaround, get the source of PDFMergerUtility and change PDDocument.load to PDDocument.loadNonSeq (you may pass null as the buffer parameter).

Best,
Timo

--
Timo Boehme
OntoChem GmbH
H.-Damerow-Str. 4
06120 Halle/Saale
T: +49 345 4780474
F: +49 345 4780471
timo.boe...@ontochem.com
_
OntoChem GmbH
Managing Director: Dr. Lutz Weber
Registered office: Halle / Saale
Register court: Stendal
Register number: HRB 215461
_
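Timo's workaround boils down to parsing each source with PDDocument.loadNonSeq instead of PDDocument.load. Instead of patching PDFBox, a rough equivalent can be written against the public API. This sketch assumes PDFBox 1.8.x, where PDDocument.loadNonSeq(File, RandomAccess), PDFMergerUtility.appendDocument(PDDocument, PDDocument) and a save(String) that throws COSVisitorException are available. The class NonSeqMerge and its merge method are hypothetical names, and the logic only loosely mirrors mergeDocuments: the first file becomes the destination, and all documents are closed only after saving.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFMergerUtility;

public final class NonSeqMerge {

    /** Merges the given PDFs into destFile, parsing each one with loadNonSeq. */
    public static void merge(List<File> sources, String destFile)
            throws IOException, COSVisitorException {
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("no source files");
        }
        PDFMergerUtility merger = new PDFMergerUtility();
        List<PDDocument> open = new ArrayList<PDDocument>();
        try {
            // The first source becomes the destination document.
            // null is passed as the scratch buffer, as suggested in the thread.
            PDDocument destination = PDDocument.loadNonSeq(sources.get(0), null);
            open.add(destination);
            for (File file : sources.subList(1, sources.size())) {
                PDDocument source = PDDocument.loadNonSeq(file, null);
                open.add(source);
                merger.appendDocument(destination, source);
            }
            destination.save(destFile);
        } finally {
            // Close only after saving: the destination may still reference
            // objects that belong to the appended sources.
            for (PDDocument doc : open) {
                doc.close();
            }
        }
    }
}
```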
Re: Problem With MergeUtility
This issue is logged as PDFBOX-1964, with a potential patch attached.

BR
Maruan Sahyoun
Re: Problem With MergeUtility
OK, I will try. In my opinion it would be useful if the class had protected rather than private instance variables; that way it could be extended as needed, like PDFTextStripper. In my situation I would only have to override mergeDocuments(). Anyway, I will try it.

Thank you,
Alin
Re: Problem With MergeUtility
On 13.03.2014 17:58, Maruan Sahyoun wrote:

> this issue is logged at PDFBOX-1964 with a potential patch attached.

Reviewed and committed :-)

Tilman
Re: Problem With MergeUtility
Hi Alin,

Thanks for your fix.

> it would be useful if it had the instance variables protected rather than private, that way the class could be extended as needed, like PDFTextStripper.

The problem with making fields protected is that it exposes internal implementation details, making them part of the public API. This prevents us from making internal changes in the future without introducing breaking changes to the public API. In the case of PDFTextStripper, there is a strong use case for using a protected field, because overriding it is the primary mechanism for custom text extraction.

Cheers
-- John
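John's trade-off can be illustrated with a small, entirely hypothetical class (not PDFBox code): the internal state stays private, and one protected method is the deliberate extension point. A subclass like the one Alin had in mind can then change how each source is loaded without depending on private fields. Strings stand in for loaded documents so that the sketch runs without PDFBox; the names HookDemo, Merger and NonSeqMerger are illustrative only.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class HookDemo {

    /** Hypothetical merger: its fields stay private, so they can change freely. */
    static class Merger {
        private final List<File> sources = new ArrayList<File>();

        public void addSource(File file) {
            sources.add(file);
        }

        /** Loads every source in order, delegating each one to the hook. */
        public List<String> loadAll() {
            List<String> loaded = new ArrayList<String>();
            for (File file : sources) {
                loaded.add(load(file));
            }
            return loaded;
        }

        /** The single, deliberate extension point: how one source is loaded. */
        protected String load(File file) {
            return "load(" + file.getName() + ")";
        }
    }

    /** Swaps the loading strategy without touching the parent's private state. */
    static class NonSeqMerger extends Merger {
        @Override
        protected String load(File file) {
            return "loadNonSeq(" + file.getName() + ")";
        }
    }

    public static void main(String[] args) {
        Merger merger = new NonSeqMerger();
        merger.addSource(new File("a.pdf"));
        merger.addSource(new File("b.pdf"));
        System.out.println(merger.loadAll()); // prints [loadNonSeq(a.pdf), loadNonSeq(b.pdf)]
    }
}
```

Only the hook's signature becomes public contract here; the list of sources can be replaced later without breaking subclasses.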
Re: Problem With MergeUtility
I know that. No problem.