Re: Problem With MergeUtility

John Hewson Thu, 13 Mar 2014 11:24:26 -0700

Hi Alin

Thanks for your fix.


>  it would be useful if it had the instance
> variables protected rather than private, that way the class could be
> extended as needed, like PDFTextStripper.

The problem with making fields protected is that it exposes internal
implementation details, making them part of the public API. This
prevents us from making internal changes in the future without
introducing breaking changes to the public API.

In the case of PDFTextStripper, there is a strong use case for protected
members, because subclassing and overriding is the primary mechanism for
custom text extraction.
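
To illustrate the trade-off, here is a minimal sketch. The classes and
methods in it (Merger, NonSeqMerger, loadSource, getSources) are
hypothetical names made up for this example and are not PDFBox API. It
assumes a design where the fields stay private and a subclass customizes
behaviour through one small protected method:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical classes for illustration only; this is not PDFBox code.
class Merger {
    // Private field: the representation can change later (for example to a
    // list of streams) without breaking any subclass.
    private final List<String> sources = new ArrayList<>();

    public void addSource(String name) {
        sources.add(name);
    }

    // Read-only view, so subclasses can inspect but not mutate internal state.
    protected final List<String> getSources() {
        return Collections.unmodifiableList(sources);
    }

    // Protected hook: the single, deliberately supported extension point.
    protected String loadSource(String name) {
        return "load:" + name;
    }

    // The merge algorithm itself stays closed; it only calls the hook.
    public final List<String> merge() {
        List<String> result = new ArrayList<>();
        for (String source : sources) {
            result.add(loadSource(source));
        }
        return result;
    }
}

// A subclass changes how each source is loaded without touching any field.
class NonSeqMerger extends Merger {
    @Override
    protected String loadSource(String name) {
        return "loadNonSeq:" + name;
    }
}
```

With this shape, only loadSource and getSources become part of the API
contract; the list itself remains an internal detail.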

Cheers

-- John

On 13 Mar 2014, at 10:40, Alin Mazilu <[email protected]> wrote:

> Ok, I will try. In my opinion it would be useful if it had the instance
> variables protected rather than private, that way the class could be
> extended as needed, like PDFTextStripper. In my situation I would only have
> to override mergeDocuments(). Anyway, I will try it.
> 
> Thank you,
> 
> Alin
> 
> 
> On Thu, Mar 13, 2014 at 12:52 PM, Timo Boehme <[email protected]> wrote:
> 
>> Hi,
>> 
>> As far as I remember, PDFMergerUtility is one of the last utilities that
>> does not currently support loadNonSeq.
>> 
>> As a workaround, get the source of PDFMergerUtility and change PDDocument.load
>> to PDDocument.loadNonSeq (you may pass null as the buffer parameter).
>> 
>> 
>> Best,
>> Timo
>> 
>> 
>> On 13.03.2014 16:46, Alin Mazilu wrote:
>> 
>>> Where? Here's the code that causes that:
>>> 
>>> PDFMergerUtility util = new PDFMergerUtility();
>>> 
>>> for (File file : set) {
>>>     try {
>>>         if (file.exists()) {
>>>             util.addSource(file);
>>>         }
>>>     } catch (Exception e) {
>>>         // log e
>>>     }
>>> }
>>> util.setDestinationFileName(...);
>>> 
>>> util.mergeDocuments();
>>> 
>>> 
>>> On Thu, Mar 13, 2014 at 11:27 AM, Maruan Sahyoun <[email protected]> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> not a direct answer to your question but could you try
>>>> PDDocument.loadNonSeq instead?
>>>> 
>>>> BR
>>>> Maruan Sahyoun
>>>> 
>>>> On 13.03.2014 at 16:16, Alin Mazilu <[email protected]> wrote:
>>>>> 
>>>>> Hello guys,
>>>>> 
>>>>> 
>>>>> Has anyone had any problem with this? Any idea why it happens? What would
>>>>> be a good value for pushBackSize so this does not happen? Thanks!
>>>>> 
>>>>> 
>>>>> Partial stack trace:
>>>>> 
>>>>> org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 72940
>>>>> bytes in order to reparse stream. Try increasing push back buffer using
>>>>> system property org.apache.pdfbox.baseParser.pushBackSize
>>>>>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
>>>>>     at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
>>>>>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
>>>>>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
>>>>>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
>>>>>     at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:186)
>>>> 
>>> 
>> 
>> --
>> 
>> Timo Boehme
>> OntoChem GmbH
>> H.-Damerow-Str. 4
>> 06120 Halle/Saale
>> T: +49 345 4780474
>> F: +49 345 4780471
>> [email protected]
>> 
>> _____________________________________________________________________
>> 
>> OntoChem GmbH
>> Managing Director: Dr. Lutz Weber
>> Registered office: Halle / Saale
>> Register court: Stendal
>> Registration number: HRB 215461
>> _____________________________________________________________________
>> 
>>
