Re: Problem With MergeUtility

2014-03-13 Thread Maruan Sahyoun
Hi,

Not a direct answer to your question, but could you try
PDDocument.loadNonSeq instead?

BR
Maruan Sahyoun

 On 13.03.2014 at 16:16, Alin Mazilu impet...@gmail.com wrote:
 
 Hello guys,
 
 
 Has anyone had any problem with this? Any idea why it happens? What would
 be a good value for pushBackSize so this does not happen? Thanks!
 
 
 Partial stack trace:
 
 org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 72940
 bytes in order to reparse stream. Try increasing push back buffer using
 system property org.apache.pdfbox.baseParser.pushBackSize
     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
     at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
     at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:186)
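
Editor's note: the exception above is what a fixed-size push-back buffer produces when the parser tries to return more bytes than the buffer can hold. The following is a minimal sketch of that mechanism using only the JDK's java.io.PushbackInputStream, not PDFBox itself. The property name is taken from the exception message; the class name, the helper method and the default of 65536 bytes are assumptions made for illustration, so check the PDFBox version in use for the real default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class PushBackDemo {
    // Property name as quoted in the exception message.
    static final String PROP = "org.apache.pdfbox.baseParser.pushBackSize";
    // Assumed default buffer size, for illustration only.
    static final int ASSUMED_DEFAULT = 65536;

    /**
     * Returns true if 'count' bytes fit into a push-back buffer whose size
     * is read from the system property (or the assumed default).
     */
    public static boolean canPushBack(int count) {
        int size = Integer.getInteger(PROP, ASSUMED_DEFAULT);
        try (PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(new byte[0]), size)) {
            in.unread(new byte[count]); // throws IOException if the buffer is too small
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.clearProperty(PROP);
        System.out.println(canPushBack(72940)); // false: 72940 > 65536

        // The property has to be set before the buffer is created.
        System.setProperty(PROP, "131072");
        System.out.println(canPushBack(72940)); // true
    }
}
```

Under these assumptions, any value above the byte count reported in the exception avoids this particular failure, but a larger stream in another file can exceed it again, which is why a fixed buffer size is only a partial answer.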


Re: Problem With MergeUtility

2014-03-13 Thread Alin Mazilu
Where? Here's the code that causes that:

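import java.io.File;
import org.apache.pdfbox.util.PDFMergerUtility;

PDFMergerUtility util = new PDFMergerUtility();

// Add every existing file as a merge source.
for (File file : set) {
    try {
        if (file.exists()) {
            util.addSource(file);
        }
    } catch (Exception e) {
        // log e
    }
}
util.setDestinationFileName(...);

// The exception is thrown from here, when the sources are loaded.
util.mergeDocuments();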




Re: Problem With MergeUtility

2014-03-13 Thread Timo Boehme

Hi,

As far as I remember, PDFMergerUtility is one of the last utilities that
does not currently support loadNonSeq.

As a workaround, get the source of PDFMergerUtility and change
PDDocument.load to PDDocument.loadNonSeq (you may pass null as the
buffer parameter).



Best,
Timo



--

 Timo Boehme
 OntoChem GmbH
 H.-Damerow-Str. 4
 06120 Halle/Saale
 T: +49 345 4780474
 F: +49 345 4780471
 timo.boe...@ontochem.com

_

 OntoChem GmbH
 Geschäftsführer: Dr. Lutz Weber
 Sitz: Halle / Saale
 Registergericht: Stendal
 Registernummer: HRB 215461
_



Re: Problem With MergeUtility

2014-03-13 Thread Maruan Sahyoun
This issue is logged as PDFBOX-1964, with a potential patch attached.


BR 
Maruan Sahyoun




Re: Problem With MergeUtility

2014-03-13 Thread Alin Mazilu
OK, I will try. In my opinion it would be useful if the instance
variables were protected rather than private; that way the class could be
extended as needed, like PDFTextStripper. In my situation I would only have
to override mergeDocuments(). Anyway, I will try it.

Thank you,

Alin






Re: Problem With MergeUtility

2014-03-13 Thread Tilman Hausherr

On 13.03.2014 at 17:58, Maruan Sahyoun wrote:

this issue is logged at PDFBOX-1964 with a potential patch attached.



Reviewed and committed :-)

Tilman




Re: Problem With MergeUtility

2014-03-13 Thread John Hewson
Hi Alin

Thanks for your fix.

  it would be useful if it had the instance
 variables protected rather than private, that way the class could be
 extended as needed, like PDFTextStripper.

The problem with making fields protected is that it exposes internal
implementation details, making them part of the public API. This prevents
us from making internal changes in the future without introducing breaking
changes to the public API.

In the case of PDFTextStripper, there is a strong use case for using a
protected field, because overriding it is the primary mechanism for custom
text extraction.

Cheers

-- John
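
Editor's note: the trade-off John describes can be sketched with a small, self-contained example. Everything here (Merger, NonSeqMerger, load) is hypothetical and is not PDFBox code. It only illustrates one common alternative to protected fields: keep the state private and expose a single protected hook method, so that subclasses can change behaviour without depending on the internal representation.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtensionPointDemo {
    /** Hypothetical merger: the source list stays private. */
    public static class Merger {
        private final List<String> sources = new ArrayList<>();

        public void addSource(String name) {
            sources.add(name);
        }

        /** The only extension point; subclasses never see 'sources'. */
        protected String load(String name) {
            return "seq:" + name;
        }

        /** Final, so the merge loop itself can change without breaking subclasses. */
        public final List<String> merge() {
            List<String> out = new ArrayList<>();
            for (String s : sources) {
                out.add(load(s));
            }
            return out;
        }
    }

    /** Hypothetical subclass that swaps only the loading strategy. */
    public static class NonSeqMerger extends Merger {
        @Override
        protected String load(String name) {
            return "nonseq:" + name;
        }
    }

    public static void main(String[] args) {
        Merger m = new NonSeqMerger();
        m.addSource("a.pdf");
        m.addSource("b.pdf");
        System.out.println(m.merge()); // [nonseq:a.pdf, nonseq:b.pdf]
    }
}
```

With this shape, the private list could later be replaced by another data structure without affecting NonSeqMerger, whereas a protected field would have tied every subclass to the current representation.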




Re: Problem With MergeUtility

2014-03-13 Thread Alin Mazilu
I know that. No problem.

