Re: [jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2014-02-08 Thread Steve Tyler
Hi John,

As much as I applaud your efforts on this project, do you have any
idea how I can stop receiving the email updates?

Steve Tyler

Chief Information Officer

UK: +44 (0) 7917 005990
USA: +1 312 239 0593

Email: steve.ty...@episys.com
Web:  www.episys.com


 On 8 Feb 2014, at 19:33, John Hewson (JIRA) j...@apache.org wrote:


 [ https://issues.apache.org/jira/browse/PDFBOX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895704#comment-13895704 ]

 John Hewson commented on PDFBOX-1498:
 -

 There's no way you're going to be able to open a 700MB PDF file with only
 1024MB of heap. I think that is probably the cause of your problem. Try using
 at least 4x more heap space.
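
 As a rough way to see how much heap the JVM actually has before trying a
 larger setting, a sketch like the following may help. It is not part of
 PDFBox: the class name HeapReport and the default path large.pdf are made up
 for illustration, and the factor of 4 simply mirrors the suggestion above.

```java
import java.io.File;

/** Prints the JVM's max heap next to a file's size (illustrative helper, not PDFBox code). */
final class HeapReport {

    /** Mirrors the "at least 4x more heap" suggestion, e.g. 1024 MB becomes 4096 MB. */
    static long suggestedHeapMb(long currentHeapMb) {
        return currentHeapMb * 4;
    }

    public static void main(String[] args) {
        // "large.pdf" is a placeholder; pass the real path as the first argument.
        File pdf = new File(args.length > 0 ? args[0] : "large.pdf");
        long maxHeapMb = Runtime.getRuntime().maxMemory() >> 20; // upper bound, normally set by -Xmx
        long fileMb = pdf.length() >> 20;                         // 0 if the file does not exist
        System.out.println("file size : " + fileMb + " MB");
        System.out.println("max heap  : " + maxHeapMb + " MB");
        System.out.println("suggested : " + suggestedHeapMb(maxHeapMb) + " MB or more");
    }
}
```

 Started with, for example, java -Xmx4096m HeapReport large.pdf, it should
 report a max heap close to 4096 MB; the exact figure can be slightly lower
 depending on the JVM.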

 Index Out Of Bounds Exception while reading large PDF Document
 ---

Key: PDFBOX-1498
URL: https://issues.apache.org/jira/browse/PDFBOX-1498
Project: PDFBox
 Issue Type: Bug
   Reporter: Manoj Patel
   Assignee: Andreas Lehmkühler

 I am getting java.lang.IndexOutOfBoundsException while reading a large PDF
 document (800 MB).
 The full stack trace is below:
 Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:243)
     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
     at imageData.AddFooter.main(AddFooter.java:26)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 3377, Size: 3377
     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
     at java.util.ArrayList.get(ArrayList.java:322)
     at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
     at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
     at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:606)
     at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
     ... 3 more



 --
 This message was sent by Atlassian JIRA
 (v6.1.5#6160)

-- 


This e-mail is only intended for the person(s) to whom it is addressed as 
it may contain confidential information.

For more information please visit www.episys.com/disclaimer.htm


Re: [jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2014-02-08 Thread Andreas Lehmkuehler

Hi,

On 08.02.2014 at 20:36, Steve Tyler wrote:

Hi John,

As much as I applaud your efforts on this project, do you have any
idea how I can stop receiving the email updates?

You have to unsubscribe; see [1] for further details.


BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/mailinglists.html



Re: [jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2014-02-08 Thread John Hewson
I guess you could filter out any email containing [jira]; I'm not aware of any
other method.

Perhaps we need a separate mailing list for JIRA?

-- John

 On 8 Feb 2014, at 11:36, Steve Tyler steve.ty...@episys.com wrote:
 
 Hi John,
 
 As much as I applaud your efforts on this project, do you have any
 idea how I can stop receiving the email updates?
 


Re: [jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2013-01-24 Thread Andreas Lehmkuehler

Hi,

On 23.01.2013 at 10:27, Maruan Sahyoun wrote:

Hi Manoj,

I'm afraid Manoj isn't subscribed to this list.

BR
Andreas Lehmkühler









[jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2013-01-24 Thread JIRA

[ https://issues.apache.org/jira/browse/PDFBOX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561817#comment-13561817 ]

Andreas Lehmkühler commented on PDFBOX-1498:


Try the new Non Sequential Parser by using PDDocument.loadNonSeq(…)
instead of PDDocument.load(...).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2013-01-23 Thread Manoj Patel (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560504#comment-13560504 ]

Manoj Patel commented on PDFBOX-1498:
-

Sorry, but I cannot share the document with anyone. I have created a new
document of around 700 MB. When I run the same program on it, I get the Java
heap space exception below, even though I have set the -Xmx1024 parameter:

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:243)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
    at imageData.ReadLargeFile.main(ReadLargeFile.java:13)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:59)
    at org.apache.pdfbox.cos.COSStream.createFilteredStream(COSStream.java:415)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:452)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    ... 3 more

Is there any way to read it?
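
A note on the flag itself: the JVM reads an -Xmx value without a unit suffix
as bytes, so -Xmx1024 as written would request 1024 bytes; the setting
actually used was presumably -Xmx1024m. The sketch below is one way to check
which -Xmx value a running JVM received. It uses only the JDK; the class name
XmxFlag and its helper are hypothetical, not part of PDFBox.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Lists the -Xmx flags passed to this JVM and converts them to bytes (illustrative helper). */
final class XmxFlag {

    /** Parses "-Xmx<number>[k|m|g]" into bytes; returns -1 for anything else. */
    static long toBytes(String jvmArg) {
        if (!jvmArg.startsWith("-Xmx") || jvmArg.length() == 4) {
            return -1;
        }
        String value = jvmArg.substring(4);
        char last = Character.toLowerCase(value.charAt(value.length() - 1));
        long unit = 1; // a bare number means bytes
        if (last == 'k') {
            unit = 1L << 10;
        } else if (last == 'm') {
            unit = 1L << 20;
        } else if (last == 'g') {
            unit = 1L << 30;
        }
        if (unit != 1) {
            value = value.substring(0, value.length() - 1); // drop the suffix
        }
        try {
            return Long.parseLong(value) * unit;
        } catch (NumberFormatException e) {
            return -1; // not a size this sketch understands
        }
    }

    public static void main(String[] args) {
        // Arguments given to the JVM itself, not to the program's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : jvmArgs) {
            long bytes = toBytes(arg);
            if (bytes >= 0) {
                System.out.println(arg + " = " + bytes + " bytes");
            }
        }
    }
}
```

If nothing is printed, no -Xmx flag reached the JVM and the default heap size
applies.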



Re: [jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2013-01-23 Thread Maruan Sahyoun
Hi Manoj,

The size alone is not the cause of the issue. In a recent project we handled
PDFs larger than the one you are describing.

1. Can you test with the Non Sequential Parser, i.e. PDDocument.loadNonSeq(…),
and confirm whether it produces the same issue?
2. Can you upload a sample PDF that lets us reproduce the issue? Without
that it will be very difficult to say why this is happening.
3. Of course you can try larger heap settings until it works, but I don't
think this is a good approach.

In addition, it would be good if you could describe what you want to
achieve with the PDF. Maybe there are ways of doing so without parsing the
complete file.
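
The comparison asked for in point 1 could be run with a small harness like
the one below. It is only a sketch under assumptions: the method names load
and loadNonSeq come from this thread, but the parameter lists used here,
load(File) and loadNonSeq(File, RandomAccess) with null for the scratch file,
are my reading of the PDFBox 1.8 API and should be checked against the jar in
use. The calls go through reflection purely so that the sketch compiles
without PDFBox on the classpath; in a real project one would call PDDocument
directly. The class name LoaderComparison and the path large.pdf are
hypothetical.

```java
import java.io.File;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Runs both PDFBox loaders on one file and reports how each one ends. */
final class LoaderComparison {

    /** Returns "ok", or a one-line description of why loading did not succeed. */
    static String tryLoader(String methodName, File pdf) {
        try {
            Class<?> docClass = Class.forName("org.apache.pdfbox.pdmodel.PDDocument");
            Object doc;
            if ("loadNonSeq".equals(methodName)) {
                // Assumed signature: loadNonSeq(File, RandomAccess); null means no scratch file.
                Class<?> scratchType = Class.forName("org.apache.pdfbox.io.RandomAccess");
                Method loader = docClass.getMethod(methodName, File.class, scratchType);
                doc = loader.invoke(null, pdf, null);
            } else {
                // Assumed signature: load(File).
                Method loader = docClass.getMethod(methodName, File.class);
                doc = loader.invoke(null, pdf);
            }
            docClass.getMethod("close").invoke(doc);
            return "ok";
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return "PDFBox API not found: " + e;
        } catch (InvocationTargetException e) {
            // The loader itself threw; report its exception, not the reflection wrapper.
            return "failed: " + e.getCause();
        } catch (IllegalAccessException e) {
            return "not accessible: " + e;
        }
    }

    public static void main(String[] args) {
        // "large.pdf" is a placeholder; pass the real path as the first argument.
        File pdf = new File(args.length > 0 ? args[0] : "large.pdf");
        for (String name : new String[] {"load", "loadNonSeq"}) {
            System.out.println(name + " -> " + tryLoader(name, pdf));
        }
    }
}
```

If both loaders end with the same exception, the parser choice is probably
not the deciding factor; if only load fails, that points at the sequential
parser.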

With kind regards

Maruan Sahyoun





[jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2013-01-22 Thread JIRA

[ https://issues.apache.org/jira/browse/PDFBOX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559514#comment-13559514 ]

Andreas Lehmkühler commented on PDFBOX-1498:


The described IndexOutOfBounds issue isn't related to the size of the PDF (see
PDFBOX-1490, where the PDF in question is only 102 KB).

Are you sure that

- you have the latest code?
- you are really using the newly compiled code?
- the issue/stack trace is the same?
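
To compare stack traces between two runs, it can help to reduce each failure
to its root cause and topmost frame. The following is a minimal sketch using
only the JDK; the class name RootCause is illustrative, and the exceptions in
main are stand-ins for the wrapper and cause shown in the report, not PDFBox
objects.

```java
import java.io.IOException;

/** Reduces an exception chain to its innermost cause (illustrative helper). */
final class RootCause {

    /** Follows getCause() until there is no further cause. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** One line per failure: root exception type, message and topmost frame. */
    static String summary(Throwable t) {
        Throwable root = of(t);
        StackTraceElement[] frames = root.getStackTrace();
        String top = frames.length > 0 ? frames[0].toString() : "<no frames>";
        return root.getClass().getName() + ": " + root.getMessage() + " @ " + top;
    }

    public static void main(String[] args) {
        // Stand-ins for the wrapping exception and its cause.
        Throwable cause = new IndexOutOfBoundsException("Index: 3377, Size: 3377");
        Throwable wrapper = new IOException("wrapped", cause);
        System.out.println(summary(wrapper));
    }
}
```

Two runs whose summaries differ only in the index value, such as 3376 versus
3377, are most likely hitting the same code path.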



[jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2013-01-22 Thread Manoj Patel (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559536#comment-13559536 ]

Manoj Patel commented on PDFBOX-1498:
-

Yes, I am using fontbox-1.8.0-SNAPSHOT.jar, jempbox-1.8.0-SNAPSHOT.jar and
pdfbox-1.8.0-SNAPSHOT.jar, and I still get the error below, even though I
checked out the latest code and built the jars a few minutes ago. This is the
stack trace I am getting:

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:243)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
    at imageData.ReadLargeFile.main(ReadLargeFile.java:13)
Caused by: java.lang.IndexOutOfBoundsException: Index: 3376, Size: 3376
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
    at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:606)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    ... 3 more



[jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2013-01-22 Thread JIRA

[ https://issues.apache.org/jira/browse/PDFBOX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559557#comment-13559557 ]

Andreas Lehmkühler commented on PDFBOX-1498:


Hmm, there are two possible ways to proceed:

- upload the PDF in question to a file hosting service so that we can
download it (send me a private mail with the download link if you can't make
the PDF public)
- attach the pdfbox.jar you're using so that we can double-check your environment



[jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2013-01-21 Thread JIRA

[ https://issues.apache.org/jira/browse/PDFBOX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559429#comment-13559429 ]

Andreas Lehmkühler commented on PDFBOX-1498:


Please attach a sample PDF.



[jira] [Commented] (PDFBOX-1498) Index Out Of Bounds Exception while reading large PDF Document

2013-01-21 Thread Manoj Patel (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559441#comment-13559441 ]

Manoj Patel commented on PDFBOX-1498:
-

It's a document of around 800 MB. You can try any PDF file of the same size.
