Re: RandomAccessFile for PDFBox

Thomas Chojecki Thu, 21 Apr 2011 12:18:57 -0700

On 21.04.2011 02:31, [email protected] wrote:

It'll be faster, but I'm not so certain it'll be more reliable.  For
example, I know the xref section can be missing or completely inaccurate
and Adobe Reader will still open it as if nothing is wrong.  So either
Adobe Reader is not a conforming reader, or it has a huge amount of code
dedicated to detecting and recovering from non-conforming PDFs.  Either
way, it ignores the xref table at least some of the time (and perhaps all
of the time).

Yeah, I think so too. But we need to do our best to find a way to parse as much as possible without breaking the parser.
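
To make the "parse as much as possible" idea concrete, here is a minimal sketch of one common recovery approach when the xref section is missing or wrong: scan the raw bytes for object headers of the form "N G obj" and rebuild the offsets from them. This is not PDFBox code; the class XrefRecoverySketch and the method scanObjectOffsets are hypothetical names used only for illustration, and a plain scan like this can report false positives if a stream happens to contain matching bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only, not part of PDFBox.
final class XrefRecoverySketch {

    // Matches an indirect object header such as "12 0 obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");

    // Returns a map from "objectNumber generation" to the byte offset of its header.
    static Map<String, Long> scanObjectOffsets(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char,
        // so match positions in the string equal byte offsets in the file.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        Map<String, Long> offsets = new LinkedHashMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            // A later header for the same object replaces the earlier one,
            // on the assumption that appended data is the newer revision.
            offsets.put(m.group(1) + " " + m.group(2), (long) m.start());
        }
        return offsets;
    }
}
```

A parser could fall back to a scan like this only when the offsets from the xref table do not point at a matching object header, so the fast path stays fast for conforming files.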

I think the only way this will reduce parsing errors is if you're not
accessing the part of the document which is non-conforming.  For example,
if page 5 is corrupt/non-conforming in a 10 page PDF, and you only read
the first page, you'd avoid the error.  On the other hand, if you process
every page, you'll still run into it and PDFBox may be able to
auto-recover, or it might throw an exception.

You are right. I never thought that far. We should try some PDF documents from your test pool and see what happens.
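
The point about the 10 page PDF with a corrupt page 5 can be illustrated with a small sketch of on-demand parsing. It is a toy model, not the PDFBox API: LazyPagesSketch, getPage, getAllPages and demo are hypothetical names, and each "page" is just a string produced by a supplier that stands in for the real page parser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Toy model of on-demand page parsing, for illustration only.
final class LazyPagesSketch {
    private final List<Supplier<String>> pageParsers;
    private final Map<Integer, String> cache = new HashMap<>();

    LazyPagesSketch(List<Supplier<String>> pageParsers) {
        this.pageParsers = new ArrayList<>(pageParsers);
    }

    // Parses a page only when it is first requested, then caches the result.
    String getPage(int index) {
        String cached = cache.get(index);
        if (cached != null) {
            return cached;
        }
        String parsed = pageParsers.get(index).get(); // may throw for a corrupt page
        cache.put(index, parsed);
        return parsed;
    }

    // Touches every page, so a corrupt page is always reached.
    List<String> getAllPages() {
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < pageParsers.size(); i++) {
            pages.add(getPage(i));
        }
        return pages;
    }

    // Builds a document in which exactly one page (zero-based corruptIndex) fails to parse.
    static LazyPagesSketch demo(int pageCount, int corruptIndex) {
        List<Supplier<String>> parsers = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            final int index = i;
            parsers.add(() -> {
                if (index == corruptIndex) {
                    throw new IllegalStateException("page " + (index + 1) + " is corrupt");
                }
                return "page " + (index + 1);
            });
        }
        return new LazyPagesSketch(parsers);
    }
}
```

With LazyPagesSketch.demo(10, 4), reading only the first page via getPage(0) succeeds, while getAllPages() throws once it reaches page 5. That matches the argument above: random access hides the error only as long as the broken part is never touched.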


At any rate, I'll try to get what I have out there either later
tonight or tomorrow night.

That would be nice. I will take a look at the code and test it, or try to implement new features or improvements.

Thanks,
Adam


It's good to see that some people are working hard to make PDFBox better. :)

BR
Thomas
