On 21.04.2011 02:31, [email protected] wrote:
It'll be faster, but I'm not so certain it'll be more reliable.  For
example, I know the xref section can be missing or completely inaccurate
and Adobe Reader will still open it as if nothing is wrong.  So either
Adobe Reader is not a conforming reader, or it has a huge amount of code
dedicated to detecting and recovering from non-conforming PDFs.  Either
way, it ignores the xref table at least some of the time (and perhaps all
of the time).
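For illustration only: one common way readers tolerate a missing or wrong xref table is to ignore it and scan the raw bytes for indirect-object headers such as `12 0 obj`. The sketch below is a hypothetical helper (`rebuild_xref` is not a PDFBox or Adobe API) under a deliberately naive assumption: it also matches header-like text inside streams or strings, so a real parser would need further checks.

```python
import re

# Matches an indirect-object header such as "12 0 obj" at the start of a line.
# Naive assumption: this pattern may also occur inside stream data or strings.
OBJ_HEADER = re.compile(rb"(?m)^\s*(\d+)\s+(\d+)\s+obj\b")


def rebuild_xref(data: bytes) -> dict[tuple[int, int], int]:
    """Hypothetical helper: map (object number, generation) -> byte offset
    by scanning the file body instead of trusting the xref table."""
    offsets: dict[tuple[int, int], int] = {}
    for m in OBJ_HEADER.finditer(data):
        key = (int(m.group(1)), int(m.group(2)))
        # A later definition of the same object overwrites an earlier one,
        # since incremental updates append newer versions at the end.
        offsets[key] = m.start(1)
    return offsets


sample = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n2 0 obj\n<< >>\nendobj\n"
print(rebuild_xref(sample))
```

The scan is slower than a single xref lookup, which matches the trade-off discussed above: it can be more forgiving, but it costs a pass over the whole file.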

Yes, I think so too. But we need to do our best to find a way to parse as much as possible without breaking the parser.

I think the only way this will reduce parsing errors is if you're not
accessing the part of the document which is non-conforming.  For example,
if page 5 is corrupt/non-conforming in a 10 page PDF, and you only read
the first page, you'd avoid the error.  On the other hand, if you process
every page, you'll still run into it, and PDFBox may be able to
auto-recover, or it might throw an exception.
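A minimal sketch of the point above, assuming a hypothetical `LazyDocument` class and a stand-in `strict_parse` function (neither is PDFBox code): pages are parsed only on first access, so a corrupt page 5 raises an error only when something actually reads it.

```python
from typing import Callable


class LazyDocument:
    """Hypothetical sketch: each page is parsed only when first requested."""

    def __init__(self, raw_pages: list[bytes], parse: Callable[[bytes], str]):
        self._raw = raw_pages
        self._parse = parse
        self._cache: dict[int, str] = {}

    def page(self, index: int) -> str:
        if index not in self._cache:
            # A parse error surfaces here, and only for the requested page.
            self._cache[index] = self._parse(self._raw[index])
        return self._cache[index]


def strict_parse(blob: bytes) -> str:
    # Stand-in parser: anything not starting with b"PAGE" counts as corrupt.
    if not blob.startswith(b"PAGE"):
        raise ValueError("non-conforming page data")
    return blob.decode("ascii")


# A 10-page document whose page 5 (index 4) is corrupt.
raw = [b"PAGE %d" % i for i in range(1, 11)]
raw[4] = b"\x00garbage"
doc = LazyDocument(raw, strict_parse)

print(doc.page(0))  # reading only the first page never touches page 5

try:
    for i in range(len(raw)):  # processing every page hits the corrupt one
        doc.page(i)
except ValueError as err:
    print(f"page {i + 1}: {err}")
```

Whether the error is then recovered from or propagated is a separate policy decision; laziness only changes *when* (and whether) it is encountered.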

You are right, I hadn't thought that far. We should try some PDF documents from your test pool and see what happens.


At any rate, I'll try to get what I have out there either later
tonight or tomorrow night.

That would be nice. I will take a look at the code and test it, or try to implement new features or improvements.

Thanks,
Adam

It's good to see that some people are making an effort to make PDFBox better. :)

BR
Thomas
