On 21.04.2011 02:31, [email protected] wrote:
It'll be faster, but I'm not so certain it'll be more reliable. For example, I know the xref section can be missing or completely inaccurate and Adobe Reader will still open it as if nothing is wrong. So either Adobe Reader is not a conforming reader, or it has a huge amount of code dedicated to detecting and recovering from non-conforming PDFs. Either way, it ignores the xref table at least some of the time (and perhaps all of the time).
Yeah, I think so too. But we need to do our best to find a way to parse as much as possible without breaking the parser.
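To make the "ignore the xref table" idea concrete, here is a minimal sketch of one way a lenient reader could recover object offsets when the xref section is missing or wrong: scan the raw bytes for "N G obj" headers and build the table from those. This is Python for illustration only, not PDFBox code, and not a claim about what Adobe Reader actually does. The names OBJ_HEADER and rebuild_xref are made up for this example. It also ignores objects stored inside compressed object streams and can be fooled by stream data that happens to look like an object header.

```python
import re

# Matches "<object number> <generation> obj" at the start of the data or of a line.
OBJ_HEADER = re.compile(rb"(?:^|[\r\n])[ \t]*(\d+)[ \t]+(\d+)[ \t]+obj\b")


def rebuild_xref(data: bytes) -> dict:
    """Map (object number, generation) to the byte offset of the object header.

    Hypothetical helper: a brute-force scan that does not consult the
    xref section or the trailer at all.
    """
    offsets = {}
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # A later definition overwrites an earlier one, which is the
        # behaviour wanted for incremental updates appended to the file.
        offsets[key] = match.start(1)
    return offsets
```

A reader could try the declared xref first and fall back to a scan like this only when an offset does not point at the expected object header.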
I think the only way this will reduce parsing errors is if you're not accessing the part of the document which is non-conforming. For example, if page 5 is corrupt/non-conforming in a 10 page PDF, and you only read the first page, you'd avoid the error. On the other hand, if you process every page, you'll still run into it and PDFBox may be able to auto-recover, or it might throw an exception.
You are right, I had not thought that far. We should try some PDF documents from your test pool and see what happens.
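For such a test run, the page 5 scenario above could be modelled roughly like this. It is only a sketch in Python, not PDFBox code: process_pages and the parse_page callback are hypothetical names, and ValueError stands in for whatever exception a real parser would throw on a corrupt page.

```python
def process_pages(page_numbers, parse_page):
    """Parse only the requested pages and isolate failures per page.

    parse_page is a hypothetical callback that parses one page on demand
    and raises ValueError when that page is corrupt.
    """
    results = []
    failures = []
    for number in page_numbers:
        try:
            results.append((number, parse_page(number)))
        except ValueError as exc:
            # Record the failure and keep going instead of aborting the run.
            failures.append((number, str(exc)))
    return results, failures
```

With a parser that fails only on page 5, asking for page 1 alone reports no failures, while asking for pages 1 to 10 reports exactly one and still returns the other nine pages.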
At any rate, I'll try to get what I have out there either later tonight or tomorrow night.
That would be nice. I will take a look at the code and test it, or try to implement new features or improvements.
Thanks, Adam
It's good to see that some people are making an effort to improve PDFBox. :) BR, Thomas
