On 02.10.2017 at 23:48, Allison, Timothy B. wrote:
> > Re 308576.pdf: the text extraction shows a huge loss, but a manual check shows
> > the text is identical. However, that file has the NPE from PDActionURI.getURI();
> > could that be aborting the text extraction?
> > Same for 569017.pdf.

> Likely. There are two "per file pair contents" files. The one ending with
> "_ignore_exceptions.xlsx" does not report results for a file pair if an exception
> was caught for either file (308576.pdf and 569017.pdf aren't in that file). The
> other one, "*_with_exceptions.xlsx", includes both kinds of pairs. Based on your
> feedback, should I add two boolean columns, exceptionInA and exceptionInB, to
> "*_with_exceptions.xlsx"?
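A minimal Java sketch of how the two report variants differ, assuming one row per file pair that carries the proposed exceptionInA/exceptionInB flags. The names FilePairReport, FilePairRow, withExceptions and ignoreExceptions are hypothetical and not taken from the actual report generator; the real reports are written to .xlsx files, which this sketch does not attempt.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of a "per file pair contents" report.
 * Names are illustrative only, not the real report code.
 */
public class FilePairReport {

    // One row per file pair; the two booleans are the proposed extra columns.
    public record FilePairRow(String fileName,
                              boolean exceptionInA,
                              boolean exceptionInB) {

        // True if an exception was caught while processing either file.
        public boolean hasException() {
            return exceptionInA || exceptionInB;
        }
    }

    // "*_with_exceptions" variant: keep every row, flags included.
    public static List<FilePairRow> withExceptions(List<FilePairRow> rows) {
        return List.copyOf(rows);
    }

    // "*_ignore_exceptions" variant: drop any pair where either file threw.
    public static List<FilePairRow> ignoreExceptions(List<FilePairRow> rows) {
        return rows.stream()
                .filter(row -> !row.hasException())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical file names; one pair has an exception in B only.
        List<FilePairRow> rows = List.of(
                new FilePairRow("a.pdf", false, true),
                new FilePairRow("b.pdf", false, false));
        System.out.println(ignoreExceptions(rows).size()); // prints 1
        System.out.println(withExceptions(rows).size());   // prints 2
    }
}
```

With the flags as explicit columns, a reader of the "*_with_exceptions" report can tell at a glance whether a large content difference coincides with an exception on one side.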

Sorry, I had forgotten that. Yes, the two columns would be useful.

Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
