Hi,
On 24.03.2015 at 15:15, Allison, Timothy B. wrote:
All,
Apologies for my delay. Yes, +1 to release. My analysis with the govdocs1
corpus found no major discernible problematic differences between 1.8.8 and
1.8.9.
Many thanks to Tilman and Maruan for identifying and fixing PDFBOX-2710.
Andreas,
I'm sorry for missing your email on the pdfbox list.
No need to apologize. Thanks for your help!
Instead of a table for the run on 20150316, I included a zip of reports[1].
This is a static dump of reports. Open the index.html file and you can
navigate through different features of the comparisons. Rather than the full
table of differences, the reports show the results of various SQL queries
against the comparisons table. This is a very early prototype/dev version of
what I'd like to turn into a more interactive GUI for evaluating text/metadata
extraction differences between two batch runs. Actually committing it to Tika
is probably many months away. Any and all feedback and collaboration are welcome!
There is a minor glitch. The filenames referenced in the section "Content
differences identified by heuristic threshold" lack the capital "A" and "B",
so the links don't work, at least on Linux.
Is there any chance to list those files producing the listed exceptions, so that
we are able to reproduce them?
Best,
Tim
[1]
https://issues.apache.org/jira/secure/attachment/12704949/PDFBox_1_8_8Vs1_8_9_20150316.zip
-----Original Message-----
From: Andreas Lehmkühler [mailto:andr...@lehmi.de]
Sent: Monday, March 23, 2015 6:49 AM
To: dev@pdfbox.apache.org
Subject: Re: PDFBox 1.8.9 release
SNIP
BR
Andreas Lehmkühler