All,

Apologies for my delay. Yes, +1 to release: according to my analysis with the govdocs1 corpus, there are no major problematic differences between 1.8.8 and 1.8.9.
Many thanks to Tilman and Maruan for identifying and fixing PDFBOX-2710.

Andreas, I'm sorry for missing your email on the pdfbox list. Instead of a table for the run on 20150316, I included a zip of reports [1]. This is a static dump of reports: open the index.html file and you can navigate through different features of the comparisons. Rather than the full table of differences, it presents the results of various SQL queries against the comparisons table. This is a very early prototype of what I'd like to turn into a more interactive GUI for evaluating text/metadata extraction differences between two batch runs... actually committing it to Tika is probably many months away. Any and all feedback and collaboration are welcome!

Best,

Tim

[1] https://issues.apache.org/jira/secure/attachment/12704949/PDFBox_1_8_8Vs1_8_9_20150316.zip

-----Original Message-----
From: Andreas Lehmkühler [mailto:andr...@lehmi.de]
Sent: Monday, March 23, 2015 6:49 AM
To: dev@pdfbox.apache.org
Subject: Re: PDFBox 1.8.9 release

> Andreas Lehmkühler <andr...@lehmi.de> wrote on 23 March 2015 at 09:03:
>
> Hi,
>
> > Tilman Hausherr <thaush...@t-online.de> wrote on 22 March 2015 at 22:46:
> >
> > On 22.03.2015 at 15:53, Maruan Sahyoun wrote:
> > >> On 22.03.2015 at 14:55, Tilman Hausherr <thaush...@t-online.de> wrote:
> > >>
> > >> He already did... I need to have another look (hopefully tonight), and
> > >> there was also some dialog between Maruan and Tim about acroforms and I'm
> > >> not sure what the result is - whether it is OK or whether something needs
> > >> to be done.
> > > The result is fine with 1.8.9 - the diff compared to the 1.8.8 output is
> > > because the 1.8.8 output was wrong, as the same content was repeated
> > > multiple times although there was only one field.
> > >
> > > {quote}
> > > Therefore, we have another improvement with 1.8.9.
> > > {quote}
> >
> > Yeah, I did read that, but then the dialog went on... oh well. So the
> > only differences that remain now are weird differences depending on
> > whether Tim uses single-thread or multi-thread. And as I said, I'm
> > unable to investigate that. I'm satisfied that the texts are identical
> > in my tests.
>
> I was waiting for a nice result sheet like the one we got last time, so I thought
> the test wasn't finished yet. However, we don't need a fancy report; it was just
> a wrong expectation of mine. To sum it up, we are all good here and I'm going to

I've just found the conversation in TIKA-1575; it seems that everything is OK. :-)

BR
Andreas

> cut the release tomorrow evening, roughly 36 hours from now, if nobody
> objects.
>
> BR
> Andreas
>
> >
> > Tilman
> >
> > >
> > > Maruan
> > >
> > >> Tilman
> > >>
> > >> On 22.03.2015 at 14:38, Andreas Lehmkuehler wrote:
> > >>> Hi,
> > >>>
> > >>> On 12.03.2015 at 18:57, Allison, Timothy B. wrote:
> > >>>> Thank you, Tilman, for pinging me on this. I should have results by
> > >>>> tomorrow.
> > >>> @Tim, just a friendly reminder: any updates on the test results?
> > >>>
> > >>> BR
> > >>> Andreas
> > >>>
> > >>>> Best,
> > >>>>
> > >>>> Tim
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: Tilman Hausherr [mailto:thaush...@t-online.de]
> > >>>> Sent: Thursday, March 12, 2015 1:39 PM
> > >>>> To: dev@pdfbox.apache.org
> > >>>> Subject: Re: PDFBox 1.8.9 release
> > >>>>
> > >>>> +1
> > >>>>
> > >>>> I'll ask Tim Allison to run his mass tests.
> > >>>>
> > >>>> Tilman
> > >>>>
> > >>>> On 11.03.2015 at 12:12, Andreas Lehmkühler wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> there are again a number of solved issues and I'm thinking about a new
> > >>>>> bugfix release. How about a new one next week, or maybe later if someone
> > >>>>> wants to get some additional things done before then?
> > >>>>>
> > >>>>> WDYT?
> > >>>>>
> > >>>>> BR
> > >>>>> Andreas Lehmkühler
> > >>>>>
> > >>>>> ---------------------------------------------------------------------
> > >>>>> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> > >>>>> For additional commands, e-mail: dev-h...@pdfbox.apache.org