Rat checked out, successful build on linux. +1... with one reservation
I just ran a fresh update of Tika trunk with the RC for POI 3.11 Beta 1 against a random selection of ~10k files from govdocs1, covering many formats. There aren't many office-x files, but there are some, and I made sure to include every one from the govdocs1 corpus in the ~10k files.

When comparing with Tika 1.5:
1) There are no new exceptions
2) There are 15 fewer exceptions (some pdf, but mostly POI)

The regression I reported on the Tika dev list (http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx) is indeed fixed by POI 3.11 Beta 1.

When I manually compared files with < 90% token overlap, I found improvements in POI's handling of rounding, and the newer version of POI no longer incorrectly adds a "_" to some numbers in an xls file.

I found one regression in the handling of an xlsx file:
http://digitalcorpora.org/corp/nps/files/govdocs1/598/598948.xlsx

Tika 1.6 w/ POI 3.11 Beta 1 is not extracting the comments in this file, whereas Tika 1.5 (and Tika 1.6 w/ POI 3.10-Final) did extract them. This suggests that the issue is with POI, but I haven't had a chance to dig in, and unfortunately I don't think I will have a chance until Monday.

Best,
Tim

-----Original Message-----
From: Nick Burch [mailto:[email protected]]
Sent: Friday, August 01, 2014 5:33 AM
To: [email protected]
Subject: [VOTE] Release Apache POI 3.11 Beta 1

Hi All

It has been almost half a year since our last release, so as previously discussed it seems time for another beta.

The release candidate for this release is available from:
https://dist.apache.org/repos/dist/dev/poi/3.11-beta1-RC1/

And the tag in SVN from which it was built is:
https://svn.apache.org/repos/asf/poi/tags/REL_3_11_BETA1

As with all Apache release votes, please check that not only does the code work, and no major breakages have occurred since the last release, but also that packaging is correct, license headers and notices exist etc.
The vote will be open for 72 hours, until the end of Sunday 3rd August. (It's a slightly shorter vote than normal, as Apache Tika is waiting on a bug fix in the release before they roll Tika 1.6!)

The vote options are:
+1 - I support this release
 0 - I don't object to this release, but I haven't checked it
-1 - There's a problem with the release, and that is ....

Votes are welcomed (and encouraged) from everyone, committer or not!

Thanks
Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
