Thanks Tim. I am testing 2.8.0 with StormCrawler.
Apart from a lot of warnings about missing classes, such as

*Caused by: java.lang.ClassNotFoundException: org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream*

I am also getting a failed test when trying to extract text from an embedded document. I can't see anything related in the release notes, apart maybe from

* Improve extraction of embedded file names in .docx (TIKA-3968).

I've created a branch for it in SC (https://github.com/DigitalPebble/storm-crawler/tree/tika2.8) in case anyone has the time and inclination to try to reproduce the issue.

I'll see if I can find the source of the problem.

Julien

On Tue, 9 May 2023 at 17:40, Tim Allison <talli...@apache.org> wrote:

> A candidate for the Tika 2.8.0 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/2.8.0
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/2.8.0-rc1/
>
> The SHA-512 checksum of the archive is
>
> 6b514a45b87013c566e57af2b6a526bce0b3bf02a1dabefe998068aa49672ec4a7ec2ecfa538a84aca719607f339a44341caeaab1ca313fc1c161154ec095bbb.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1093/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 2.8.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 2.8.0
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Best,
>
> Tim

--
*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>