What does your mvn dependency:tree output show? :-) The only thing that needs to be cleaned is the locally installed SC (StormCrawler) artifacts.
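To answer that question for commons-compress specifically, the tree output can be filtered for a single artifact. The following is a minimal sketch of a Python helper, not part of StormCrawler; the function name versions_in_tree is made up for illustration. It collects every version of one artifact from saved mvn dependency:tree output, assuming the usual groupId:artifactId:type[:classifier]:version:scope coordinate format. The version directories under ~/.m2 only show what Maven downloaded at some point (plugin dependencies land there too), so the resolved tree is the more telling check.

```python
def versions_in_tree(tree_text,
                     group_id="org.apache.commons",
                     artifact_id="commons-compress"):
    """Return the set of versions of one artifact found in dependency:tree output.

    Assumes coordinates look like
        groupId:artifactId:type:version:scope
    or  groupId:artifactId:type:classifier:version:scope
    (hypothetical helper, for illustration only).
    """
    prefix = f"{group_id}:{artifact_id}:"
    found = set()
    for line in tree_text.splitlines():
        for token in line.split():
            # Some output modes wrap omitted duplicates in parentheses.
            token = token.lstrip("(")
            if not token.startswith(prefix):
                continue
            parts = token.split(":")
            if len(parts) >= 5:
                # The version is the second-to-last field, before the scope.
                found.add(parts[-2])
            elif len(parts) == 4:
                # Root project line: no scope, the version is the last field.
                found.add(parts[-1])
    return found
```

One way to use it, assuming the output was saved with mvn dependency:tree > tree.txt, is print(versions_in_tree(open("tree.txt").read())). More than one version in the result would be worth a closer look.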
On 11 September 2025 16:48:53 CEST, Markos Volikas <[email protected]> wrote:
>Yes..
>
>I'm building from source using:
>https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2/
>(tar.gz)
>
>I completely removed
>/home/markos/.m2/repository/org/apache/commons/commons-compress and then ran
>mvn clean install and it seems that multiple versions are getting in.
>
>Before this I had also removed my .m2/ completely to make sure all
>dependencies are downloaded, and they were. I have attached the build log.
>
>markos@nombat:~/.m2/repository/org/apache/commons/commons-compress$ ll
>total 28
>drwxrwxr-x 7 markos markos 4096 Sep 11 17:42 ./
>drwxrwxr-x 12 markos markos 4096 Sep 11 17:42 ../
>drwxrwxr-x 2 markos markos 4096 Sep 11 17:42 1.20/
>drwxrwxr-x 2 markos markos 4096 Sep 11 17:42 1.26.1/
>drwxrwxr-x 2 markos markos 4096 Sep 11 17:42 1.26.2/
>drwxrwxr-x 2 markos markos 4096 Sep 11 17:42 1.27.1/
>drwxrwxr-x 2 markos markos 4096 Sep 11 17:42 1.28.0/
>
>Markos
>
>On 9/11/25 16:55, Richard Zowalla wrote:
>> Cleaned your local Maven repo before building the uber jar?
>>
>> Can you check your compress version?
>>
>> Regards
>> Richard
>>
>> On 11 September 2025 15:38:38 CEST, Markos Volikas
>> <[email protected]> wrote:
>>> Hi all,
>>>
>>> I'm afraid I'm still getting:
>>>
>>> 16:25:13.829 [Thread-46-parse-executor[6, 6]] INFO o.a.s.b.JSoupParserBolt
>>> - Parsing : starting https://apache.org/
>>> 16:25:13.848 [Thread-46-parse-executor[6, 6]] ERROR o.a.s.b.JSoupParserBolt
>>> - Exception while guessing mimetype on https://apache.org/:
>>> org.apache.commons.compress.archivers.ArchiveException: No Archiver found
>>> for the stream signature
>>>
>>> I'm running in local mode with Storm 2.8.2 running on Ubuntu 24.04 (openjdk
>>> 17.0.16 2025-07-15). The database is Solr running in Docker although this
>>> should be irrelevant. Maybe I'm doing something wrong? I have attached the
>>> config I'm using in case you have any ideas.
>>> Sorry for the delay, but I just found time to look into this again :-(
>>>
>>> Markos
>>>
>>> On 9/8/25 20:46, Richard Zowalla wrote:
>>>> Hi folks,
>>>>
>>>> I have posted a 2nd release candidate for the Apache StormCrawler 3.5.0
>>>> release and it is ready for testing. The regression with Tika / Compress
>>>> was fixed.
>>>>
>>>> Apache StormCrawler 3.5.0 decouples Selenium from the core module,
>>>> improving modularity and reducing unnecessary dependencies.
>>>> The release also introduces an advanced metadata filtering system that
>>>> supports complex logical operations like key=>val OR (key2=>val2 AND
>>>> key3=>val3).
>>>> Additionally, multiple dependencies were upgraded, core tests improved,
>>>> and deprecated code cleaned up, enhancing overall stability and
>>>> maintainability.
>>>>
>>>> Thank you to everyone who contributed to this release, including all of
>>>> our users and the people who submitted bug reports,
>>>> contributed code or documentation enhancements.
>>>>
>>>> The release was made using the Apache StormCrawler release process,
>>>> documented here:
>>>> https://github.com/apache/stormcrawler/blob/main/RELEASING.md
>>>>
>>>> Source:
>>>>
>>>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2
>>>>
>>>> Tag:
>>>>
>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>>>
>>>> Commit Hash:
>>>>
>>>> 1947ad4c56ff5c5c90e093900a163e0ac3144bb6
>>>>
>>>> Maven Repo:
>>>>
>>>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1011
>>>>
>>>> <repositories>
>>>>   <repository>
>>>>     <id>stormcrawler-3.5.0-rc2</id>
>>>>     <name>Testing StormCrawler 3.5.0 release candidate 2</name>
>>>>     <url>https://repository.apache.org/content/repositories/orgapachestormcrawler-1011</url>
>>>>   </repository>
>>>> </repositories>
>>>>
>>>> Release notes:
>>>>
>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>>>
>>>> Reminder: The up-to-date KEYS file for signature verification can be
>>>> found here: https://downloads.apache.org/stormcrawler/KEYS
>>>>
>>>> Please vote on releasing these packages as Apache StormCrawler 3.5.0
>>>> The vote is open for at least the next 72 hours.
>>>>
>>>> Only votes from the StormCrawler PMC are binding, but everyone is welcome
>>>> to check the release candidate and vote.
>>>> The vote passes if at least three binding +1 votes are cast.
>>>>
>>>> Please VOTE
>>>>
>>>> [+1] go ship it
>>>> [+0] meh, don't care
>>>> [-1] stop, there is a ${showstopper}
>>>>
>>>> Thanks!
>>>> Richard
