Yes. Classpath order is OS (and JDK) dependent. I didn't encounter it on OSX or on my Ubuntu Docker images.
But since it might affect more people and is unexpected, I will cancel the release (again) and will start it again once Tika with the fix is released :-)

Regards,
Richard

On 11 September 2025 17:59:13 CEST, Markos Volikas <[email protected]> wrote:
>Thanks! I did some more searching and found that the issue in my case was that
>commons-compress-1.27.1
>(/opt/apache-storm-2.8.2/lib/commons-compress-1.27.1.jar) was ending up on the
>classpath :-(
>
>When I changed the storm lib to 1.28.0 the issue was fixed. I have no idea
>though why I am the only one experiencing this issue.
>
>Markos
>
>On 9/11/25 18:11, Richard Zowalla wrote:
>> I will try to reproduce it in the evening with the snippets / sample project
>> and steps you have provided :-)
>>
>> On 11 September 2025 17:09:40 CEST, Markos Volikas
>> <[email protected]> wrote:
>>> I have attached it. It only contains 1.28.0, but my maven repository has
>>> many versions that were fetched when building SC from source and I don't
>>> understand why this happens to be honest.
>>>
>>> I'm also not completely sure what happens when submitting the jar since
>>> storm itself depends on another version of compress..
>>>
>>> /opt/apache-storm-2.8.2/bin/storm local target/test-1.0-SNAPSHOT.jar
>>> org.apache.storm.flux.Flux crawler.flux --local-ttl 3600
>>>
>>> I hope this is not a silly mistake and I'm wasting your time :-)
>>>
>>> On 9/11/25 17:56, Richard Zowalla wrote:
>>>> What does your mvn dependency:tree say? :-)
>>>>
>>>> The only thing that needs to be cleaned is the locally installed SC.
>>>>
>>>>
>>>>
>>>> On 11 September 2025 16:48:53 CEST, Markos Volikas
>>>> <[email protected]> wrote:
>>>>> Yes..
>>>>>
>>>>> I'm building from source using:
>>>>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2/
>>>>> (tar.gz)
>>>>>
>>>>> I completely removed
>>>>> /home/markos/.m2/repository/org/apache/commons/commons-compress and then
>>>>> ran mvn clean install and it seems that multiple versions are getting in.
>>>>>
>>>>> Before this I had also removed my .m2/ completely to make sure all
>>>>> dependencies are downloaded and they did. I have attached the build log.
>>>>>
>>>>> markos@nombat:~/.m2/repository/org/apache/commons/commons-compress$ ll
>>>>> total 28
>>>>> drwxrwxr-x  7 markos markos 4096 Sep 11 17:42 ./
>>>>> drwxrwxr-x 12 markos markos 4096 Sep 11 17:42 ../
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.20/
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.26.1/
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.26.2/
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.27.1/
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.28.0/
>>>>>
>>>>> Markos
>>>>>
>>>>> On 9/11/25 16:55, Richard Zowalla wrote:
>>>>>> Cleaned your local Maven repo before building the uber jar?
>>>>>>
>>>>>> Can you check your compress version?
>>>>>>
>>>>>> Regards,
>>>>>> Richard
>>>>>>
>>>>>> On 11 September 2025 15:38:38 CEST, Markos Volikas
>>>>>> <[email protected]> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm afraid I'm still getting:
>>>>>>>
>>>>>>> 16:25:13.829 [Thread-46-parse-executor[6, 6]] INFO
>>>>>>> o.a.s.b.JSoupParserBolt - Parsing : starting https://apache.org/
>>>>>>> 16:25:13.848 [Thread-46-parse-executor[6, 6]] ERROR
>>>>>>> o.a.s.b.JSoupParserBolt - Exception while guessing mimetype on
>>>>>>> https://apache.org/:
>>>>>>> org.apache.commons.compress.archivers.ArchiveException: No Archiver
>>>>>>> found for the stream signature
>>>>>>>
>>>>>>> I'm running in local mode with Storm 2.8.2 running on Ubuntu 24.04
>>>>>>> (openjdk 17.0.16 2025-07-15). The database is Solr running in Docker
>>>>>>> although this should be irrelevant.
>>>>>>> Maybe I'm doing something wrong? I
>>>>>>> have attached the config I'm using in case you have any ideas. Sorry
>>>>>>> for the delay, but I just found time to look into this again :-(
>>>>>>>
>>>>>>> Markos
>>>>>>>
>>>>>>> On 9/8/25 20:46, Richard Zowalla wrote:
>>>>>>>> Hi folks,
>>>>>>>>
>>>>>>>> I have posted a 2nd release candidate for the Apache StormCrawler
>>>>>>>> 3.5.0 release and it is ready for testing. The regression with Tika /
>>>>>>>> Compress was fixed.
>>>>>>>>
>>>>>>>> Apache StormCrawler 3.5.0 decouples Selenium from the core module,
>>>>>>>> improving modularity and reducing unnecessary dependencies.
>>>>>>>> The release also introduces an advanced metadata filtering system that
>>>>>>>> supports complex logical operations like key=>val OR (key2=>val2 AND
>>>>>>>> key3=>val3).
>>>>>>>> Additionally, multiple dependencies were upgraded, core tests
>>>>>>>> improved, and deprecated code cleaned up, enhancing overall stability
>>>>>>>> and maintainability.
>>>>>>>>
>>>>>>>> Thank you to everyone who contributed to this release, including all
>>>>>>>> of our users and the people who submitted bug reports or
>>>>>>>> contributed code or documentation enhancements.
>>>>>>>>
>>>>>>>> The release was made using the Apache StormCrawler release process,
>>>>>>>> documented here:
>>>>>>>> https://github.com/apache/stormcrawler/blob/main/RELEASING.md
>>>>>>>>
>>>>>>>> Source:
>>>>>>>>
>>>>>>>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2
>>>>>>>>
>>>>>>>> Tag:
>>>>>>>>
>>>>>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>>>>>>>
>>>>>>>> Commit Hash:
>>>>>>>>
>>>>>>>> 1947ad4c56ff5c5c90e093900a163e0ac3144bb6
>>>>>>>>
>>>>>>>> Maven Repo:
>>>>>>>>
>>>>>>>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1011
>>>>>>>>
>>>>>>>> <repositories>
>>>>>>>>   <repository>
>>>>>>>>     <id>stormcrawler-3.5.0-rc2</id>
>>>>>>>>     <name>Testing StormCrawler 3.5.0 release candidate 2</name>
>>>>>>>>     <url>https://repository.apache.org/content/repositories/orgapachestormcrawler-1011</url>
>>>>>>>>   </repository>
>>>>>>>> </repositories>
>>>>>>>>
>>>>>>>> Release notes:
>>>>>>>>
>>>>>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>>>>>>>
>>>>>>>> Reminder: The up-to-date KEYS file for signature verification can be
>>>>>>>> found here: https://downloads.apache.org/stormcrawler/KEYS
>>>>>>>>
>>>>>>>> Please vote on releasing these packages as Apache StormCrawler 3.5.0.
>>>>>>>> The vote is open for at least the next 72 hours.
>>>>>>>>
>>>>>>>> Only votes from the StormCrawler PMC are binding, but everyone is
>>>>>>>> welcome to check the release candidate and vote.
>>>>>>>> The vote passes if at least three binding +1 votes are cast.
>>>>>>>>
>>>>>>>> Please VOTE
>>>>>>>>
>>>>>>>> [+1] go ship it
>>>>>>>> [+0] meh, don't care
>>>>>>>> [-1] stop, there is a ${showstopper}
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Richard
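
For anyone else hitting the ArchiveException above: since the thread traced it to an older commons-compress jar winning on the classpath, a quick way to see which jar a class is actually loaded from might look like the sketch below. This is only an illustration, not part of StormCrawler or Storm; the class name WhichJar and its helper methods are made up for this example, and it only tells you something useful if it runs with the same classpath as the topology.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper (not part of StormCrawler or Storm): reports where a class was loaded from.
public class WhichJar {

    // Returns the jar or directory a class came from, or a marker for JDK classes.
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        // Classes loaded by the bootstrap class loader have no code source.
        return (location == null) ? "(JDK / bootstrap class loader)" : location.toString();
    }

    // Looks a class up by name without running its static initializers.
    static String locateByName(String className) {
        try {
            Class<?> type = Class.forName(className, false, WhichJar.class.getClassLoader());
            return locate(type);
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on the classpath)";
        }
    }

    public static void main(String[] args) {
        // Default to the exception class from the log in this thread.
        String name = args.length > 0
                ? args[0]
                : "org.apache.commons.compress.archivers.ArchiveException";
        System.out.println(name + " -> " + locateByName(name));
    }
}
```

If the printed location points at a commons-compress jar other than the version the uber jar was built against, that would be consistent with the classpath-order problem described in this thread.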
