Yes. Classpath order is OS (and JDK) dependent. I didn't encounter it on OSX 
or on my Ubuntu Docker images. 
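
To see which jar actually wins on a given machine, a small check along these 
lines might help. It is only a sketch: the class name WhichJar and the helper 
locationOf are made up for illustration, and it assumes you run it with the 
same classpath that Storm uses for the topology.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of StormCrawler or Storm): reports where a class was loaded from.
final class WhichJar {

    // Returns the jar or directory that provided the class,
    // or a marker string for classes that ship with the JDK itself.
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null
                ? "(JDK / bootstrap class loader)"
                : source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Default to the exception class from the reported stack trace.
        String name = args.length > 0
                ? args[0]
                : "org.apache.commons.compress.archivers.ArchiveException";
        try {
            // Load without initializing; we only need the class's origin.
            Class<?> type = Class.forName(name, false, WhichJar.class.getClassLoader());
            System.out.println(name + " loaded from " + locationOf(type));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

If the printed location is a commons-compress jar under the Storm lib 
directory rather than the one bundled in the uber jar, the older version is 
being picked up first.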

But since it might affect more people and is unexpected, I will cancel the 
release (again) and will start it again once Tika with the fix is released :-)

Regards 
Richard 


On 11 September 2025 17:59:13 CEST, Markos Volikas 
<[email protected]> wrote:
>Thanks! I did some more searching and found that the issue in my case was that 
>commons-compress-1.27.1 
>(/opt/apache-storm-2.8.2/lib/commons-compress-1.27.1.jar) was ending up on the 
>classpath :-(
>
>When I changed the Storm lib to 1.28.0, the issue was fixed. I have no idea, 
>though, why I am the only one experiencing this issue.
>
>Markos
>
>On 9/11/25 18:11, Richard Zowalla wrote:
>> I will try to reproduce it in the evening with the snippets / sample project 
>> and steps you have provided :-)
>> 
>> On 11 September 2025 17:09:40 CEST, Markos Volikas 
>> <[email protected]> wrote:
>>> I have attached it. It only contains 1.28.0, but my Maven repository has 
>>> many versions that were fetched when building SC from source and, to be 
>>> honest, I don't understand why this happens.
>>> 
>>> I'm also not completely sure what happens when submitting the jar, since 
>>> Storm itself depends on another version of Compress.
>>> 
>>> /opt/apache-storm-2.8.2/bin/storm local target/test-1.0-SNAPSHOT.jar 
>>> org.apache.storm.flux.Flux crawler.flux --local-ttl 3600
>>> 
>>> I hope this is not a silly mistake and I'm wasting your time :-)
>>> 
>>> On 9/11/25 17:56, Richard Zowalla wrote:
>>>> What does your mvn dependency:tree output show? :-)
>>>> 
>>>> The only thing that needs to be cleaned is the locally installed SC.
>>>> 
>>>> 
>>>> 
>>>> On 11 September 2025 16:48:53 CEST, Markos Volikas 
>>>> <[email protected]> wrote:
>>>>> Yes..
>>>>> 
>>>>> I'm building from source using: 
>>>>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2/
>>>>>  (tar.gz)
>>>>> 
>>>>> I completely removed 
>>>>> /home/markos/.m2/repository/org/apache/commons/commons-compress and then 
>>>>> ran mvn clean install and it seems that multiple versions are getting in.
>>>>> 
>>>>> Before this I had also removed my .m2/ completely to make sure all 
>>>>> dependencies were downloaded, and they were. I have attached the build log.
>>>>> 
>>>>> markos@nombat:~/.m2/repository/org/apache/commons/commons-compress$ ll
>>>>> total 28
>>>>> drwxrwxr-x  7 markos markos 4096 Sep 11 17:42 ./
>>>>> drwxrwxr-x 12 markos markos 4096 Sep 11 17:42 ../
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.20/
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.26.1/
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.26.2/
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.27.1/
>>>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.28.0/
>>>>> 
>>>>> Markos
>>>>> 
>>>>> On 9/11/25 16:55, Richard Zowalla wrote:
>>>>>> Did you clean your local Maven repo before building the uber jar?
>>>>>> 
>>>>>> Can you check your compress version?
>>>>>> 
>>>>>> Regards
>>>>>> Richard
>>>>>> 
>>>>>> On 11 September 2025 15:38:38 CEST, Markos Volikas 
>>>>>> <[email protected]> wrote:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I'm afraid I'm still getting:
>>>>>>> 
>>>>>>> 16:25:13.829 [Thread-46-parse-executor[6, 6]] INFO  
>>>>>>> o.a.s.b.JSoupParserBolt - Parsing : starting https://apache.org/
>>>>>>> 16:25:13.848 [Thread-46-parse-executor[6, 6]] ERROR 
>>>>>>> o.a.s.b.JSoupParserBolt - Exception while guessing mimetype on 
>>>>>>> https://apache.org/: 
>>>>>>> org.apache.commons.compress.archivers.ArchiveException: No Archiver 
>>>>>>> found for the stream signature
>>>>>>> 
>>>>>>> I'm running in local mode with Storm 2.8.2 running on Ubuntu 24.04 
>>>>>>> (openjdk 17.0.16 2025-07-15). The database is Solr running in Docker 
>>>>>>> although this should be irrelevant. Maybe I'm doing something wrong? I 
>>>>>>> have attached the config I'm using in case you have any ideas. Sorry 
>>>>>>> for the delay, but I just found time to look into this again :-(
>>>>>>> 
>>>>>>> Markos
>>>>>>> 
>>>>>>> On 9/8/25 20:46, Richard Zowalla wrote:
>>>>>>>> Hi folks,
>>>>>>>> 
>>>>>>>> I have posted a 2nd release candidate for the Apache StormCrawler 
>>>>>>>> 3.5.0 release and it is ready for testing. The regression with Tika / 
>>>>>>>> Compress was fixed.
>>>>>>>> 
>>>>>>>> Apache StormCrawler 3.5.0 decouples Selenium from the core module, 
>>>>>>>> improving modularity and reducing unnecessary dependencies.
>>>>>>>> The release also introduces an advanced metadata filtering system that 
>>>>>>>> supports complex logical operations like key=>val OR (key2=>val2 AND 
>>>>>>>> key3=>val3).
>>>>>>>> Additionally, multiple dependencies were upgraded, core tests 
>>>>>>>> improved, and deprecated code cleaned up, enhancing overall stability 
>>>>>>>> and maintainability.
>>>>>>>> 
>>>>>>>> Thank you to everyone who contributed to this release, including all 
>>>>>>>> of our users and the people who submitted bug reports,
>>>>>>>> contributed code or documentation enhancements.
>>>>>>>> 
>>>>>>>> The release was made using the Apache StormCrawler release process, 
>>>>>>>> documented here:
>>>>>>>> https://github.com/apache/stormcrawler/blob/main/RELEASING.md
>>>>>>>> 
>>>>>>>> Source:
>>>>>>>> 
>>>>>>>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2
>>>>>>>> 
>>>>>>>> Tag:
>>>>>>>> 
>>>>>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>>>>>>> 
>>>>>>>> Commit Hash:
>>>>>>>> 
>>>>>>>> 1947ad4c56ff5c5c90e093900a163e0ac3144bb6
>>>>>>>> 
>>>>>>>> Maven Repo:
>>>>>>>> 
>>>>>>>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1011
>>>>>>>> 
>>>>>>>> <repositories>
>>>>>>>> <repository>
>>>>>>>> <id>stormcrawler-3.5.0-rc2</id>
>>>>>>>> <name>Testing StormCrawler 3.5.0 release candidate 2</name>
>>>>>>>> <url>
>>>>>>>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1011
>>>>>>>> </url>
>>>>>>>> </repository>
>>>>>>>> </repositories>
>>>>>>>> 
>>>>>>>> Release notes:
>>>>>>>> 
>>>>>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>>>>>>> 
>>>>>>>> Reminder: The up-to-date KEYS file for signature verification can be
>>>>>>>> found here: https://downloads.apache.org/stormcrawler/KEYS
>>>>>>>> 
>>>>>>>> Please vote on releasing these packages as Apache StormCrawler 3.5.0.
>>>>>>>> The vote is open for at least the next 72 hours.
>>>>>>>> 
>>>>>>>> Only votes from the StormCrawler PMC are binding, but everyone is 
>>>>>>>> welcome to check the release candidate and vote.
>>>>>>>> The vote passes if at least three binding +1 votes are cast.
>>>>>>>> 
>>>>>>>> Please VOTE
>>>>>>>> 
>>>>>>>> [+1] go ship it
>>>>>>>> [+0] meh, don't care
>>>>>>>> [-1] stop, there is a ${showstopper}
>>>>>>>> 
>>>>>>>> Thanks!
>>>>>>>> Richard
