I will try to reproduce it in the evening with the snippets / sample project 
and steps you have provided :-)
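In the meantime, one way to find out which commons-compress jar actually wins at runtime (Storm's own lib/ directory or the uber jar) is to ask the JVM where the class was loaded from. This is only a rough sketch: the class WhichJar and its locate/version helpers are hypothetical names. The version lookup also assumes the jar's manifest declares an Implementation-Version, which not every jar does.

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper (not part of StormCrawler or Storm):
 * reports which jar a class was loaded from and, if available, its
 * Implementation-Version from the jar manifest.
 */
public class WhichJar {

    // initialize=false: look the class up without running its static initializers
    private static Class<?> load(String className) throws ClassNotFoundException {
        return Class.forName(className, false, WhichJar.class.getClassLoader());
    }

    /** Jar or directory the class came from, or a marker string. */
    public static String locate(String className) {
        try {
            CodeSource cs = load(className).getProtectionDomain().getCodeSource();
            if (cs == null) {
                // JDK classes loaded by the bootstrap loader have no code source
                return "bootstrap";
            }
            URL location = cs.getLocation();
            return location == null ? "unknown" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    /** Implementation-Version from the manifest, or "unknown" if absent. */
    public static String version(String className) {
        try {
            Package p = load(className).getPackage();
            String v = (p == null) ? null : p.getImplementationVersion();
            return v == null ? "unknown" : v;
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        String cls = args.length > 0 ? args[0]
                : "org.apache.commons.compress.archivers.ArchiveStreamFactory";
        System.out.println(cls + " -> " + locate(cls) + " (version " + version(cls) + ")");
    }
}
```

For example, logging WhichJar.locate("org.apache.commons.compress.archivers.ArchiveStreamFactory") from a bolt's prepare() should show whether that class came from target/test-1.0-SNAPSHOT.jar or from a jar under /opt/apache-storm-2.8.2/lib.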

On 11 September 2025 17:09:40 CEST, Markos Volikas wrote:
>I have attached it. It only contains 1.28.0, but my Maven repository has many 
>versions that were fetched when building SC from source, and to be honest I 
>don't understand why this happens.
>
>I'm also not completely sure what happens when submitting the jar, since Storm 
>itself depends on another version of commons-compress.
>
>/opt/apache-storm-2.8.2/bin/storm local target/test-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux crawler.flux --local-ttl 3600
>
>I hope this is not a silly mistake and I'm wasting your time :-)
>
>On 9/11/25 17:56, Richard Zowalla wrote:
>> What does your mvn dependency:tree show? :-)
>> 
>> The only thing that needs to be cleaned is the locally installed SC.
>> 
>> 
>> 
>> On 11 September 2025 16:48:53 CEST, Markos Volikas wrote:
>>> Yes.
>>> 
>>> I'm building from source using: 
>>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2/ 
>>> (tar.gz)
>>> 
>>> I completely removed 
>>> /home/markos/.m2/repository/org/apache/commons/commons-compress and then 
>>> ran mvn clean install, and it seems that multiple versions are being pulled in.
>>> 
>>> Before this I had also removed my .m2/ completely to make sure all 
>>> dependencies were downloaded again, and they were. I have attached the build log.
>>> 
>>> markos@nombat:~/.m2/repository/org/apache/commons/commons-compress$ ll
>>> total 28
>>> drwxrwxr-x  7 markos markos 4096 Sep 11 17:42 ./
>>> drwxrwxr-x 12 markos markos 4096 Sep 11 17:42 ../
>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.20/
>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.26.1/
>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.26.2/
>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.27.1/
>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.28.0/
>>> 
>>> Markos
>>> 
>>> On 9/11/25 16:55, Richard Zowalla wrote:
>>>> Cleaned your local Maven repo before building the uber jar?
>>>> 
>>>> Can you check your compress version?
>>>> 
>>>> Regards
>>>> Richard
>>>> 
>>>> On 11 September 2025 15:38:38 CEST, Markos Volikas wrote:
>>>>> Hi all,
>>>>> 
>>>>> I'm afraid I'm still getting:
>>>>> 
>>>>> 16:25:13.829 [Thread-46-parse-executor[6, 6]] INFO  o.a.s.b.JSoupParserBolt - Parsing : starting https://apache.org/
>>>>> 16:25:13.848 [Thread-46-parse-executor[6, 6]] ERROR o.a.s.b.JSoupParserBolt - Exception while guessing mimetype on https://apache.org/: org.apache.commons.compress.archivers.ArchiveException: No Archiver found for the stream signature
>>>>> 
>>>>> I'm running in local mode with Storm 2.8.2 running on Ubuntu 24.04 
>>>>> (openjdk 17.0.16 2025-07-15). The database is Solr running in Docker 
>>>>> although this should be irrelevant. Maybe I'm doing something wrong? I 
>>>>> have attached the config I'm using in case you have any ideas. Sorry for 
>>>>> the delay; I only just found time to look into this again :-(
>>>>> 
>>>>> Markos
>>>>> 
>>>>> On 9/8/25 20:46, Richard Zowalla wrote:
>>>>>> Hi folks,
>>>>>> 
>>>>>> I have posted a 2nd release candidate for the Apache StormCrawler 3.5.0 
>>>>>> release and it is ready for testing. The regression with Tika / Compress 
>>>>>> was fixed.
>>>>>> 
>>>>>> Apache StormCrawler 3.5.0 decouples Selenium from the core module, 
>>>>>> improving modularity and reducing unnecessary dependencies.
>>>>>> The release also introduces an advanced metadata filtering system that 
>>>>>> supports complex logical operations like key=>val OR (key2=>val2 AND 
>>>>>> key3=>val3).
>>>>>> Additionally, multiple dependencies were upgraded, core tests improved, 
>>>>>> and deprecated code cleaned up, enhancing overall stability and 
>>>>>> maintainability.
>>>>>> 
>>>>>> Thank you to everyone who contributed to this release, including all of 
>>>>>> our users and the people who submitted bug reports or
>>>>>> contributed code or documentation enhancements.
>>>>>> 
>>>>>> The release was made using the Apache StormCrawler release process, 
>>>>>> documented here:
>>>>>> https://github.com/apache/stormcrawler/blob/main/RELEASING.md
>>>>>> 
>>>>>> Source:
>>>>>> 
>>>>>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2
>>>>>> 
>>>>>> Tag:
>>>>>> 
>>>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>>>>> 
>>>>>> Commit Hash:
>>>>>> 
>>>>>> 1947ad4c56ff5c5c90e093900a163e0ac3144bb6
>>>>>> 
>>>>>> Maven Repo:
>>>>>> 
>>>>>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1011
>>>>>> 
>>>>>> <repositories>
>>>>>>   <repository>
>>>>>>     <id>stormcrawler-3.5.0-rc2</id>
>>>>>>     <name>Testing StormCrawler 3.5.0 release candidate 2</name>
>>>>>>     <url>https://repository.apache.org/content/repositories/orgapachestormcrawler-1011</url>
>>>>>>   </repository>
>>>>>> </repositories>
>>>>>> 
>>>>>> Release notes:
>>>>>> 
>>>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
>>>>>> 
>>>>>> Reminder: The up-to-date KEYS file for signature verification can be
>>>>>> found here: https://downloads.apache.org/stormcrawler/KEYS
>>>>>> 
>>>>>> Please vote on releasing these packages as Apache StormCrawler 3.5.0.
>>>>>> The vote is open for at least the next 72 hours.
>>>>>> 
>>>>>> Only votes from the StormCrawler PMC are binding, but everyone is 
>>>>>> welcome to check the release candidate and vote.
>>>>>> The vote passes if at least three binding +1 votes are cast.
>>>>>> 
>>>>>> Please VOTE
>>>>>> 
>>>>>> [+1] go ship it
>>>>>> [+0] meh, don't care
>>>>>> [-1] stop, there is a ${showstopper}
>>>>>> 
>>>>>> Thanks!
>>>>>> Richard

Reply via email to