I'm sorry for this mess. Tika 3.2.3 is now under vote [0]. That release
allows for backward compatibility with commons-compress < 1.28.0.

Dependency management is important, but there are better ways to identify
issues than this. :(
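
One possible way to spot this kind of conflict earlier is to ask the JVM
which jar a commons-compress class was actually loaded from. The following is
only a minimal sketch: the class name JarLocator and the helper locate() are
made up for illustration, and it assumes it is run with the same classpath
the topology uses (for example, the Storm lib directory plus the uber jar).

```java
import java.security.CodeSource;

class JarLocator {

    /** Returns where the given class was loaded from, or a placeholder if unknown. */
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the JDK's bootstrap loader have no code source.
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // A class that ships in commons-compress; any class from that jar would do.
        String name = "org.apache.commons.compress.archivers.ArchiveStreamFactory";
        try {
            Class<?> type = Class.forName(name);
            System.out.println(name + " loaded from " + locate(type));
            // Implementation-Version is read from the jar manifest and may be null.
            System.out.println("version: " + type.getPackage().getImplementationVersion());
        } catch (ClassNotFoundException e) {
            System.out.println("commons-compress is not on the classpath");
        }
    }
}
```

Note that mvn dependency:tree -Dincludes=org.apache.commons:commons-compress
only reports what the build resolves. It cannot see jars that the runtime
(here, the Storm lib directory) puts on the classpath, which is why a runtime
check like the one above can be a useful complement.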

[0] https://lists.apache.org/thread/px1stbwnbgx301y4sg6yxycrmcqt27gf

On Thu, Sep 11, 2025 at 11:59 AM Markos Volikas <[email protected]> wrote:
>
> Thanks! I did some more searching and found that the issue in my case
> was that commons-compress-1.27.1
> (/opt/apache-storm-2.8.2/lib/commons-compress-1.27.1.jar) was ending up
> on the classpath :-(
>
> When I changed the Storm lib to 1.28.0, the issue was fixed. I have no
> idea, though, why I am the only one experiencing this issue.
>
> Markos
>
> On 9/11/25 18:11, Richard Zowalla wrote:
> > I will try to reproduce it in the evening with the snippets / sample 
> > project and steps you have provided :-)
> >
> > On September 11, 2025 17:09:40 CEST, Markos Volikas 
> > <[email protected]> wrote:
> >> I have attached it. It only contains 1.28.0, but my Maven repository has 
> >> many versions that were fetched when building SC from source, and to be 
> >> honest I don't understand why this happens.
> >>
> >> I'm also not completely sure what happens when submitting the jar, since 
> >> Storm itself depends on another version of compress.
> >>
> >> /opt/apache-storm-2.8.2/bin/storm local target/test-1.0-SNAPSHOT.jar 
> >> org.apache.storm.flux.Flux crawler.flux --local-ttl 3600
> >>
> >> I hope this is not a silly mistake and that I'm not wasting your time :-)
> >>
> >> On 9/11/25 17:56, Richard Zowalla wrote:
> >>> What does your mvn dependency:tree output say? :-)
> >>>
> >>> The only thing that needs to be cleaned is the locally installed SC.
> >>>
> >>>
> >>>
> >>> On September 11, 2025 16:48:53 CEST, Markos Volikas 
> >>> <[email protected]> wrote:
> >>>> Yes..
> >>>>
> >>>> I'm building from source using: 
> >>>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2/
> >>>>  (tar.gz)
> >>>>
> >>>> I completely removed 
> >>>> /home/markos/.m2/repository/org/apache/commons/commons-compress and then 
> >>>> ran mvn clean install, and it seems that multiple versions are getting pulled in.
> >>>>
> >>>> Before this I had also removed my .m2/ completely to make sure all 
> >>>> dependencies were downloaded, and they were. I have attached the build log.
> >>>>
> >>>> markos@nombat:~/.m2/repository/org/apache/commons/commons-compress$ ll
> >>>> total 28
> >>>> drwxrwxr-x  7 markos markos 4096 Sep 11 17:42 ./
> >>>> drwxrwxr-x 12 markos markos 4096 Sep 11 17:42 ../
> >>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.20/
> >>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.26.1/
> >>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.26.2/
> >>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.27.1/
> >>>> drwxrwxr-x  2 markos markos 4096 Sep 11 17:42 1.28.0/
> >>>>
> >>>> Markos
> >>>>
> >>>> On 9/11/25 16:55, Richard Zowalla wrote:
> >>>>> Did you clean your local Maven repo before building the uber jar?
> >>>>>
> >>>>> Can you check your compress version?
> >>>>>
> >>>>> Regards
> >>>>> Richard
> >>>>>
> >>>>> On September 11, 2025 15:38:38 CEST, Markos Volikas 
> >>>>> <[email protected]> wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'm afraid I'm still getting:
> >>>>>>
> >>>>>> 16:25:13.829 [Thread-46-parse-executor[6, 6]] INFO  
> >>>>>> o.a.s.b.JSoupParserBolt - Parsing : starting https://apache.org/
> >>>>>> 16:25:13.848 [Thread-46-parse-executor[6, 6]] ERROR 
> >>>>>> o.a.s.b.JSoupParserBolt - Exception while guessing mimetype on 
> >>>>>> https://apache.org/: 
> >>>>>> org.apache.commons.compress.archivers.ArchiveException: No Archiver 
> >>>>>> found for the stream signature
> >>>>>>
> >>>>>> I'm running in local mode with Storm 2.8.2 running on Ubuntu 24.04 
> >>>>>> (openjdk 17.0.16 2025-07-15). The database is Solr running in Docker 
> >>>>>> although this should be irrelevant. Maybe I'm doing something wrong? I 
> >>>>>> have attached the config I'm using in case you have any ideas. Sorry 
> >>>>>> for the delay, but I just found time to look into this again :-(
> >>>>>>
> >>>>>> Markos
> >>>>>>
> >>>>>> On 9/8/25 20:46, Richard Zowalla wrote:
> >>>>>>> Hi folks,
> >>>>>>>
> >>>>>>> I have posted a 2nd release candidate for the Apache StormCrawler 
> >>>>>>> 3.5.0 release and it is ready for testing. The regression with Tika / 
> >>>>>>> Compress was fixed.
> >>>>>>>
> >>>>>>> Apache StormCrawler 3.5.0 decouples Selenium from the core module, 
> >>>>>>> improving modularity and reducing unnecessary dependencies.
> >>>>>>> The release also introduces an advanced metadata filtering system 
> >>>>>>> that supports complex logical operations like key=>val OR (key2=>val2 
> >>>>>>> AND key3=>val3).
> >>>>>>> Additionally, multiple dependencies were upgraded, core tests 
> >>>>>>> improved, and deprecated code cleaned up, enhancing overall stability 
> >>>>>>> and maintainability.
> >>>>>>>
> >>>>>>> Thank you to everyone who contributed to this release, including all 
> >>>>>>> of our users and the people who submitted bug reports,
> >>>>>>> contributed code or documentation enhancements.
> >>>>>>>
> >>>>>>> The release was made using the Apache StormCrawler release process, 
> >>>>>>> documented here:
> >>>>>>> https://github.com/apache/stormcrawler/blob/main/RELEASING.md
> >>>>>>>
> >>>>>>> Source:
> >>>>>>>
> >>>>>>> https://dist.apache.org/repos/dist/dev/stormcrawler/stormcrawler-3.5.0-RC2
> >>>>>>>
> >>>>>>> Tag:
> >>>>>>>
> >>>>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
> >>>>>>>
> >>>>>>> Commit Hash:
> >>>>>>>
> >>>>>>> 1947ad4c56ff5c5c90e093900a163e0ac3144bb6
> >>>>>>>
> >>>>>>> Maven Repo:
> >>>>>>>
> >>>>>>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1011
> >>>>>>>
> >>>>>>> <repositories>
> >>>>>>> <repository>
> >>>>>>> <id>stormcrawler-3.5.0-rc2</id>
> >>>>>>> <name>Testing StormCrawler 3.5.0 release candidate 2</name>
> >>>>>>> <url>
> >>>>>>> https://repository.apache.org/content/repositories/orgapachestormcrawler-1011
> >>>>>>> </url>
> >>>>>>> </repository>
> >>>>>>> </repositories>
> >>>>>>>
> >>>>>>> Release notes:
> >>>>>>>
> >>>>>>> https://github.com/apache/stormcrawler/releases/tag/stormcrawler-3.5.0
> >>>>>>>
> >>>>>>> Reminder: The up-to-date KEYS file for signature verification can be
> >>>>>>> found here: https://downloads.apache.org/stormcrawler/KEYS
> >>>>>>>
> >>>>>>> Please vote on releasing these packages as Apache StormCrawler 3.5.0
> >>>>>>> The vote is open for at least the next 72 hours.
> >>>>>>>
> >>>>>>> Only votes from the StormCrawler PMC are binding, but everyone is 
> >>>>>>> welcome to check the release candidate and vote.
> >>>>>>> The vote passes if at least three binding +1 votes are cast.
> >>>>>>>
> >>>>>>> Please VOTE
> >>>>>>>
> >>>>>>> [+1] go ship it
> >>>>>>> [+0] meh, don't care
> >>>>>>> [-1] stop, there is a ${showstopper}
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> Richard
