Hi,

I've looked into it a bit more and found that SC had a dependency on storm-core rather than storm-client. I've fixed this in 40612a3... <https://github.com/DigitalPebble/storm-crawler/commit/40612a3588d66e1d410a70b1c7e5db58d5c2ba4d>; however, this doesn't affect the issues I had last week.

*httpclient dependency conflict*

As seen last week, this is not shaded by Storm and the version used (4.3.3 <https://github.com/apache/storm/blob/ce984cd31a16e7fe4b983659005f1f7648455404/pom.xml#L266>) is quite old. Even within Storm, the Storm-SOLR module uses a more recent one (4.5 <https://github.com/apache/storm/blob/master/external/storm-solr/pom.xml#L64>). StormCrawler needs at least 4.5.5 <https://github.com/DigitalPebble/storm-crawler/blob/master/core/pom.xml#L26>. I expect other Storm users would use *httpclient* and have a similar problem.

Unless I am missing something, I can see the following solutions, sorted by how convenient they are to me as a user:

1. the dependency is shaded by Storm
2. the dependency is upgraded to 4.5.5 by Storm
3. the dependency is shaded by StormCrawler

Obviously, I'd rather not have to deal with (3), and anyone using httpclient with Storm would have to do the same.

Note: I can get my topology to work by specifying a protocol implementation based on OKHttp

*http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"*
*https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"*

*LocalCluster*

Since removing the dependency on storm-core, I can't use LocalCluster directly. I'll create a separate branch on my test repo to try to reproduce the issue.

*Documentation for Local mode*

http://storm.apache.org/releases/2.0.0-SNAPSHOT/Local-mode.html does not mention *--local-ttl*. It would be good to document it and indicate what the default value is, otherwise users might wonder why their topologies run for only 20 secs. Personally, I'd rather have a default behaviour where the topology runs forever, or at least be able to deactivate the TTL completely by setting it to -1.

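Going back to the httpclient conflict above: in case it helps anyone reproducing it, here is a minimal sketch of a check that reports which jar a given class is actually loaded from. The class name WhichJar and the method locate are made up for illustration and are not part of Storm or StormCrawler; it only relies on standard JDK calls, and it assumes it is run with the same classpath as the topology.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of Storm or StormCrawler.
final class WhichJar {

    /** Returns the location a class was loaded from, or a short note if it is unknown. */
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers of the target class do not run.
            Class<?> clazz = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader typically have no code source.
            if (source == null || source.getLocation() == null) {
                return "(no code source, probably a JDK class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on the classpath)";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the NoSuchMethodError from this thread.
        String name = args.length > 0
                ? args[0]
                : "org.apache.http.impl.client.HttpClientBuilder";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the reported location is the httpclient-4.3.3.jar under the Storm lib directory rather than the topology jar, that would be consistent with the NoSuchMethodError on setConnectionManagerShared.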
*ConfigurableTopology*

I am getting a different behavior between the original ConfigurableTopology from StormCrawler <https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/main/java/com/digitalpebble/stormcrawler/ConfigurableTopology.java> and the one in Storm <https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/topology/ConfigurableTopology.java> when I extend it: with the latter, any configuration found in the conf files passed as arguments on the command line is added to the default values I provide instead of replacing them. I'll investigate that further and open an issue if I find a bug.

*Distributed mode*

I managed to launch the various services and run my test topology in remote mode (by changing the protocol implementation as explained above).

*Flux*

http://storm.apache.org/releases/2.0.0-SNAPSHOT/flux.html tells me to run

storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local my_config.yaml

so I ran

*apache-storm-2.0.0/bin/storm jar target/2-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local crawler.flux*

but am getting

*15:07:26.206 [main] ERROR o.a.s.f.Flux - To run in local mode run with 'storm local' instead of 'storm jar'*

so I tried both

apache-storm-2.0.0/bin/storm local target/2-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local crawler.flux

and

*apache-storm-2.0.0/bin/storm local target/2-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux crawler.flux*

but in both cases I'm getting

*15:12:06.784 [main] ERROR o.a.s.f.Flux - To run in local mode run with 'storm local' instead of 'storm jar'*
*15:12:06.784 [main] INFO o.a.s.LocalCluster - RUNNING LOCAL CLUSTER for 20 seconds.*

and nothing happens: the topology just dies after 20 secs without fetching any URLs.

I haven't tried Flux in distributed mode yet.

Thanks!

Julien

PS: my test topology is in https://github.com/DigitalPebble/storm2

On Fri, 19 Oct 2018 at 19:32, Julien Nioche <[email protected]> wrote:

> Hi Bobby
>
> The dependency issue happens when I have only storm-client as a dependency
> and not server.
>
> When trying to run it from Eclipse I had to add server to the pom, as
> expected but also client as I was getting
>
> 19:22:13.044 [main] ERROR o.a.s.u.VersionInfo - Could not load storm-core-version-info.properties
> java.io.IOException: Resource not found
>     at org.apache.storm.utils.VersionInfo$VersionInfoImpl.<init>(VersionInfo.java:53) [storm-client-2.0.0.jar:2.0.0]
>     at org.apache.storm.utils.VersionInfo.<clinit>(VersionInfo.java:41) [storm-client-2.0.0.jar:2.0.0]
>     at org.apache.storm.daemon.nimbus.Nimbus.<clinit>(Nimbus.java:281) [storm-server-2.0.0.jar:2.0.0]
>     at org.apache.storm.LocalCluster.<init>(LocalCluster.java:235) [storm-server-2.0.0.jar:2.0.0]
>     at org.apache.storm.LocalCluster.<init>(LocalCluster.java:156) [storm-server-2.0.0.jar:2.0.0]
>     at com.digitalpebble.stormcrawler.ConfigurableTopology.submit(ConfigurableTopology.java:74) [classes/:?]
>     at com.dipe.sc.CrawlTopology.run(CrawlTopology.java:80) [classes/:?]
>     at com.digitalpebble.stormcrawler.ConfigurableTopology.start(ConfigurableTopology.java:49) [classes/:?]
>     at com.dipe.sc.CrawlTopology.main(CrawlTopology.java:39) [classes/:?]
>
> I've put the code in https://github.com/DigitalPebble/storm2 if you want
> to have a look. You'll need to compile the branch 2.x of SC first
> https://github.com/DigitalPebble/storm-crawler/tree/2.x
>
> To reproduce the ZK issue, open the project in Eclipse and run the
> CrawlTopology class with "-local -conf crawler-conf.yaml" in arguments.
>
> For the dependency problem, mvn clean package followed by
> /data/apache-storm-2.0.0/bin/storm local target/2-1.0-SNAPSHOT.jar
> com.dipe.sc.CrawlTopology -conf crawler-conf.yaml
> should give java.lang.NoSuchMethodError:
> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>
> Thanks
>
> Julien
>
> On Fri, 19 Oct 2018 at 17:26, Bobby Evans <[email protected]> wrote:
>
>> Sorry I should clarify a bit.
>>
>> `storm local` will run things in local mode, but the classpath will include
>> things that are not shaded.
>>
>> This is also true for trying to run tests from eclipse. LocalCluster is a
>> part of storm-server so you will need to pull that in just for testing.
>> storm-client is what you want to depend on for the majority of your
>> topology.
>>
>> The ZK issue is new to me We have done a lot in local mode and not seen
>> that as an issue. If you can help me reproduce it I am happy to try and
>> debug it to see what is happening.
>>
>> Thanks,
>>
>> Bobby
>>
>> On Fri, Oct 19, 2018 at 11:21 AM Bobby Evans <[email protected]> wrote:
>>
>> > It is shaded in storm 2.x, but we split the classpath up, so what you want
>> > to depend on is storm-client only. I see you are pulling in storm-core and
>> > a few other things that are not shaded, because they are only used by the
>> > daemons, not the clients.
>> >
>> > On Fri, Oct 19, 2018 at 10:55 AM Julien Nioche <[email protected]> wrote:
>> >
>> >> Sorry, hit Return too quickly
>> >>
>> >> I am testing Storm 2.0.0 with StormCrawler, not very successfully. One
>> >> immediate issue is that I am getting a version conflict on httpclient as
>> >> the version set by Storm is older than the one I need.
>> >>
>> >> java.lang.NoSuchMethodError: org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>> >>     at com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol.configure(HttpProtocol.java:141) ~[2-1.0-SNAPSHOT.jar:?]
>> >>     at com.digitalpebble.stormcrawler.protocol.ProtocolFactory.<init>(ProtocolFactory.java:69) ~[2-1.0-SNAPSHOT.jar:?]
>> >>     at com.digitalpebble.stormcrawler.bolt.FetcherBolt.prepare(FetcherBolt.java:760) ~[2-1.0-SNAPSHOT.jar:?]
>> >>     at org.apache.storm.executor.bolt.BoltExecutor.init(BoltExecutor.java:144) ~[storm-client-2.0.0.jar:2.0.0]
>> >>     at org.apache.storm.executor.bolt.BoltExecutor.call(BoltExecutor.java:154) ~[storm-client-2.0.0.jar:2.0.0]
>> >>     at org.apache.storm.executor.bolt.BoltExecutor.call(BoltExecutor.java:58) ~[storm-client-2.0.0.jar:2.0.0]
>> >>     at org.apache.storm.utils.Utils$1.run(Utils.java:353) [storm-client-2.0.0.jar:2.0.0]
>> >>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
>> >>
>> >> Here is the classpath when calling *storm local ....*
>> >>
>> >> *16:38:03.445 [main] INFO o.a.s.s.o.a.z.ZooKeeper - Client
environment:java.class.path=/data/apache-storm-2.0.0/*:/data/apache-storm-2.0.0/lib/log4j-over-slf4j-1.6.6.jar:/data/apache-storm-2.0.0/lib/hadoop-auth-2.6.1.jar:/data/apache-storm-2.0.0/lib/jaxb-api-2.3.0.jar:/data/apache-storm-2.0.0/lib/kryo-shaded-3.0.3.jar:/data/apache-storm-2.0.0/lib/kryo-3.0.3.jar:/data/apache-storm-2.0.0/lib/commons-cli-1.4.jar:/data/apache-storm-2.0.0/lib/log4j-slf4j-impl-2.11.1.jar:/data/apache-storm-2.0.0/lib/jetty-continuation-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/httpclient-4.3.3.jar:/data/apache-storm-2.0.0/lib/commons-io-2.6.jar:/data/apache-storm-2.0.0/lib/commons-collections-3.2.2.jar:/data/apache-storm-2.0.0/lib/guava-16.0.1.jar:/data/apache-storm-2.0.0/lib/metrics-graphite-3.2.6.jar:/data/apache-storm-2.0.0/lib/jetty-http-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/tools.logging-0.2.3.jar:/data/apache-storm-2.0.0/lib/jetty-util-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/rocksdbjni-5.8.6.jar:/data/apache-storm-2.0.0/lib/commons-fileupload-1.3.3.jar:/data/apache-storm-2.0.0/lib/curator-framework-4.0.1.jar:/data/apache-storm-2.0.0/lib/jackson-dataformat-smile-2.9.4.jar:/data/apache-storm-2.0.0/lib/asm-5.0.3.jar:/data/apache-storm-2.0.0/lib/jetty-io-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/chill-java-0.8.0.jar:/data/apache-storm-2.0.0/lib/curator-client-4.0.1.jar:/data/apache-storm-2.0.0/lib/httpcore-4.3.2.jar:/data/apache-storm-2.0.0/lib/log4j-api-2.11.1.jar:/data/apache-storm-2.0.0/lib/jetty-security-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/storm-clojure-2.0.0.jar:/data/apache-storm-2.0.0/lib/commons-compress-1.16.1.jar:/data/apache-storm-2.0.0/lib/jetty-server-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/netty-3.7.0.Final.jar:/data/apache-storm-2.0.0/lib/json-simple-1.1.jar:/data/apache-storm-2.0.0/lib/junit-4.12.jar:/data/apache-storm-2.0.0/lib/jetty-servlet-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/objenesis-2.6.jar:/data/apache-storm-2.0.0/lib/jetty-servlets-9.4.7.v20170914.jar:/da
ta/apache-storm-2.0.0/lib/carbonite-1.5.0.jar:/data/apache-storm-2.0.0/lib/storm-server-2.0.0.jar:/data/apache-storm-2.0.0/lib/shaded-deps-2.0.0.jar:/data/apache-storm-2.0.0/lib/javax.servlet-api-3.1.0.jar:/data/apache-storm-2.0.0/lib/commons-logging-1.1.3.jar:/data/apache-storm-2.0.0/lib/jline-0.9.94.jar:/data/apache-storm-2.0.0/lib/storm-client-2.0.0.jar:/data/apache-storm-2.0.0/lib/snakeyaml-1.11.jar:/data/apache-storm-2.0.0/lib/hamcrest-core-1.3.jar:/data/apache-storm-2.0.0/lib/minlog-1.3.0.jar:/data/apache-storm-2.0.0/lib/slf4j-api-1.7.21.jar:/data/apache-storm-2.0.0/lib/log4j-core-2.11.1.jar:/data/apache-storm-2.0.0/lib/commons-exec-1.3.jar:/data/apache-storm-2.0.0/lib/storm-core-2.0.0.jar:/data/apache-storm-2.0.0/lib/jackson-core-2.9.4.jar:/data/apache-storm-2.0.0/lib/zookeeper-3.4.6.jar:/data/apache-storm-2.0.0/lib/commons-lang-2.6.jar:/data/apache-storm-2.0.0/lib/clojure-1.7.0.jar:/data/apache-storm-2.0.0/lib/metrics-core-3.2.6.jar:/data/apache-storm-2.0.0/lib/reflectasm-1.10.1.jar:/data/apache-storm-2.0.0/lib/commons-codec-1.11.jar:/data/apache-storm-2.0.0/lib/joda-time-2.3.jar:/data/apache-storm-2.0.0/extlib/*:target/2-1.0-SNAPSHOT.jar:/data/apache-storm-2.0.0/conf:/data/apache-storm-2.0.0/bin* >> >> >> >> This doesn't happen with Storm 1.2.2. Aren't these libs supposed to be >> >> shaded by Storm? >> >> >> >> Another issue is when I try to launch a topology from Eclipse (as I was >> >> able to do with Storm 1.x), even when adding >> >> >> >> *<dependency>* >> >> * <groupId>org.apache.storm</groupId>* >> >> * <artifactId>storm-server</artifactId>* >> >> * <version>2.0.0</version>* >> >> * </dependency>* >> >> * <dependency>* >> >> * <groupId>org.apache.storm</groupId>* >> >> * <artifactId>storm-core</artifactId>* >> >> * <version>2.0.0</version>* >> >> * </dependency>* >> >> >> >> as suggested by >> >> http://storm.apache.org/releases/2.0.0-SNAPSHOT/Local-mode.html, there >> >> seems to be an issue with ZK. 
The 2nd dependency is not mentioned on that page but seems to be needed.
>> >>
>> >> *16:50:53.041 [ProcessThread(sid:0 cport:-1):] INFO o.a.s.s.o.a.z.s.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1668d05b0630007 type:create cxid:0x2 zxid:0x28 txntype:-1 reqpath:n/a Error Path:/storm/blobstoremaxkeysequencenumber Error:KeeperErrorCode = NoNode for /storm/blobstoremaxkeysequencenumber*
>> >>
>> >> and the topology never starts. I could, of course, rely on "storm local"
>> >> but being able to run a local topology without installing Storm is quite
>> >> nice for users who just want to give it a try.
>> >>
>> >> Any thoughts?
>> >>
>> >> Julien
>> >>
>> >>
>> >> On Fri, 19 Oct 2018 at 16:40, Julien Nioche <[email protected]> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I am testing Storm 2.0.0 with StormCrawler, not very successfully
>> >> >
>> >> > On Tue, 16 Oct 2018 at 20:48, P. Taylor Goetz <[email protected]> wrote:
>> >> >
>> >> >> This is a call to vote on releasing Apache Storm 2.0.0 (rc3)
>> >> >>
>> >> >> Full list of changes in this release:
>> >> >>
>> >> >> https://dist.apache.org/repos/dist/dev/storm/apache-storm-2.0.0-rc3/RELEASE_NOTES.html
>> >> >>
>> >> >> The tag/commit to be voted upon is v2.0.0:
>> >> >>
>> >> >> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=commit;h=d2d6f40344e6cc92ab07f3a462d577ef6b61f8b1
>> >> >>
>> >> >> The source archive being voted upon can be found here:
>> >> >>
>> >> >> https://dist.apache.org/repos/dist/dev/storm/apache-storm-2.0.0-rc3/apache-storm-2.0.0-src.tar.gz
>> >> >>
>> >> >> Other release files, signatures and digests can be found here:
>> >> >>
>> >> >> https://dist.apache.org/repos/dist/dev/storm/apache-storm-2.0.0-rc3/
>> >> >>
>> >> >> The release artifacts are signed with the following key:
>> >> >>
>> >> >> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_plain;f=KEYS;hb=22b832708295fa2c15c4f3c70ac0d2bc6fded4bd
>> >> >>
>> >> >> The Nexus staging repository for this release is:
>> >> >>
>> >> >> https://repository.apache.org/content/repositories/orgapachestorm-1072
>> >> >>
>> >> >> Please vote on releasing this package as Apache Storm 2.0.0.
>> >> >>
>> >> >> When voting, please list the actions taken to verify the release.
>> >> >>
>> >> >> This vote will be open for at least 72 hours.
>> >> >>
>> >> >> [ ] +1 Release this package as Apache Storm 2.0.0
>> >> >> [ ] 0 No opinion
>> >> >> [ ] -1 Do not release this package because...
>> >> >>
>> >> >> Thanks to everyone who contributed to this release.
>> >> >>
>> >> >> -Taylor
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> >
>> >> > *Open Source Solutions for Text Engineering*
>> >> >
>> >> > http://www.digitalpebble.com
>> >> > http://digitalpebble.blogspot.com/
>> >> > #digitalpebble <http://twitter.com/digitalpebble>
>> >> >
>> >>
>> >>
>> >> --
>> >>
>> >> *Open Source Solutions for Text Engineering*
>> >>
>> >> http://www.digitalpebble.com
>> >> http://digitalpebble.blogspot.com/
>> >> #digitalpebble <http://twitter.com/digitalpebble>
>> >>
>> >
>
> --
>
> *Open Source Solutions for Text Engineering*
>
> http://www.digitalpebble.com
> http://digitalpebble.blogspot.com/
> #digitalpebble <http://twitter.com/digitalpebble>
>

--

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>
