Hi,

I've looked into it a bit more and found that SC had a dependency on
storm-core and not storm-client; I've fixed this in 40612a3...
<https://github.com/DigitalPebble/storm-crawler/commit/40612a3588d66e1d410a70b1c7e5db58d5c2ba4d>.
However, this doesn't affect the issues I had last week.

*httpclient dependency conflict*
As seen last week, this is not shaded by Storm and the version used (4.3.3
<https://github.com/apache/storm/blob/ce984cd31a16e7fe4b983659005f1f7648455404/pom.xml#L266>)
is quite old. Even within Storm, the Storm-SOLR module uses a more recent
one (4.5
<https://github.com/apache/storm/blob/master/external/storm-solr/pom.xml#L64>).
StormCrawler needs at least 4.5.5
<https://github.com/DigitalPebble/storm-crawler/blob/master/core/pom.xml#L26>.
I expect other Storm users would use *httpclient* and have a similar
problem. Unless I am missing something, I can see the following solutions
sorted by how convenient they are to me as a user:

   1. the dependency is shaded by Storm
   2. the dependency is upgraded to 4.5.5 by Storm
   3. the dependency is shaded by StormCrawler

Obviously, I'd rather not have to deal with (3), as anyone else using
httpclient with Storm would then have to do the same.
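
For anyone hitting the same NoSuchMethodError, a quick way to see which
httpclient jar actually wins on the classpath is a small reflection check.
This is only a minimal sketch: the class HttpClientCheck and its two helper
methods are hypothetical names of mine, not part of Storm or StormCrawler,
and it relies on nothing beyond standard JDK reflection.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Storm or StormCrawler).
final class HttpClientCheck {

    /** True if the class can be loaded and exposes a public method with this name. */
    static boolean hasMethod(String className, String methodName) {
        try {
            Class<?> clazz = Class.forName(className);
            for (Method m : clazz.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing or could not be linked.
            return false;
        }
    }

    /** Jar or directory the class was loaded from, or a placeholder string. */
    static String locationOf(String className) {
        try {
            CodeSource src = Class.forName(className)
                    .getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String builder = "org.apache.http.impl.client.HttpClientBuilder";
        System.out.println("loaded from: " + locationOf(builder));
        System.out.println("setConnectionManagerShared present: "
                + hasMethod(builder, "setConnectionManagerShared"));
    }
}
```

Packaged into the topology jar and run with the same classpath as the
topology, this should show whether the class comes from Storm's lib
directory or from the topology jar.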

Note: I can get my topology to work by specifying a protocol implementation
based on OkHttp:

  http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
  https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"

*LocalCluster*
Since removing the dependency on storm-core, I can't use LocalCluster
directly. I'll create a separate branch on my test repo to try to reproduce
the issue.

*Documentation for Local mode*
http://storm.apache.org/releases/2.0.0-SNAPSHOT/Local-mode.html
does not mention *--local-ttl*. It would be good to document it and indicate
what the default value is; otherwise users might wonder why their topologies
run for only 20 seconds. Personally, I'd rather have a default behaviour
where the topology runs forever, or at least be able to deactivate the TTL
completely by setting it to -1.

*ConfigurableTopology*
I am getting different behavior between the original ConfigurableTopology
from StormCrawler
<https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/main/java/com/digitalpebble/stormcrawler/ConfigurableTopology.java>
and the one in Storm
<https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/topology/ConfigurableTopology.java>
when I extend it. With the latter, any configuration found in the conf files
passed as arguments on the command line is added to the default values I
provide instead of replacing them. I'll investigate that further and open an
issue if I find a bug.

*Distributed mode*
I managed to launch the various services and run my test topology in remote
mode (by changing the protocol implementation as explained above).

*Flux*
http://storm.apache.org/releases/2.0.0-SNAPSHOT/flux.html tells me to run

storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local my_config.yaml

so I ran

apache-storm-2.0.0/bin/storm jar target/2-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local crawler.flux

but am getting

15:07:26.206 [main] ERROR o.a.s.f.Flux - To run in local mode run with 'storm local' instead of 'storm jar'

so I tried both

apache-storm-2.0.0/bin/storm local target/2-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local crawler.flux

and

apache-storm-2.0.0/bin/storm local target/2-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux crawler.flux

but in both cases I'm getting

15:12:06.784 [main] ERROR o.a.s.f.Flux - To run in local mode run with 'storm local' instead of 'storm jar'
15:12:06.784 [main] INFO  o.a.s.LocalCluster -

RUNNING LOCAL CLUSTER for 20 seconds.

and nothing happens: the topology just dies after 20 seconds without
fetching any URLs.

I haven't tried Flux in distributed mode yet.

Thanks!

Julien

PS: my test topology is in https://github.com/DigitalPebble/storm2








On Fri, 19 Oct 2018 at 19:32, Julien Nioche <[email protected]>
wrote:

> Hi Bobby
>
> The dependency issue happens when I have only storm-client as a dependency
> and not server.
>
> When trying to run it from Eclipse I had to add server to the pom, as
> expected but also client as I was getting
>
> 19:22:13.044 [main] ERROR o.a.s.u.VersionInfo - Could not load storm-core-version-info.properties
> java.io.IOException: Resource not found
> at org.apache.storm.utils.VersionInfo$VersionInfoImpl.<init>(VersionInfo.java:53) [storm-client-2.0.0.jar:2.0.0]
> at org.apache.storm.utils.VersionInfo.<clinit>(VersionInfo.java:41) [storm-client-2.0.0.jar:2.0.0]
> at org.apache.storm.daemon.nimbus.Nimbus.<clinit>(Nimbus.java:281) [storm-server-2.0.0.jar:2.0.0]
> at org.apache.storm.LocalCluster.<init>(LocalCluster.java:235) [storm-server-2.0.0.jar:2.0.0]
> at org.apache.storm.LocalCluster.<init>(LocalCluster.java:156) [storm-server-2.0.0.jar:2.0.0]
> at com.digitalpebble.stormcrawler.ConfigurableTopology.submit(ConfigurableTopology.java:74) [classes/:?]
> at com.dipe.sc.CrawlTopology.run(CrawlTopology.java:80) [classes/:?]
> at com.digitalpebble.stormcrawler.ConfigurableTopology.start(ConfigurableTopology.java:49) [classes/:?]
> at com.dipe.sc.CrawlTopology.main(CrawlTopology.java:39) [classes/:?]
>
> I've put the code in https://github.com/DigitalPebble/storm2  if you want
> to have a look. You'll need to compile the branch 2.x of SC first
> https://github.com/DigitalPebble/storm-crawler/tree/2.x
>
> To reproduce the ZK issue, open the project in Eclipse and run the
> CrawlTopology class with "-local -conf crawler-conf.yaml" in arguments.
>
> For the dependency problem, mvn clean package followed by
> /data/apache-storm-2.0.0/bin/storm local target/2-1.0-SNAPSHOT.jar com.dipe.sc.CrawlTopology -conf crawler-conf.yaml
> should give
> java.lang.NoSuchMethodError: org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>
> Thanks
>
> Julien
>
> On Fri, 19 Oct 2018 at 17:26, Bobby Evans <[email protected]> wrote:
>
>> Sorry I should clarify a bit.
>>
>> `storm local` will run things in local mode, but the classpath will include
>> things that are not shaded.
>>
>> This is also true for trying to run tests from Eclipse.  LocalCluster is a
>> part of storm-server so you will need to pull that in just for testing.
>> storm-client is what you want to depend on for the majority of your
>> topology.
>>
>> The ZK issue is new to me.  We have done a lot in local mode and not seen
>> that as an issue.  If you can help me reproduce it I am happy to try and
>> debug it to see what is happening.
>>
>> Thanks,
>>
>> Bobby
>>
>> On Fri, Oct 19, 2018 at 11:21 AM Bobby Evans <[email protected]> wrote:
>>
>> > It is shaded in storm 2.x, but we split the classpath up, so what you want
>> > to depend on is storm-client only.  I see you are pulling in storm-core and
>> > a few other things that are not shaded, because they are only used by the
>> > daemons, not the clients.
>> >
>> > On Fri, Oct 19, 2018 at 10:55 AM Julien Nioche <
>> > [email protected]> wrote:
>> >
>> >> Sorry, hit Return too quickly
>> >>
>> >> I am testing Storm 2.0.0 with StormCrawler, not very successfully. One
>> >> immediate issue is that I am getting a version conflict on httpclient as
>> >> the version set by Storm is older than the one I need.
>> >>
>> >> java.lang.NoSuchMethodError: org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>> >> at com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol.configure(HttpProtocol.java:141) ~[2-1.0-SNAPSHOT.jar:?]
>> >> at com.digitalpebble.stormcrawler.protocol.ProtocolFactory.<init>(ProtocolFactory.java:69) ~[2-1.0-SNAPSHOT.jar:?]
>> >> at com.digitalpebble.stormcrawler.bolt.FetcherBolt.prepare(FetcherBolt.java:760) ~[2-1.0-SNAPSHOT.jar:?]
>> >> at org.apache.storm.executor.bolt.BoltExecutor.init(BoltExecutor.java:144) ~[storm-client-2.0.0.jar:2.0.0]
>> >> at org.apache.storm.executor.bolt.BoltExecutor.call(BoltExecutor.java:154) ~[storm-client-2.0.0.jar:2.0.0]
>> >> at org.apache.storm.executor.bolt.BoltExecutor.call(BoltExecutor.java:58) ~[storm-client-2.0.0.jar:2.0.0]
>> >> at org.apache.storm.utils.Utils$1.run(Utils.java:353) [storm-client-2.0.0.jar:2.0.0]
>> >> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
>> >>
>> >> Here is the classpath when calling *storm local ....*
>> >>
>> >> *16:38:03.445 [main] INFO  o.a.s.s.o.a.z.ZooKeeper - Client
>> >>
>> >>
>> environment:java.class.path=/data/apache-storm-2.0.0/*:/data/apache-storm-2.0.0/lib/log4j-over-slf4j-1.6.6.jar:/data/apache-storm-2.0.0/lib/hadoop-auth-2.6.1.jar:/data/apache-storm-2.0.0/lib/jaxb-api-2.3.0.jar:/data/apache-storm-2.0.0/lib/kryo-shaded-3.0.3.jar:/data/apache-storm-2.0.0/lib/kryo-3.0.3.jar:/data/apache-storm-2.0.0/lib/commons-cli-1.4.jar:/data/apache-storm-2.0.0/lib/log4j-slf4j-impl-2.11.1.jar:/data/apache-storm-2.0.0/lib/jetty-continuation-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/httpclient-4.3.3.jar:/data/apache-storm-2.0.0/lib/commons-io-2.6.jar:/data/apache-storm-2.0.0/lib/commons-collections-3.2.2.jar:/data/apache-storm-2.0.0/lib/guava-16.0.1.jar:/data/apache-storm-2.0.0/lib/metrics-graphite-3.2.6.jar:/data/apache-storm-2.0.0/lib/jetty-http-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/tools.logging-0.2.3.jar:/data/apache-storm-2.0.0/lib/jetty-util-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/rocksdbjni-5.8.6.jar:/data/apache-storm-2.0.0/lib/commons-fileupload-1.3.3.jar:/data/apache-storm-2.0.0/lib/curator-framework-4.0.1.jar:/data/apache-storm-2.0.0/lib/jackson-dataformat-smile-2.9.4.jar:/data/apache-storm-2.0.0/lib/asm-5.0.3.jar:/data/apache-storm-2.0.0/lib/jetty-io-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/chill-java-0.8.0.jar:/data/apache-storm-2.0.0/lib/curator-client-4.0.1.jar:/data/apache-storm-2.0.0/lib/httpcore-4.3.2.jar:/data/apache-storm-2.0.0/lib/log4j-api-2.11.1.jar:/data/apache-storm-2.0.0/lib/jetty-security-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/storm-clojure-2.0.0.jar:/data/apache-storm-2.0.0/lib/commons-compress-1.16.1.jar:/data/apache-storm-2.0.0/lib/jetty-server-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/netty-3.7.0.Final.jar:/data/apache-storm-2.0.0/lib/json-simple-1.1.jar:/data/apache-storm-2.0.0/lib/junit-4.12.jar:/data/apache-storm-2.0.0/lib/jetty-servlet-9.4.7.v20170914.jar:/data/apache-storm-2.0.0/lib/objenesis-2.6.jar:/data/apache-storm-2.0.0/lib/jetty-servlets-9.4.7.v20170914.jar:
/data/apache-storm-2.0.0/lib/carbonite-1.5.0.jar:/data/apache-storm-2.0.0/lib/storm-server-2.0.0.jar:/data/apache-storm-2.0.0/lib/shaded-deps-2.0.0.jar:/data/apache-storm-2.0.0/lib/javax.servlet-api-3.1.0.jar:/data/apache-storm-2.0.0/lib/commons-logging-1.1.3.jar:/data/apache-storm-2.0.0/lib/jline-0.9.94.jar:/data/apache-storm-2.0.0/lib/storm-client-2.0.0.jar:/data/apache-storm-2.0.0/lib/snakeyaml-1.11.jar:/data/apache-storm-2.0.0/lib/hamcrest-core-1.3.jar:/data/apache-storm-2.0.0/lib/minlog-1.3.0.jar:/data/apache-storm-2.0.0/lib/slf4j-api-1.7.21.jar:/data/apache-storm-2.0.0/lib/log4j-core-2.11.1.jar:/data/apache-storm-2.0.0/lib/commons-exec-1.3.jar:/data/apache-storm-2.0.0/lib/storm-core-2.0.0.jar:/data/apache-storm-2.0.0/lib/jackson-core-2.9.4.jar:/data/apache-storm-2.0.0/lib/zookeeper-3.4.6.jar:/data/apache-storm-2.0.0/lib/commons-lang-2.6.jar:/data/apache-storm-2.0.0/lib/clojure-1.7.0.jar:/data/apache-storm-2.0.0/lib/metrics-core-3.2.6.jar:/data/apache-storm-2.0.0/lib/reflectasm-1.10.1.jar:/data/apache-storm-2.0.0/lib/commons-codec-1.11.jar:/data/apache-storm-2.0.0/lib/joda-time-2.3.jar:/data/apache-storm-2.0.0/extlib/*:target/2-1.0-SNAPSHOT.jar:/data/apache-storm-2.0.0/conf:/data/apache-storm-2.0.0/bin*
>> >>
>> >> This doesn't happen with Storm 1.2.2. Aren't these libs supposed to be
>> >> shaded by Storm?
>> >>
>> >> Another issue is when I try to launch a topology from Eclipse (as I was
>> >> able to do with Storm 1.x), even when adding
>> >>
>> >> <dependency>
>> >>   <groupId>org.apache.storm</groupId>
>> >>   <artifactId>storm-server</artifactId>
>> >>   <version>2.0.0</version>
>> >> </dependency>
>> >> <dependency>
>> >>   <groupId>org.apache.storm</groupId>
>> >>   <artifactId>storm-core</artifactId>
>> >>   <version>2.0.0</version>
>> >> </dependency>
>> >>
>> >> as suggested by
>> >> http://storm.apache.org/releases/2.0.0-SNAPSHOT/Local-mode.html, there
>> >> seems to be an issue with ZK. The 2nd dependency is not mentioned on that
>> >> page but seems to be needed.
>> >>
>> >> 16:50:53.041 [ProcessThread(sid:0 cport:-1):] INFO o.a.s.s.o.a.z.s.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1668d05b0630007 type:create cxid:0x2 zxid:0x28 txntype:-1 reqpath:n/a Error Path:/storm/blobstoremaxkeysequencenumber Error:KeeperErrorCode = NoNode for /storm/blobstoremaxkeysequencenumber
>> >>
>> >> and the topology never starts. I could, of course, rely on "storm local"
>> >> but being able to run a local topology without installing Storm is quite
>> >> nice for users who just want to give it a try.
>> >>
>> >> Any thoughts?
>> >>
>> >> Julien
>> >>
>> >>
>> >> On Fri, 19 Oct 2018 at 16:40, Julien Nioche <
>> >> [email protected]>
>> >> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I am testing Storm 2.0.0 with StormCrawler, not very successfully
>> >> >
>> >> > On Tue, 16 Oct 2018 at 20:48, P. Taylor Goetz <[email protected]>
>> >> wrote:
>> >> >
>> >> >> This is a call to vote on releasing Apache Storm 2.0.0 (rc3)
>> >> >>
>> >> >> Full list of changes in this release:
>> >> >>
>> >> >> https://dist.apache.org/repos/dist/dev/storm/apache-storm-2.0.0-rc3/RELEASE_NOTES.html
>> >> >>
>> >> >> The tag/commit to be voted upon is v2.0.0:
>> >> >>
>> >> >> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=commit;h=d2d6f40344e6cc92ab07f3a462d577ef6b61f8b1
>> >> >>
>> >> >> The source archive being voted upon can be found here:
>> >> >>
>> >> >> https://dist.apache.org/repos/dist/dev/storm/apache-storm-2.0.0-rc3/apache-storm-2.0.0-src.tar.gz
>> >> >>
>> >> >> Other release files, signatures and digests can be found here:
>> >> >>
>> >> >> https://dist.apache.org/repos/dist/dev/storm/apache-storm-2.0.0-rc3/
>> >> >>
>> >> >> The release artifacts are signed with the following key:
>> >> >>
>> >> >> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_plain;f=KEYS;hb=22b832708295fa2c15c4f3c70ac0d2bc6fded4bd
>> >> >>
>> >> >> The Nexus staging repository for this release is:
>> >> >>
>> >> >> https://repository.apache.org/content/repositories/orgapachestorm-1072
>> >> >>
>> >> >> Please vote on releasing this package as Apache Storm 2.0.0.
>> >> >>
>> >> >> When voting, please list the actions taken to verify the release.
>> >> >>
>> >> >> This vote will be open for at least 72 hours.
>> >> >>
>> >> >> [ ] +1 Release this package as Apache Storm 2.0.0
>> >> >> [ ]  0 No opinion
>> >> >> [ ] -1 Do not release this package because...
>> >> >>
>> >> >> Thanks to everyone who contributed to this release.
>> >> >>
>> >> >> -Taylor
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> >
>> >> > *Open Source Solutions for Text Engineering*
>> >> >
>> >> > http://www.digitalpebble.com
>> >> > http://digitalpebble.blogspot.com/
>> >> > #digitalpebble <http://twitter.com/digitalpebble>
>> >> >
>> >>
>> >>
>> >>
>> >
>>
>
>
>


-- 

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>
