Thanks Bobby,

This looks like a serious issue to me. Is there anything I can do to provide
more information (for example, enabling additional logging) to help gain
more insight into this problem?

It might be a good idea to add some retry or wait logic on the node that
comes up empty-handed, so that it handles the error gracefully rather than
crashing with a NullPointerException.
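
To make the retry idea concrete, here is a rough sketch of what I have in
mind. It is not based on the actual Nimbus code: LeaderLookupRetry and
retryUntilPresent are names I made up, and the Supplier just stands in for
whatever call reads the leader's NimbusSummary and may return null.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not part of Storm: retries a lookup that may
// return null, backing off between attempts instead of failing at once.
public final class LeaderLookupRetry {
    private LeaderLookupRetry() {}

    /**
     * Calls lookup up to maxAttempts times. Returns the first non-null
     * result, or Optional.empty() if every attempt came back null or the
     * calling thread was interrupted while waiting.
     */
    public static <T> Optional<T> retryUntilPresent(
            Supplier<T> lookup, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = lookup.get();
            if (result != null) {
                return Optional.of(result);
            }
            // Warn rather than let a null propagate into a NullPointerException.
            System.err.printf(
                "WARN: leader summary not found (attempt %d of %d)%n",
                attempt, maxAttempts);
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and stop retrying.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
                delayMs *= 2; // simple exponential backoff
            }
        }
        return Optional.empty();
    }
}
```

The caller would then turn an empty result into a clear error for the
client rather than dereferencing a missing summary.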

Also, leader election is supposed to happen through ZooKeeper, right?
Doesn't the new leader become leader only after saving its state in
ZooKeeper? If so, the other nodes should not come up empty-handed.
If not, this seems like a bug: the leader should persist its state in
ZooKeeper before becoming leader.


> looks like it is caused by trying to read a NimbusSummary for the leader
> but not being able to find it
IMO, instead of crashing, this should trigger a new leader election and log
some clear warning messages.


Disclaimer: I have not seen the actual code that does the Nimbus leader
election. The above are just suggestions based on my limited knowledge, so
please forgive any outrageous/obvious ideas :)



On Tue, May 9, 2017 at 1:58 PM, Bobby Evans <[email protected]>
wrote:

> This looks like something odd is happening with leader election.  The
> exception looks like it is caused by trying to read a NimbusSummary for the
> leader but not being able to find it.  So it could mean that a leader is
> elected and is then crashing quickly enough that the other node when it
> tries to read this loses the race and comes up empty handed.  But if you
> only have a single nimbus configured then this is not the case and
> something else worse is happening.
>
>
> - Bobby
>
> On Monday, May 8, 2017, 4:41:13 PM CDT, S G <[email protected]>
> wrote:
>
> Hi,
>
> I am trying to upgrade from 1.0.2 to 1.1.0 version of Storm.
> And I see the below exception happening randomly on the Nimbus node.
> When it happens, Nimbus is unable to accept any new topologies.
>
>
> java.lang.NullPointerException: null
>         at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:301) ~[clojure-1.7.0.jar:?]
>         at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10782.getLeader(nimbus.clj:2383) ~[storm-core-1.1.0.jar:1.1.0]
>         at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3944) ~[storm-core-1.1.0.jar:1.1.0]
>         at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3928) ~[storm-core-1.1.0.jar:1.1.0]
>         at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.0.jar:1.1.0]
>         at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.0.jar:1.1.0]
>         at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.0.jar:1.1.0]
>         at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.0.jar:1.1.0]
>         at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.0.jar:1.1.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]
>         at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]
>
>
> I have not been able to isolate what causes this exception.
> Any help would be appreciated.
>
> Thanks
> SG
>
