Thanks Bobby,

This looks like a serious issue to me. Any ideas on how I can provide more information (for example, by enabling some logs) to gain more insight into this problem?
It might be a good idea to add some retry or wait logic on the node that comes up empty-handed, so that it handles the error gracefully rather than crashing with a NullPointerException.

Also, leader election is supposed to happen through ZooKeeper, right? Doesn't the new leader become the leader only after saving its state in ZooKeeper? If so, the other nodes should not come up empty-handed. If not, that seems like a bug: the leader should persist its state in ZooKeeper before becoming the leader.

> looks like it is caused by trying to read a NimbusSummary for the leader but not being able to find it

Instead of crashing, this should trigger a new leader election IMO, with some good warning messages in the logs.

Disclaimer: I have not seen the actual code that does the Nimbus leader election. The above are just suggestions based on my limited knowledge, so please forgive any outrageous/obvious ideas :)

On Tue, May 9, 2017 at 1:58 PM, Bobby Evans <[email protected]> wrote:

> This looks like something odd is happening with leader election. The
> exception looks like it is caused by trying to read a NimbusSummary for the
> leader but not being able to find it. So it could mean that a leader is
> elected and is then crashing quickly enough that the other node when it
> tries to read this loses the race and comes up empty handed. But if you
> only have a single nimbus configured then this is not the case and
> something else worse is happening.
>
> - Bobby
>
> On Monday, May 8, 2017, 4:41:13 PM CDT, S G <[email protected]> wrote:
>
> Hi,
>
> I am trying to upgrade from 1.0.2 to 1.1.0 version of Storm.
> And I see the below exception happening randomly on the Nimbus node.
> When it happens, Nimbus is unable to accept any new topologies.
>
> java.lang.NullPointerException: null
>     at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:301) ~[clojure-1.7.0.jar:?]
>     at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10782.getLeader(nimbus.clj:2383) ~[storm-core-1.1.0.jar:1.1.0]
>     at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3944) ~[storm-core-1.1.0.jar:1.1.0]
>     at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3928) ~[storm-core-1.1.0.jar:1.1.0]
>     at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.0.jar:1.1.0]
>     at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.0.jar:1.1.0]
>     at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.0.jar:1.1.0]
>     at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.0.jar:1.1.0]
>     at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.0.jar:1.1.0]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]
>     at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]
>
> I have not been able to isolate what causes this exception.
> Any help would be appreciated.
>
> Thanks
> SG
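
P.S. To make the retry suggestion in my reply a bit more concrete, here is a rough sketch in plain Java. It is not Storm code and I am not claiming Nimbus works this way: the class LeaderLookupRetry, the method retryUntilPresent, and the "nimbus-1:6627" value are all hypothetical names for illustration, and the lookup is simulated with a Supplier that returns null on its first two calls.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class LeaderLookupRetry {

    /**
     * Calls the lookup up to maxAttempts times, waiting waitMillis between
     * attempts, and returns the first non-null result. Returns an empty
     * Optional if every attempt yields null or the thread is interrupted.
     */
    public static <T> Optional<T> retryUntilPresent(Supplier<T> lookup,
                                                    int maxAttempts,
                                                    long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = lookup.get();
            if (result != null) {
                return Optional.of(result);
            }
            // Log a warning instead of letting a null propagate to the caller.
            System.err.printf("WARN: leader summary not found (attempt %d of %d)%n",
                    attempt, maxAttempts);
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated lookup (hypothetical): empty-handed twice, then succeeds.
        int[] calls = {0};
        Supplier<String> lookup = () -> ++calls[0] < 3 ? null : "nimbus-1:6627";

        Optional<String> leader = retryUntilPresent(lookup, 5, 100);
        System.out.println(leader.orElse("no leader found"));
    }
}
```

The point is only that the caller gets an explicit "not found" result it can act on (for example, by logging and triggering a new election) rather than a NullPointerException.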
