clintropolis opened a new pull request #6516: fix exception in Supervisor.start 
causing overlord unable to become leader
URL: https://github.com/apache/incubator-druid/pull/6516
 
 
   This PR fixes an issue where an exception thrown by a `Supervisor.start()` 
implementation can wreck the leadership lifecycle start of `SupervisorManager`, 
which in turn wrecks `TaskMaster` start, preventing any overlord from obtaining 
leadership.
   
   Observed in a test cluster with a custom kinesis indexing extension, which 
was broken by recent changes to core Druid in which `aws-java-sdk` dependencies 
are pulled in along with a version bump, leaving the custom extension 
expecting some jars to be provided that no longer are, and others at a 
different version. Anyway, its failure to start caused the cluster to be 
without any functioning overlord, which doesn't seem the most chill behavior. 
After this patch, failing supervisor starts will be logged at error level, but 
the `SupervisorManager` will still attempt to start any remaining supervisors, 
allowing the overlord to continue functioning in a partially degraded state 
instead of not at all. Similar in spirit to the issue and fix in #6512.
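
   For illustration only, here is a minimal standalone sketch of the behavior 
described above: each supervisor start is isolated so that one failure is 
logged and the loop moves on to the rest. It is not the actual patch. The 
`Supervisor` interface, the `startAll` helper, and the supervisor ids are 
hypothetical stand-ins, and catching `LinkageError` alongside `Exception` is 
an assumption made here so that the sketch covers the `NoClassDefFoundError` 
case seen in the logs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SupervisorStartSketch {
    /** Hypothetical, simplified stand-in for Druid's Supervisor interface. */
    interface Supervisor {
        void start();
    }

    /**
     * Attempts to start every supervisor. A failure is logged and recorded,
     * and the remaining supervisors are still started.
     * Returns the ids of the supervisors that failed to start.
     */
    static List<String> startAll(Map<String, Supervisor> supervisors) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Supervisor> entry : supervisors.entrySet()) {
            try {
                entry.getValue().start();
            } catch (Exception | LinkageError e) {
                // Assumption: LinkageError is caught too, because
                // NoClassDefFoundError is an Error, not an Exception.
                System.err.println(
                    "ERROR: failed to start supervisor [" + entry.getKey() + "]: " + e);
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> started = new ArrayList<>();
        Map<String, Supervisor> supervisors = new LinkedHashMap<>();
        // Simulates an extension whose dependencies are missing at runtime.
        supervisors.put("kinesis-broken", () -> {
            throw new NoClassDefFoundError("com/amazonaws/transform/JsonErrorUnmarshallerV2");
        });
        // A healthy supervisor that should still start.
        supervisors.put("kafka-ok", () -> started.add("kafka-ok"));

        List<String> failed = startAll(supervisors);
        System.out.println("started=" + started + " failed=" + failed);
    }
}
```

   Running `main` reports the broken supervisor as failed while the healthy 
one still starts, which is the partially degraded state this PR aims for.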
   
   Relevant logs:
   ```
   2018-10-25T04:34:43,537 ERROR [LeaderSelector[/demo/overlord/_OVERLORD]] 
org.apache.druid.curator.discovery.CuratorDruidLeaderSelector - listener 
becomeLeader() failed. Unable to become leader: 
{class=org.apache.druid.curator.discovery.CuratorDruidLeaderSelector, 
exceptionType=class java.lang.RuntimeException, 
exceptionMessage=java.lang.reflect.InvocationTargetException}
   java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at com.google.common.base.Throwables.propagate(Throwables.java:160) 
~[guava-16.0.1.jar:?]
        at 
org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:153)
 ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
        at 
org.apache.druid.curator.discovery.CuratorDruidLeaderSelector$1.isLeader(CuratorDruidLeaderSelector.java:98)
 [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
        at 
org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:665)
 [curator-recipes-4.0.0.jar:4.0.0]
        at 
org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:661)
 [curator-recipes-4.0.0.jar:4.0.0]
        at 
org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
 [curator-framework-4.0.0.jar:4.0.0]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_181]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
   Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) ~[?:?]
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_181]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
        at 
org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:412)
 ~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
        at 
org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:311) 
~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
        at 
org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:150)
 ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
        ... 7 more
   Caused by: java.lang.NoClassDefFoundError: 
com/amazonaws/transform/JsonErrorUnmarshallerV2
        at 
com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:226)
 ~[?:?]
        at 
com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:222)
 ~[?:?]
        at 
com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:196)
 ~[?:?]
        at 
org.apache.druid.indexing.kinesis.KinesisRecordSupplier.<init>(KinesisRecordSupplier.java:264)
 ~[?:?]
        at 
org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisor.setupRecordSupplier(KinesisSupervisor.java:800)
 ~[?:?]
        at 
org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisor.start(KinesisSupervisor.java:340)
 ~[?:?]
        at 
org.apache.druid.indexing.overlord.supervisor.SupervisorManager.createAndStartSupervisorInternal(SupervisorManager.java:290)
 ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
        at 
org.apache.druid.indexing.overlord.supervisor.SupervisorManager.start(SupervisorManager.java:136)
 ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
        at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) ~[?:?]
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_181]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
        at 
org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:412)
 ~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
        at 
org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:311) 
~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
        at 
org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:150)
 ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
        ... 7 more
   ```
