clintropolis opened a new pull request #6516: fix exception in Supervisor.start causing overlord unable to become leader
URL: https://github.com/apache/incubator-druid/pull/6516

This PR fixes an issue where an exception thrown by a `Supervisor.start()` implementation can wreck the leadership lifecycle start of `SupervisorManager`. That in turn wrecks `TaskMaster` start, which prevents any overlord from obtaining leadership.

This was observed in a test cluster running a custom Kinesis indexing extension. The extension was broken by recent changes to core Druid that pull in the `aws-java-sdk` dependencies and bump their version. As a result, the custom extension expects some jars to be provided that no longer are, and at a different version. Anyway, its failure to start left the cluster without any functioning overlord, which doesn't seem the most chill behavior.

After this patch, failing supervisor starts are logged at error level, but `SupervisorManager` still attempts to start any remaining supervisors. This lets the overlord keep functioning in a partially degraded state instead of not at all.

Similar in spirit to the issue and fix in #6512.

Relevant logs:

```
2018-10-25T04:34:43,537 ERROR [LeaderSelector[/demo/overlord/_OVERLORD]] org.apache.druid.curator.discovery.CuratorDruidLeaderSelector - listener becomeLeader() failed. Unable to become leader: {class=org.apache.druid.curator.discovery.CuratorDruidLeaderSelector, exceptionType=class java.lang.RuntimeException, exceptionMessage=java.lang.reflect.InvocationTargetException}
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:153) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
	at org.apache.druid.curator.discovery.CuratorDruidLeaderSelector$1.isLeader(CuratorDruidLeaderSelector.java:98) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
	at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:665) [curator-recipes-4.0.0.jar:4.0.0]
	at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:661) [curator-recipes-4.0.0.jar:4.0.0]
	at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-4.0.0.jar:4.0.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:412) ~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:311) ~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
	at org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:150) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
	... 7 more
Caused by: java.lang.NoClassDefFoundError: com/amazonaws/transform/JsonErrorUnmarshallerV2
	at com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:226) ~[?:?]
	at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:222) ~[?:?]
	at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:196) ~[?:?]
	at org.apache.druid.indexing.kinesis.KinesisRecordSupplier.<init>(KinesisRecordSupplier.java:264) ~[?:?]
	at org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisor.setupRecordSupplier(KinesisSupervisor.java:800) ~[?:?]
	at org.apache.druid.indexing.kinesis.supervisor.KinesisSupervisor.start(KinesisSupervisor.java:340) ~[?:?]
	at org.apache.druid.indexing.overlord.supervisor.SupervisorManager.createAndStartSupervisorInternal(SupervisorManager.java:290) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
	at org.apache.druid.indexing.overlord.supervisor.SupervisorManager.start(SupervisorManager.java:136) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
	at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:412) ~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:311) ~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
	at org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:150) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
	... 7 more
```
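The "log and keep going" behavior this PR describes can be sketched roughly as follows. This is an illustration, not the actual `SupervisorManager` change. Several pieces are assumptions: `Supervisor` here is a hypothetical one-method stand-in for Druid's interface, `SupervisorStarter.startAll` is a hypothetical helper, and `java.util.logging` stands in for Druid's own logger. The failure in the logs is a `NoClassDefFoundError`. That class is a `LinkageError`, not an `Exception`, so the sketch catches both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for Druid's Supervisor interface:
// only start() is modeled here.
interface Supervisor
{
  void start();
}

// Hypothetical helper illustrating "log and continue" when starting supervisors.
class SupervisorStarter
{
  private static final Logger LOG = Logger.getLogger(SupervisorStarter.class.getName());

  /**
   * Starts every supervisor in the map's iteration order. A failure in one
   * supervisor is logged and recorded, and the loop moves on to the next one
   * instead of propagating and aborting the whole leadership start.
   *
   * @return ids of the supervisors that failed to start
   */
  static List<String> startAll(Map<String, Supervisor> supervisors)
  {
    List<String> failed = new ArrayList<>();
    for (Map.Entry<String, Supervisor> entry : supervisors.entrySet()) {
      try {
        entry.getValue().start();
      }
      catch (Exception | LinkageError e) {
        // LinkageError covers NoClassDefFoundError, the failure seen in the logs;
        // a plain catch (Exception) would let it escape.
        LOG.log(Level.SEVERE, "Failed to start supervisor [" + entry.getKey() + "]", e);
        failed.add(entry.getKey());
      }
    }
    return failed;
  }
}
```

The sketch returns the ids of supervisors that failed to start. That is one illustrative way a caller could see the degraded state. It deliberately does not catch `Throwable`, so errors such as `OutOfMemoryError` still propagate.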
