Hi Marcin,

Do you have a non-cut-off version of this log line? It seems like this log message would have the cause in it.
ERROR io.druid.server.coordinator.DruidCoordinator Caught exception, ignoring so that schedule keeps going.: {class=io.druid.server.coordinator.DruidCo...

Btw, since we are currently trying to migrate our mailing lists, please also include dev@druid.apache.org in Druid dev threads (I have added it to this one).

Gian

On Wed, Apr 11, 2018 at 12:37 AM, Marcin Kuthan <marcin.kut...@gmail.com> wrote:

>> 1. Coordinator loses leadership
>> https://github.com/druid-io/druid/issues/5561
>>
>
> This issue is the most problematic for our cluster stability, +1 for
> 0.12.1 release.
>
> I also found another scenario in which the coordinator is not able to
> recover after the connection to ZooKeeper is lost:
>
> WARN org.apache.zookeeper.ClientCnxn Session 0x761da07ba56edff for server x.x.x.x:2181, unexpected error, closing socket conn...
> INFO org.apache.curator.framework.state.ConnectionStateManager State change: SUSPENDED
> INFO io.druid.server.coordinator.DruidCoordinator I am no longer the leader...
> INFO io.druid.curator.discovery.CuratorServiceAnnouncer Unannouncing service[DruidNode{serviceName='druid/coordinator', host='druidcoordinator...
> INFO io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher Ignored event type [CONNECTION_SUSPENDED] for nodeType [overlord] watcher.
> INFO io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher Ignored event type [CONNECTION_SUSPENDED] for nodeType [broker] watcher.
> INFO io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher Ignored event type [CONNECTION_SUSPENDED] for nodeType [historical] watcher.
> INFO io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher Ignored event type [CONNECTION_SUSPENDED] for nodeType [peon] watcher.
> INFO org.apache.zookeeper.ClientCnxn Opening socket connection to server x.x.x.x:2181. Will not attempt to authenticate using...
> INFO org.apache.zookeeper.ClientCnxn Socket connection established to x.x.x.x:2181, initiating session
> INFO org.apache.zookeeper.ClientCnxn Session establishment complete on server x.x.x.x:2181, sessionid = 0x761da07ba56edff, ne...
> INFO org.apache.curator.framework.state.ConnectionStateManager State change: RECONNECTED
> INFO io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher Ignored event type [CONNECTION_RECONNECTED] for nodeType [broker] watcher.
> INFO io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher Ignored event type [CONNECTION_RECONNECTED] for nodeType [overlord] watcher.
> INFO io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher Ignored event type [CONNECTION_RECONNECTED] for nodeType [historical] watcher.
> INFO io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher Ignored event type [CONNECTION_RECONNECTED] for nodeType [peon] watcher.
> INFO io.druid.server.coordinator.DruidCoordinator I am the leader of the coordinators, all must bow!
> INFO io.druid.server.coordinator.DruidCoordinator Starting coordination in [PT30S]
> INFO io.druid.curator.discovery.CuratorServiceAnnouncer Announcing service[DruidNode{serviceName='druid/coordinator', host='druidcoordinator.n...
> INFO io.druid.metadata.SQLMetadataRuleManager Polled and found rules for 29 datasource(s)
> INFO io.druid.server.coordinator.DruidCoordinator Done making indexing service helpers [[io.druid.server.coordinator.helper.DruidCoordinatorSegmentInf...
> INFO io.druid.server.lookup.cache.LookupCoordinatorManager Not updating lookups because no data exists
> ERROR io.druid.server.coordinator.DruidCoordinator InventoryManagers not started[[false, true]]
> INFO io.druid.server.coordinator.DruidCoordinator I am no longer the leader...
> INFO io.druid.curator.discovery.CuratorServiceAnnouncer Unannouncing service[DruidNode{serviceName='druid/coordinator', host='druidcoordinator...
> ERROR io.druid.server.coordinator.DruidCoordinator Caught exception, ignoring so that schedule keeps going.: {class=io.druid.server.coordinator.DruidCo...
> INFO io.druid.server.coordinator.DruidCoordinator I am no longer the leader...
> ERROR io.druid.server.coordinator.DruidCoordinator InventoryManagers not started[[false, true]]
>
> And so on every 30 seconds.