We are seeing many stability problems with Spark 2.1.1 caused by dropped events. We disabled the event log, which seemed to help, but many events are still being dropped, as in the example log below.
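For reference, disabling the event log comes down to setting `spark.eventLog.enabled=false` at submit time. The invocation below is only an illustration; the application class and jar are placeholders, not our actual job:

```shell
# Illustrative spark-submit invocation; only the --conf flag is the
# relevant part. MyJob and my-job.jar are hypothetical placeholders.
spark-submit \
  --conf spark.eventLog.enabled=false \
  --class com.example.MyJob \
  my-job.jar
```

The same setting can equally be put in `spark-defaults.conf` or passed to `SparkConf` programmatically.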
Is there any way for me to see which listener is backing up the queue? Is there any workaround for this issue?

2017-08-03 04:13:29,852 ERROR org.apache.spark.scheduler.LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
2017-08-03 04:13:29,853 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
2017-08-03 04:14:29,854 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 32738 SparkListenerEvents since Thu Aug 03 04:13:29 UTC 2017
2017-08-03 04:15:15,095 INFO org.allenai.s2.pipeline.spark.steps.LoadDaqPapers$: Finished in 127.572 seconds.
2017-08-03 04:15:15,095 INFO org.allenai.s2.common.metrics.Metrics$: Adding additional tags to all metrics and events: [pipeline, env:prod]
2017-08-03 04:15:15,149 INFO org.allenai.s2.pipeline.spark.steps.MergeSourcedPapers$: Computing
2017-08-03 04:15:29,853 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 28816 SparkListenerEvents since Thu Aug 03 04:14:29 UTC 2017
2017-08-03 04:16:29,868 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 18613 SparkListenerEvents since Thu Aug 03 04:15:29 UTC 2017
2017-08-03 04:17:29,868 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 52231 SparkListenerEvents since Thu Aug 03 04:16:29 UTC 2017
2017-08-03 04:18:29,868 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 16646 SparkListenerEvents since Thu Aug 03 04:17:29 UTC 2017
2017-08-03 04:19:29,868 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 19693 SparkListenerEvents since Thu Aug 03 04:18:29 UTC 2017