[ https://issues.apache.org/jira/browse/SPARK-21303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-21303:
----------------------------------
    Fix Version/s:     (was: 2.1.1)

> Web-UI shows some Jobs get stuck randomly and stays like that. Neither able 
> to kill
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-21303
>                 URL: https://issues.apache.org/jira/browse/SPARK-21303
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.1.0, 2.1.1
>         Environment: Kubernetes 1.4.12 on AWS 
> OS Ubuntu
> Spark 2.1.1
> Cassandra 3.9
>            Reporter: Arun Achuthan
>
> We are running a streaming application that had been running without any 
> issues for a long time. For the last few days, some jobs have been randomly 
> getting stuck in the web UI. This doesn't stop the application, as the 
> subsequent jobs succeed. The stuck jobs remain in the web UI with no progress. 
> These are the observations we made. At the time the first job is shown as 
> stuck in the UI, the driver logs mention this:
> 2017-07-04 05:33:20,189 ERROR [dag-scheduler-event-loop] 
> org.apache.spark.scheduler.LiveListenerBus: Dropping SparkListenerEvent 
> because no remaining room in event queue. This likely means one of the 
> SparkListeners is too slow and cannot keep up with the rate at which tasks 
> are being started by the scheduler.
> For every other randomly stuck job, the driver logs mention the following at 
> the same time:
> 2017-07-04 05:33:20,194 WARN [dispatcher-event-loop-0] 
> org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents 
> since Thu Jan 01 00:00:00 UTC 1970
>  
> 2017-07-04 05:49:31,571 WARN [dag-scheduler-event-loop] 
> org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents 
> since Tue Jul 04 05:34:20 UTC 2017
> After the jobs start getting stuck, we experience performance drops as well 
> as scheduling delays within the application. We couldn't find any other 
> significant errors in the driver logs.
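
One plausible reading of the log lines above is that a job-end event was dropped from a full listener queue, so the web UI never learned that the job had finished. The Python sketch below is a toy model of that mechanism, not Spark's code: the class BoundedListenerBus, the function simulate, the capacity of 3, and the event tuples are all hypothetical names and values chosen for illustration. It also shows why a first drop report would likely read "since Thu Jan 01 00:00:00 UTC 1970": a "last report" timestamp that starts at epoch 0 prints as that date.

```python
import queue
from datetime import datetime, timezone


class BoundedListenerBus:
    """Toy model of a bounded event bus (an assumption, not Spark's implementation)."""

    def __init__(self, capacity):
        self._events = queue.Queue(maxsize=capacity)
        self.dropped = 0
        # Epoch 0 stands for "no drop has been reported yet".
        self.last_report = datetime.fromtimestamp(0, tz=timezone.utc)

    def post(self, event):
        """Enqueue without blocking; count and drop the event if the queue is full."""
        try:
            self._events.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, running_jobs):
        """Deliver queued events to a toy UI that tracks which jobs look running."""
        while not self._events.empty():
            kind, job_id = self._events.get_nowait()
            if kind == "start":
                running_jobs.add(job_id)
            elif kind == "end":
                running_jobs.discard(job_id)


def simulate():
    bus = BoundedListenerBus(capacity=3)  # hypothetical, deliberately tiny
    running_in_ui = set()
    # The scheduler posts four events before the slow listener drains any.
    for event in [("start", 1), ("start", 2), ("end", 1), ("end", 2)]:
        bus.post(event)
    bus.drain(running_in_ui)
    return running_in_ui, bus.dropped, bus.last_report


if __name__ == "__main__":
    running, dropped, last_report = simulate()
    print("jobs still shown as running:", running)
    print("dropped events:", dropped)
    print("last drop report:", last_report)
```

In this model the fourth event, the end of job 2, is the one that gets dropped, so job 2 stays in the "running" set even though it finished. That would match the reported symptom of later jobs succeeding while a few remain stuck in the UI.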



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
