[ 
https://issues.apache.org/jira/browse/SPARK-21303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074296#comment-16074296
 ] 

Arun Achuthan commented on SPARK-21303:
---------------------------------------

Thank you!
At this moment we don't have enough indicators from our logs to chase this
further, other than those mentioned above. Could you give us some pointers on
where to look? The stuck jobs do not show up in the active jobs list either.
The job details say it is stuck at saving to Cassandra. We have never had any
problems saving to Cassandra from the application. Why is it that the stuck
jobs do not show up in the active jobs list?

> Web-UI shows some Jobs get stuck randomly and stays like that. Neither able 
> to kill
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-21303
>                 URL: https://issues.apache.org/jira/browse/SPARK-21303
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.1.0, 2.1.1
>         Environment: Kubernetes 1.4.12 on AWS 
> OS Ubuntu
> Spark 2.1.1
> Cassandra 3.9
>            Reporter: Arun Achuthan
>
> We are running a streaming application which had been running without any
> issues for a long time. For the last few days we have been seeing some jobs
> randomly getting stuck in the web UI. This doesn't stop the application, as
> the following jobs are successful. The stuck jobs remain in the web UI as
> stuck with no progress. These are the observations we made. At the time the
> first job is shown stuck in the UI, the driver logs mention this:
> 2017-07-04 05:33:20,189 ERROR [dag-scheduler-event-loop] 
> org.apache.spark.scheduler.LiveListenerBus: Dropping SparkListenerEvent 
> because no remaining room in event queue. This likely means one of the 
> SparkListeners is too slow and cannot keep up with the rate at which tasks 
> are being started by the scheduler.
> For every other random stuck job, the driver logs mention the following at
> the same time:
> 2017-07-04 05:33:20,194 WARN [dispatcher-event-loop-0] 
> org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents 
> since Thu Jan 01 00:00:00 UTC 1970
>  
> 2017-07-04 05:49:31,571 WARN [dag-scheduler-event-loop] 
> org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents 
> since Tue Jul 04 05:34:20 UTC 2017
> After the jobs start getting stuck, we experience performance drops as well
> as scheduling delays within the application. We couldn't find any other
> significant errors in the driver logs.
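
One way to get more out of the driver logs is to tally the LiveListenerBus
drop summaries and compare their timestamps with the times the jobs appear
stuck in the UI. The following is only a rough sketch, assuming each log
record sits on a single line in the same layout as the lines quoted above.
The function name `dropped_events` and the `sample` list are made up for
illustration and are not part of Spark.

```python
import re
from datetime import datetime

# Matches the WARN summary lines quoted in the report, e.g.
# "... LiveListenerBus: Dropped 1 SparkListenerEvents since ..."
# Assumption: one log record per line, layout as in the quoted driver log.
DROPPED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} \w+ \[[^\]]+\] "
    r"org\.apache\.spark\.scheduler\.LiveListenerBus: "
    r"Dropped (\d+) SparkListenerEvents"
)


def dropped_events(lines):
    """Return a (timestamp, count) pair for each drop summary line."""
    found = []
    for line in lines:
        match = DROPPED.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            found.append((when, int(match.group(2))))
    return found


# The three driver log lines from the report, joined back onto single lines.
sample = [
    "2017-07-04 05:33:20,189 ERROR [dag-scheduler-event-loop] "
    "org.apache.spark.scheduler.LiveListenerBus: Dropping SparkListenerEvent "
    "because no remaining room in event queue. This likely means one of the "
    "SparkListeners is too slow and cannot keep up with the rate at which "
    "tasks are being started by the scheduler.",
    "2017-07-04 05:33:20,194 WARN [dispatcher-event-loop-0] "
    "org.apache.spark.scheduler.LiveListenerBus: Dropped 1 "
    "SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970",
    "2017-07-04 05:49:31,571 WARN [dag-scheduler-event-loop] "
    "org.apache.spark.scheduler.LiveListenerBus: Dropped 1 "
    "SparkListenerEvents since Tue Jul 04 05:34:20 UTC 2017",
]

events = dropped_events(sample)
total = sum(count for _, count in events)
for when, count in events:
    print(when.isoformat(), count)
```

Run against the full driver log (for example `dropped_events(open(path))`
with your own path), this gives a timeline of dropped events that can be
lined up against the submission times of the stuck jobs.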



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
