[ 
https://issues.apache.org/jira/browse/SPARK-30346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30346:
-----------------------------------

    Assignee: liupengcheng

> Improve logging when events dropped
> -----------------------------------
>
>                 Key: SPARK-30346
>                 URL: https://issues.apache.org/jira/browse/SPARK-30346
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.3.2, 3.0.0
>            Reporter: liupengcheng
>            Assignee: liupengcheng
>            Priority: Major
>
> Currently, spark will logging events dropped count info every 60s when events 
> dropped, however, we notice that this not working as expected in our 
> production environment.
> We looked into the code, and find out that the concurrent update of  
> `droppedEventsCounter` may cause the logging logic skipped and delayed a very 
> long time.
>  
>  
> {code:java}
> def post(event: SparkListenerEvent): Unit = {
>   if (stopped.get()) {
>     return
>   }
>   eventCount.incrementAndGet()
>   if (eventQueue.offer(event)) {
>     return
>   }
>   eventCount.decrementAndGet()
>   droppedEvents.inc()
>   droppedEventsCounter.incrementAndGet()
>   if (logDroppedEvent.compareAndSet(false, true)) {
>     // Only log the following message once to avoid duplicated annoying logs.
>     logError(s"Dropping event from queue $name. " +
>       "This likely means one of the listeners is too slow and cannot keep up 
> with " +
>       "the rate at which tasks are being started by the scheduler.")
>   }
>   logTrace(s"Dropping event $event")
>   val droppedCount = droppedEventsCounter.get
>   if (droppedCount > 0) {
>     // Don't log too frequently
>     if (System.currentTimeMillis() - lastReportTimestamp >= 60 * 1000) {
>       // There may be multiple threads trying to decrease 
> droppedEventsCounter.
>       // Use "compareAndSet" to make sure only one thread can win.
>       // And if another thread is increasing droppedEventsCounter, 
> "compareAndSet" will fail and
>       // then that thread will update it.
>       if (droppedEventsCounter.compareAndSet(droppedCount, 0)) {
>         val prevLastReportTimestamp = lastReportTimestamp
>         lastReportTimestamp = System.currentTimeMillis()
>         val previous = new java.util.Date(prevLastReportTimestamp)
>         logWarning(s"Dropped $droppedCount events from $name since " +
>           s"${if (prevLastReportTimestamp == 0) "the application started" 
> else s"$previous"}.")
>       }
>     }
>   }
> }
> {code}
> What's more, I think we can improve this logic here to also logging the 
> thread dump of dispatcher thread, which can do great help to debugging 
> performance issues may cause the events dropped.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to