[ 
https://issues.apache.org/jira/browse/SPARK-24150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Montaz updated SPARK-24150:
-----------------------------------
    Description: 
There exists a race condition in the checkLogs method between the threads of 
replayExecutor. They use the field "applications" to synchronise, but they also 
update that field.

The problem is that threads will eventually synchronise on different monitors 
(because each thread synchronises on whichever object "applications" references 
at that moment, and that reference keeps being reassigned), breaking the 
original synchronisation intent. The race is even more likely to reproduce when 
number_new_log_files > replayExecutor_pool_size.
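
The effect described above can be reproduced outside Spark. The following is a 
minimal Java sketch, not the FsHistoryProvider code: the class name, the list 
of strings standing in for "applications", and the latches are all assumptions 
made for illustration. It shows a second thread entering a block guarded by 
the field while the first thread is still inside it, because the field was 
reassigned in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class MovingMonitorDemo {
    // Hypothetical stand-in for the "applications" field: volatile, and also used as the lock.
    private static volatile List<String> applications = new ArrayList<>();

    /** Returns true if a second thread got inside its synchronized block while the first was still inside its own. */
    static boolean secondThreadEntersConcurrently() {
        applications = new ArrayList<>();
        CountDownLatch reassigned = new CountDownLatch(1);
        CountDownLatch secondInside = new CountDownLatch(1);
        boolean[] overlapped = {false};

        Thread first = new Thread(() -> {
            synchronized (applications) {            // locks the monitor of the current list
                List<String> updated = new ArrayList<>(applications);
                updated.add("app-1");
                applications = updated;              // the field now references a different object
                reassigned.countDown();
                try {
                    // Still holding the old monitor: wait to see whether another thread gets in.
                    overlapped[0] = secondInside.await(2, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread second = new Thread(() -> {
            try {
                reassigned.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (applications) {            // locks the new list, which nobody holds
                secondInside.countDown();
            }
        });

        first.start();
        second.start();
        joinQuietly(first);
        joinQuietly(second);
        return overlapped[0];
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("overlap observed: " + secondThreadEntersConcurrently());
    }
}
```

In this sketch the second thread is not blocked, since the monitor it asks for 
belongs to the newly assigned list rather than the one the first thread locked.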

Workaround:
 * use a permanent object as a monitor on which to synchronise (or synchronise 
on `this`)
 * keep the field volatile for all other read accesses
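
The two points of the workaround can be sketched as follows. This is an 
illustrative Java example under assumptions, not the actual patch: the names 
`lock`, `addApplication` and `runWriters` are hypothetical, and a list of 
strings stands in for the real "applications" structure. Writers always 
synchronise on one permanent object, while the field stays volatile so that 
readers can still see the latest list without locking.

```java
import java.util.ArrayList;
import java.util.List;

class FixedMonitorDemo {
    // Permanent monitor: never reassigned, so every writer locks the same object.
    private static final Object lock = new Object();
    // Kept volatile so that plain reads observe the most recently published list.
    private static volatile List<String> applications = new ArrayList<>();

    /** Copy-on-write update guarded by the permanent monitor. */
    static void addApplication(String id) {
        synchronized (lock) {
            List<String> updated = new ArrayList<>(applications);
            updated.add(id);
            applications = updated;   // reassigning the field no longer changes the lock
        }
    }

    /** Runs several writer threads and returns the final number of entries. */
    static int runWriters(int threads, int perThread) {
        applications = new ArrayList<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    addApplication("app-" + id + "-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return applications.size();
    }

    public static void main(String[] args) {
        System.out.println("entries: " + runWriters(8, 200));
    }
}
```

Because every update goes through the same monitor, no read-copy-write cycle 
can interleave with another, so no entry should be lost however many writer 
threads run.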

  was:
There exists a race condition in the checkLogs method between the threads of 
replayExecutor. They use the field "applications" to synchronise, but they also 
update that field.

The problem is that if the number of tasks (the number of new log files to 
replay and add to the applications list) is greater than the number of threads 
in the pool, threads will eventually synchronise on different monitors (because 
each thread synchronises on whichever object "applications" references at that 
moment), breaking the original synchronisation intent.

Workaround:
 * use a permanent object as a monitor on which to synchronise (or synchronise 
on `this`)
 * keep volatile field for all other read accesses


> Race condition in FsHistoryProvider
> -----------------------------------
>
>                 Key: SPARK-24150
>                 URL: https://issues.apache.org/jira/browse/SPARK-24150
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: William Montaz
>            Priority: Minor
>
> There exists a race condition in the checkLogs method between the threads of 
> replayExecutor. They use the field "applications" to synchronise, but they 
> also update that field.
> The problem is that threads will eventually synchronise on different monitors 
> (because each thread synchronises on whichever object "applications" 
> references at that moment, and that reference keeps being reassigned), 
> breaking the original synchronisation intent. The race is even more likely to 
> reproduce when number_new_log_files > replayExecutor_pool_size.
> Workaround:
>  * use a permanent object as a monitor on which to synchronise (or 
> synchronise on `this`)
>  * keep the field volatile for all other read accesses



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

