[ 
https://issues.apache.org/jira/browse/SPARK-24150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Montaz updated SPARK-24150:
-----------------------------------
    Description: 
There exists a race condition in the checkLogs method between the threads of 
replayExecutor. They synchronise on the field "applications", but they also 
reassign that field.

The problem arises when the number of tasks (the number of new log files to 
replay and add to the applications list) is greater than the number of threads 
in the pool. A thread is then likely to synchronise on an updated version of 
applications (since it is volatile and reassigned) while other threads are 
still synchronised on an old reference. The threads hold different monitors, 
so mutual exclusion is lost and the race condition occurs.
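
The failure mode can be reproduced deterministically outside Spark. The following is a minimal Java sketch, not the FsHistoryProvider code (which is Scala): the class name, the `applications` list of strings, and the latches are all hypothetical stand-ins. The first thread enters a block synchronised on the field's current value and reassigns the field while still inside; the second thread then synchronises on the field and gets in at once, because it locks a different object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class MovingMonitorDemo {
    // Hypothetical stand-in for the provider's "applications" field.
    private static volatile List<String> applications = new ArrayList<>();

    /** Returns true if the second thread entered its synchronized block
     *  while the first thread was still inside its own. */
    static boolean secondThreadEntersWhileFirstHoldsLock() {
        CountDownLatch fieldReplaced = new CountDownLatch(1);
        CountDownLatch secondEntered = new CountDownLatch(1);
        boolean[] overlapped = {false};

        Thread first = new Thread(() -> {
            synchronized (applications) {            // locks the OLD list
                List<String> updated = new ArrayList<>(applications);
                updated.add("app-1");
                applications = updated;              // field now refers to a NEW list
                fieldReplaced.countDown();
                try {
                    // Still inside the critical section: does another thread get in?
                    overlapped[0] = secondEntered.await(2, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread second = new Thread(() -> {
            try {
                fieldReplaced.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (applications) {            // locks the NEW list: a different monitor
                secondEntered.countDown();
            }
        });

        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return overlapped[0];
    }

    public static void main(String[] args) {
        System.out.println("overlap observed: " + secondThreadEntersWhileFirstHoldsLock());
    }
}
```

If both threads were locking the same monitor, the second could not enter until the first left its block, and the wait inside the first thread would time out.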

Workaround:
 * use a permanent object as a monitor on which to synchronise (or synchronise 
on `this`)
 * keep volatile field for all other read accesses
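
The workaround can be sketched as follows. This is a hedged illustration in Java under assumed names (`StableMonitorDemo`, `lock`, `addApplication`, `listing`, `replay`), not the actual Spark patch: writers synchronise on an object that is never reassigned, and the field stays volatile so readers can take the latest snapshot without locking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class StableMonitorDemo {
    // Permanent monitor: never reassigned, so every writer contends on the same lock.
    private final Object lock = new Object();

    // Still volatile so readers see the latest published snapshot without locking.
    private volatile List<String> applications = Collections.emptyList();

    /** Hypothetical merge step: copy the current snapshot, add one entry, publish the copy. */
    void addApplication(String appId) {
        synchronized (lock) {
            List<String> updated = new ArrayList<>(applications);
            updated.add(appId);
            applications = Collections.unmodifiableList(updated);
        }
    }

    /** Lock-free read of the latest published snapshot. */
    List<String> listing() {
        return applications;
    }

    /** Submits more tasks than there are pool threads and returns the final entry count. */
    static int replay(int tasks, int threads) {
        StableMonitorDemo provider = new StableMonitorDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            String appId = "app-" + i;
            pool.submit(() -> provider.addApplication(appId));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("replay tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return provider.listing().size();
    }

    public static void main(String[] args) {
        // 1000 tasks on 4 threads: no update is lost, so all 1000 entries are present.
        System.out.println(replay(1000, 4));
    }
}
```

Because the read-copy-publish sequence always runs under the same monitor, the outcome does not depend on how many tasks are queued relative to the pool size.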

  was:
There exists a race condition between the methods checkLogs and cleanLogs.

cleanLogs can read the field applications while checkLogs is concurrently 
processing it. checkLogs may add newly fetched logs and set applications, only 
for cleanLogs to overwrite it with a value derived from an old version of 
applications. The fetched log then no longer appears in applications, and the 
corresponding application cannot be displayed in the History Server, since it 
must be in the LinkedList applications.

Workaround:
 * use a permanent object as a monitor on which to synchronise
 * keep volatile field for all other read accesses


> Race condition in FsHistoryProvider
> -----------------------------------
>
>                 Key: SPARK-24150
>                 URL: https://issues.apache.org/jira/browse/SPARK-24150
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: William Montaz
>            Priority: Minor
>
> There exists a race condition in the checkLogs method between the threads of 
> replayExecutor. They synchronise on the field "applications", but they also 
> reassign that field.
> The problem arises when the number of tasks (the number of new log files to 
> replay and add to the applications list) is greater than the number of 
> threads in the pool. A thread is then likely to synchronise on an updated 
> version of applications (since it is volatile and reassigned) while other 
> threads are still synchronised on an old reference. The threads hold 
> different monitors, so mutual exclusion is lost and the race condition occurs.
> Workaround:
>  * use a permanent object as a monitor on which to synchronise (or 
> synchronise on `this`)
>  * keep volatile field for all other read accesses


