GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/20223

    [SPARK-23020][core] Fix races in launcher code, test.

    The race in the code is because the handle might update
    its state to the wrong state if the connection handling
    thread is still processing incoming data; so the handle
    needs to wait for the connection to finish up before
    checking the final state.
    
    The race in the test is because when waiting for a handle
    to reach a final state, the waitFor() method needs to wait
    until all handle state is updated (which also includes
    waiting for the connection thread above to finish).
    Otherwise, waitFor() may return too early, which would cause
    a bunch of different races (like the listener not being yet
    notified of the state change, or being in the middle of
    being notified, or the handle not being properly disposed
    and causing postChecks() to assert).
    
    On top of that I found, by code inspection, a couple of
    potential races that could make a handle end up in the
    wrong state when being killed.
    
    Tested by running the existing unit tests a lot (and not
    seeing the errors I was seeing before).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-23020

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20223.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20223
    
----
commit 5139f605904996667d8d97941172bbe9d366a579
Author: Marcelo Vanzin <vanzin@...>
Date:   2018-01-10T17:56:17Z

    [SPARK-23020][core] Fix races in launcher code, test.
    
    The race in the code is because the handle might update
    its state to the wrong state if the connection handling
    thread is still processing incoming data; so the handle
    needs to wait for the connection to finish up before
    checking the final state.
    
    The race in the test is because when waiting for a handle
    to reach a final state, the waitFor() method needs to wait
    until all handle state is updated (which also includes
    waiting for the connection thread above to finish).
    Otherwise, waitFor() may return too early, which would cause
    a bunch of different races (like the listener not being yet
    notified of the state change, or being in the middle of
    being notified, or the handle not being properly disposed
    and causing postChecks() to assert).
    
    On top of that I found, by code inspection, a couple of
    potential races that could make a handle end up in the
    wrong state when being killed.
    
    Tested by running the existing unit tests a lot (and not
    seeing the errors I was seeing before).

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to