GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/15592

    [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance

    ## What changes were proposed in this pull request?
    
    The reason for the flakiness was as follows. The test starts the maintenance 
background thread, and then writes 20 versions of the state store. The 
maintenance thread is expected to create snapshots in the middle, and clean up 
old files that are no longer needed. The earliest delta file (1.delta) is 
expected to be deleted, because the snapshots ensure that it is not needed 
for recovery. 
    
    However, the default configuration for the maintenance thread is to retain 
files such that the last 2 versions can be recovered, and delete the rest. Now 
while generating the versions, the maintenance thread can kick in and create 
snapshots anywhere between version 10 and 20 (at least 10 deltas are needed for 
a snapshot). Later it will choose to retain only versions 19 and 20 (the last 2). 
There are two cases. 
    
    - Common case: One of the versions between 10 and 19 gets snapshotted. Then 
recovering versions 19 and 20 needs only that snapshot and the deltas after it 
(for example, 19.snapshot and 20.delta), so 1.delta gets deleted.
    
    - Uncommon case (reason for flakiness): Only version 20 gets snapshotted. 
Then recovering version 20 requires 20.snapshot, and recovering version 19 
requires all the previous deltas (19.delta down to 1.delta). So 1.delta does 
not get deleted.
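
The two cases can be checked with a toy model. This is only a sketch, not Spark's cleanup code: it assumes a version is recovered from the latest snapshot at or before it plus every later delta, and the names `files_needed` and `retained_files` are hypothetical.

```python
def files_needed(version, snapshots):
    """Files required to recover `version` in this toy model."""
    # Latest snapshot at or before `version`; 0 means no usable snapshot.
    base = max((s for s in snapshots if s <= version), default=0)
    files = set()
    if base > 0:
        files.add(f"{base}.snapshot")
    # Every delta after the base snapshot must be replayed.
    files.update(f"{v}.delta" for v in range(base + 1, version + 1))
    return files


def retained_files(latest, snapshots, versions_to_retain=2):
    """Union of the files needed to recover the last `versions_to_retain` versions."""
    keep = set()
    for v in range(latest - versions_to_retain + 1, latest + 1):
        keep |= files_needed(v, snapshots)
    return keep


# Common case: version 19 was snapshotted, so 1.delta is not retained.
common = retained_files(20, {19})

# Uncommon case: only version 20 was snapshotted, so version 19 still
# needs 1.delta through 19.delta.
uncommon = retained_files(20, {20})
```

In this model `common` holds only 19.snapshot and 20.delta, while `uncommon` still contains 1.delta, which is the state the flaky assertion tripped on.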
    
    This PR rearranges the checks such that the test creates 20 versions, then 
waits until there is at least one snapshot, and then creates another 20. This 
ensures that the latest 2 versions cannot require anything older than the first 
snapshot generated, and therefore 1.delta will be deleted.
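
A small sketch of why the reordering is enough, under the same simplified recovery rule (latest snapshot at or before the version, plus later deltas). The helper `oldest_file_version` is hypothetical, and the range 10..20 for the first snapshot is an assumption taken from the description above.

```python
def oldest_file_version(version, snapshots):
    """Version number of the oldest file needed to recover `version` (toy model)."""
    usable = [s for s in snapshots if s <= version]
    # With a usable snapshot, recovery starts there; otherwise it replays from 1.delta.
    return max(usable) if usable else 1


# After the first 20 versions the test waits for a snapshot, so some version in
# 10..20 is snapshotted before versions 21..40 are written. Worst case: no
# further snapshot is taken during the second batch.
for first_snapshot in range(10, 21):
    oldest = min(oldest_file_version(v, {first_snapshot}) for v in (39, 40))
    # Recovery of the last 2 versions never reaches back past the first snapshot.
    assert oldest == first_snapshot
```

Any extra snapshot taken during the second batch only moves the oldest required file forward, so 1.delta is never among the retained files.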
    
    In addition, I have added more logs and comments that I felt would help 
with future debugging and with understanding what is going on.
    
    ## How was this patch tested?
    
    Ran the StateStoreSuite > 4K times on a heavily loaded machine (10 
instances of tests running in parallel). No failures.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-17624

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15592.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15592
    
----
commit 96079fef41aa0b8bf30ecb154e26d4d98e24be5b
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2016-10-21T22:27:21Z

    Fixed flaky test

----


