GitHub user HyukjinKwon reopened a pull request:

    https://github.com/apache/spark/pull/18320

    [SPARK-21093][R] Terminate R's worker processes in the parent of R's daemon 
to prevent a leak

    ## What changes were proposed in this pull request?
    
    `mcfork` in R appears to open a pipe up front, but the existing logic does
    not close it properly when forks happen in quick succession. Once the limit
    on the number of open files is reached, further forks fail.
    
    This rapid forking seems to occur particularly with `gapply`/`gapplyCollect`.
    For an unknown reason, it happens more easily on CentOS, and it could also be
    reproduced on Mac.
    
    All the details are described in 
https://issues.apache.org/jira/browse/SPARK-21093
    
    This PR proposes simply to terminate R's worker processes in the parent of 
R's daemon to prevent a leak.
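    
    As a rough illustration of the idea (a minimal sketch, not the actual code in
    this PR), the parent can poll its forked children, read what they send back,
    and terminate the ones that report completion. The `exit_code` marker and the
    5-second timeout are assumptions made for this example, and the `parallel:::`
    functions are unexported internals of R's `parallel` package on Unix-alikes,
    so their behaviour may differ between R versions.
    
    ```r
    library(parallel)

    # Hypothetical marker that a worker sends to its parent when it has finished.
    exit_code <- 1L

    # Fork one worker. In the child, mcfork() returns a "masterProcess" object.
    p <- parallel:::mcfork()
    if (inherits(p, "masterProcess")) {
      # Child: real work would go here. Report completion to the parent and exit.
      parallel:::mcexit(0L, send = exit_code)
    }

    # Parent: wait up to 5 seconds for any child that has data ready to read.
    children <- parallel:::selectChildren(timeout = 5)
    if (is.integer(children)) {
      for (child in children) {
        data <- parallel:::readChild(child)
        # readChild() returns a raw vector when the child sent data.
        if (is.raw(data) && identical(unserialize(data), exit_code)) {
          # The worker says it is done, so terminate it from the parent side.
          tools::pskill(child, tools::SIGUSR1)
        }
      }
    }
    ```
    
    In a long-running daemon this polling would sit inside the loop that accepts
    new work, so that finished workers are cleaned up before more are forked.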
    
    ## How was this patch tested?
    
    I ran the code below on both CentOS and Mac with that configuration
    disabled and enabled.
    
    ```r
    df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
    # Run gapply repeatedly (30 times) so that R workers are forked in quick succession.
    for (i in seq_len(30)) {
      collect(gapply(df, "a", function(key, x) { x }, schema(df)))
    }
    ```
    
    The R tests also now pass on CentOS, as shown below:
    
    ```
    SparkSQL functions: Spark package found in SPARK_HOME: .../spark
    
..............................................................................................................................................................
    
..............................................................................................................................................................
    
..............................................................................................................................................................
    
..............................................................................................................................................................
    
..............................................................................................................................................................
    
....................................................................................................................................
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-21093

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18320
    
----
commit 6e57ed2931afc5aec8c4b4bef72c157abcb68c46
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-06-16T02:37:53Z

    Terminates forked processed in the parent process

commit 4eadafe3f009b1c70956c08c99302c1da34db6d4
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-06-17T09:21:37Z

    Fix typo (renaming missed)

commit 18b3ee9a66df40658074511558f0cd36fc102df7
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-06-17T10:54:57Z

    Rename x to c in lapply

commit 72ab1f2e8cafa1d5249a09279825444e3ca38b39
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-06-19T12:21:08Z

    Update comments to describe the behaviour change

commit 6cba54c243123d25f479363d3dfd7eb92bb25599
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-06-20T01:02:26Z

    Do not check every second if there is no worker running

commit 4954008884ff02a9eae9ea50586e86e8923fc593
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-06-20T09:04:03Z

    Address comment

commit f3f57e46868e66b8f50268910c1eff494638059d
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-06-20T09:30:56Z

    Fix comments

commit 04bb37a6d8d4387365f6d46cb8e2c6fbe912351d
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-06-20T09:32:29Z

    Fix comments

commit d6f0ff275abd3b7641210427eea955c9f0ea8d86
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2017-06-20T09:39:21Z

    Add more comments

commit 8b48274dc565dc5c6722e983c55494b0067bda72
Author: Hyukjin Kwon <gurwls...@gmail.com>
Date:   2017-06-20T10:00:05Z

    Fix a typo

----

