Github user davies commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-211480633

@lianhuiwang Thanks for working on this, I think it's heading in a good direction. Two things are left:

1) Thread safety. For example, PythonRDD uses two threads (the same goes for RRDD): one iterates over rows from the parent RDD, and the other iterates over rows from PythonRDD/RRDD. The second thread could trigger spilling, so the spill would happen in the second thread while the first thread may still be consuming the same iterator. These iterators must therefore be made thread safe. This is the hardest part; you could take the SQL operators as examples.

2) Adding more tests.

As @squito suggested, more comments explaining the high-level ideas would also be good to have.
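To make the concern in point 1 concrete, here is a minimal sketch of one way to guard a spillable iterator with a lock. It is written in plain Python rather than against Spark's internals, and every name in it (`SpillableIterator`, `spill`, `run_demo`) is hypothetical and not taken from the pull request or from Spark. The "spill file" is simulated with a second in-memory queue. The point is only that `next` and `spill` both take the same lock, so a spill triggered from one thread cannot interleave with a read from the other.

```python
import threading
from collections import deque


class SpillableIterator:
    """Hypothetical iterator whose remaining rows can be spilled by another thread."""

    def __init__(self, rows):
        self._lock = threading.Lock()  # guards both queues below
        self._in_memory = deque(rows)  # rows still held in memory
        self._spilled = deque()        # stand-in for rows written to a spill file

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            # After a spill the in-memory queue is empty, so spilled rows
            # are always the next ones in the original order.
            source = self._spilled if self._spilled else self._in_memory
            if not source:
                raise StopIteration
            return source.popleft()

    def spill(self):
        """Move all remaining in-memory rows to the spill area; return how many moved."""
        with self._lock:
            moved = len(self._in_memory)
            self._spilled.extend(self._in_memory)
            self._in_memory.clear()
            return moved


def run_demo(n=10_000):
    """Consume the iterator in one thread while another thread triggers spills."""
    it = SpillableIterator(range(n))
    consumed = []

    def consume():
        for row in it:
            consumed.append(row)

    def trigger_spills():
        for _ in range(100):
            it.spill()

    threads = [threading.Thread(target=consume),
               threading.Thread(target=trigger_spills)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

With the lock in place, `run_demo()` returns every row exactly once and in the original order, regardless of how the two threads interleave. A single coarse lock is the simplest choice for a sketch; a real implementation would also have to consider how long the lock is held during disk I/O.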