GitHub user davies commented on the pull request:

    https://github.com/apache/spark/pull/10024#issuecomment-211480633
  
    @lianhuiwang Thanks for working on this. I think it's heading in the right 
direction. Two things are left:
    
    1) Thread safety. For example, PythonRDD (and likewise RRDD) uses two 
threads: one iterates over rows from the parent RDD, and the other iterates over 
rows from PythonRDD/RRDD. The second thread can trigger spilling, so the spill 
runs in the second thread while the first thread may still be consuming the same 
iterator. These iterators must therefore be made thread safe. This is the 
hardest part; you could take the SQL operators as examples.
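
To illustrate the race described in 1), here is a minimal sketch in Python rather than the PR's actual Scala code. The class `SpillableIterator`, its `spill()` method, and `run_demo` are hypothetical names invented for this example, and a temp file with `pickle` stands in for a real spill file. The only point is that the consumer's `next` and the spill triggered by another thread take the same lock, so the iterator is never swapped in the middle of a read.

```python
import pickle
import tempfile
import threading


class SpillableIterator:
    """Hypothetical iterator whose unread items another thread may spill to disk."""

    def __init__(self, items):
        self._lock = threading.Lock()  # guards _source and _spilled
        self._source = iter(items)     # upstream in-memory iterator
        self._spilled = False

    def __iter__(self):
        return self

    def __next__(self):
        # The consumer and spill() never touch _source at the same time.
        with self._lock:
            return next(self._source)

    def spill(self):
        """Move the unread items to a temp file. Returns False if already spilled."""
        with self._lock:
            if self._spilled:
                return False
            remaining = list(self._source)
            spill_file = tempfile.TemporaryFile()
            for item in remaining:
                pickle.dump(item, spill_file)
            spill_file.seek(0)
            # From now on the consumer reads from disk instead of memory.
            self._source = self._read_back(spill_file, len(remaining))
            self._spilled = True
            return True

    @staticmethod
    def _read_back(spill_file, count):
        # Closes the file once every spilled item has been read back.
        with spill_file:
            for _ in range(count):
                yield pickle.load(spill_file)


def run_demo(n=10_000):
    """Consume in one thread while the calling thread forces a spill."""
    it = SpillableIterator(range(n))
    consumed = []
    consumer = threading.Thread(target=lambda: consumed.extend(it))
    consumer.start()
    it.spill()  # may land before, during, or after consumption
    consumer.join()
    return consumed
```

Whenever the spill happens, the consumer should see every item exactly once and in order. Without the shared lock, `spill()` could drain the upstream iterator while the consumer is halfway through a `next` call, which is the hazard the comment warns about.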
    
    2) Add more tests. As @squito suggested, more comments explaining the 
high-level ideas would also be good to have.

