[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

BryanCutler Thu, 01 Dec 2016 15:37:25 -0800

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/15821
  
    Hi @wesm and @icexelloss, that sounds good on our end.  @yinxusen has been 
working on validating some basic conversions so far, but everything is still 
very preliminary, so it would be great to work with you guys.  I'll set up a new 
integration branch and ping you all when it's ready.
    
    > Related to this we'll also want to be able to precisely instrument and 
benchmark the Dataset <-> Arrow conversion -- @icexelloss suggested might be 
able to push down the conversion into the executors instead of doing all the 
work in the driver, but I'm not sure how feasible that is
    
    We were thinking about that too, as it would be preferable.  For simplicity, 
we decided to do the conversion on the driver side first, which should 
hopefully still show a performance increase, and then follow up with some work 
to optimize it further.



