[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

BryanCutler Tue, 30 Oct 2018 16:30:01 -0700

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/22275
  
    Apologies for the delay in circling back to this. I reorganized the code a 
little to simplify it and expanded the comments to better describe what it does.
    
    A quick summary of the changes: I changed `ArrowStreamSerializer` so that it 
no longer holds any state, since that seemed to complicate things. Instead of 
saving the batch order indices as state, they are now loaded on the last 
iteration of `load_stream`. This was put in a special serializer, 
`ArrowCollectSerializer`, so that it is clear where it is used. I also 
consolidated all the batch ordering calls within `_collectAsArrow` so the whole 
process is easier to follow.
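
    To illustrate the idea, here is a minimal sketch in plain Python; it is not 
the code from the PR. The class `CollectSerializerSketch`, the function 
`collect_in_order`, the `(batches, batch_order)` stream shape, and the meaning 
assumed for the indices (entry `i` is the arrival position of the batch that 
belongs at output position `i`) are all assumptions made for this example. 
Strings stand in for Arrow record batches.

```python
from typing import Any, Iterator, List, Tuple


class CollectSerializerSketch:
    """Hypothetical stand-in for a stateless collect serializer."""

    def load_stream(self, stream: Tuple[List[Any], List[int]]) -> Iterator[Any]:
        batches, batch_order = stream
        # Yield batches in arrival order, which may differ from partition order.
        for batch in batches:
            yield batch
        # Last iteration: yield the order indices instead of storing them on self.
        yield batch_order


def collect_in_order(stream: Tuple[List[Any], List[int]]) -> List[Any]:
    """Hypothetical caller that keeps all ordering logic in one place."""
    results = list(CollectSerializerSketch().load_stream(stream))
    batches, batch_order = results[:-1], results[-1]
    # Assumption: batch_order[i] is the arrival position of the i-th output batch.
    return [batches[i] for i in batch_order]


# Batches arrive out of order; the indices arrive last.
arrived = ["batch-for-partition-2", "batch-for-partition-0", "batch-for-partition-1"]
order = [1, 2, 0]
ordered = collect_in_order((arrived, order))
```

    Because the indices come out of the same generator as the batches, the 
serializer does not need to remember anything between calls, and the caller 
can do the reordering in a single place.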


