GitHub user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22275

Apologies for the delay in circling back to this. I reorganized the code a little to simplify it and expanded the comments to better describe what it does. A quick summary of the changes:

I changed `ArrowStreamSerializer` so that it no longer holds any state, since that seemed to complicate things. Instead of saving the batch order indices on the serializer, they are now loaded on the last iteration of `load_stream`. This logic lives in a dedicated serializer, `ArrowCollectSerializer`, so that it is clear where it is used. I also consolidated all the batch ordering calls within `_collectAsArrow` so that the whole process is easier to follow.
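To illustrate the flow described above, here is a minimal sketch of the idea. It is not the code in the PR. It assumes a made-up wire format (length-prefixed byte payloads, a `-1` end marker, then the order indices) in place of real Arrow record batches, and the helpers `dump_stream`, `read_int`, `write_int`, and `collect_in_order` are hypothetical stand-ins. The point is only that a stateless `load_stream` can yield the batches as they arrive and then yield the batch order as its final item, leaving the caller to reorder.

```python
import io
import struct


def write_int(value, stream):
    # Hypothetical helper: write a 4-byte big-endian signed int.
    stream.write(struct.pack("!i", value))


def read_int(stream):
    # Hypothetical helper: read a 4-byte big-endian signed int.
    return struct.unpack("!i", stream.read(4))[0]


def dump_stream(batches, batch_order, stream):
    """Assumed framing: each payload is length-prefixed, -1 ends the
    batches, and the batch order indices follow at the very end."""
    for payload in batches:
        write_int(len(payload), stream)
        stream.write(payload)
    write_int(-1, stream)  # end-of-batches marker (an assumption)
    write_int(len(batch_order), stream)
    for index in batch_order:
        write_int(index, stream)


def load_stream(stream):
    """Stateless reader: yield every batch in arrival order, then yield
    the list of batch order indices as the last item."""
    while True:
        length = read_int(stream)
        if length == -1:
            break
        yield stream.read(length)
    num = read_int(stream)
    yield [read_int(stream) for _ in range(num)]


def collect_in_order(stream):
    """All ordering logic in one place, analogous in spirit to
    consolidating it within `_collectAsArrow`."""
    results = list(load_stream(stream))
    batches, batch_order = results[:-1], results[-1]
    return [batches[i] for i in batch_order]


if __name__ == "__main__":
    buf = io.BytesIO()
    # Batches arrive out of order; the indices say how to restore order.
    dump_stream([b"part-2", b"part-0", b"part-1"], [1, 2, 0], buf)
    buf.seek(0)
    print(collect_in_order(buf))
```

Because the order indices travel as the last item of the stream, the reader needs no instance attributes, and the caller can tell the two kinds of item apart purely by position.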