GitHub user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821

Updated to work with the latest Arrow in preparation for the 0.3 release. Tests are expected to fail because that artifact is not yet available. I also made `ArrowConverters` more consistent and did some cleanup.

From @rxin's comments:

> Move ArrowConverters.scala somewhere else that's not top level, e.g. execution.arrow

It is now in the `org.apache.spark.sql.execution.arrow` package.

> Update this to arrow 0.3

Ready to do this. It should only need another update to the pom.

> Use SQLConf rather than a parameter for toPandas.

I removed the `toPandas` parameter and now use the conf `spark.sql.execution.arrow.enable`, which defaults to `"false"`.

> Handle failure gracefully if arrow is not installed (or somehow package it with Spark?)

I think it would be difficult to package with Spark, because pyarrow also depends on the native Arrow C++ library. Instead, I changed it to fail gracefully if pyarrow is not available. The error message is:

```
ImportError: No module named pyarrow
note: pyarrow must be installed and available on calling Python process if using spark.sql.execution.arrow.enable=true
```

> How is the memory managed? Who allocates the memory for the arrow records, and who's responsible for releasing them?

The Java side of Arrow requires a `BufferAllocator`, which manages the allocated memory. An allocator must be used each time an `ArrowRecordBatch` is created, and both the batch and the allocator must be released/closed once they have been processed. This is all handled inside the `ArrowConverters` functions. On the Python side, buffers are allocated by the Arrow C++ library and freed when the reference counts of the owning objects drop to zero. The end user does not have to manage any memory.
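To illustrate the graceful-failure behavior described above for a missing pyarrow, here is a minimal Python sketch of an import guard. The helper name `require_module` and its signature are hypothetical and are not the PR's actual code. Only the note text is taken from the error message quoted in the reply.

```python
import importlib

# Note text copied from the error message quoted above.
_ARROW_NOTE = ("note: pyarrow must be installed and available on calling Python "
               "process if using spark.sql.execution.arrow.enable=true")


def require_module(name="pyarrow"):
    """Hypothetical helper: import `name`, or raise ImportError with a hint appended."""
    try:
        return importlib.import_module(name)
    except ImportError as e:
        # Keep the original message (e.g. "No module named 'pyarrow'") and add the note.
        raise ImportError("%s\n%s" % (e, _ARROW_NOTE)) from e
```

In this sketch, the check would run only on the Arrow code path, i.e. when `spark.sql.execution.arrow.enable` is `"true"`. Users who leave the conf at its default never need pyarrow installed.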