[ https://issues.apache.org/jira/browse/SPARK-37124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434624#comment-17434624 ]
Apache Spark commented on SPARK-37124:
--------------------------------------

User 'xuechendi' has created a pull request for this issue:
https://github.com/apache/spark/pull/34396

> Support Writable ArrowColumnarVector
> ------------------------------------
>
>                 Key: SPARK-37124
>                 URL: https://issues.apache.org/jira/browse/SPARK-37124
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Chendi.Xue
>            Priority: Major
>
> This Jira aims to add Arrow format as an alternative ColumnVector
> implementation.
>
> The current ArrowColumnVector is not fully equivalent to
> OnHeap/OffHeapColumnVector in Spark. Since the Arrow API is now more
> stable, and pandas UDFs perform much better than Python UDFs, I am
> proposing to fully support Arrow format as an alternative ColumnVector,
> just like the other two.
>
> In this PR, I created a new class in the same package as
> OnHeap/OffHeapColumnVector that extends WritableColumnVector to support
> all put APIs.
>
> The UTs cover all data types, testing both writing to and reading from
> the column vector. I also added 3 UTs for loading from ArrowRecordBatch
> and for allocateColumns.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
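The proposal is to back a writable column vector with Arrow-format memory so that every put API writes into Arrow buffers. The following is a rough, buffer-level illustration of what that could look like. It is a minimal Python model that assumes the standard Arrow fixed-width layout: an LSB-ordered validity bitmap where 1 means valid, plus a contiguous little-endian int32 data buffer. The class name ArrowLikeIntVector and its snake_case methods are hypothetical. They only loosely mirror WritableColumnVector's putInt/putInts/putNull/reserve and ColumnVector's getInt/isNullAt. This is not the PR's implementation, which lives in the JVM alongside OnHeap/OffHeapColumnVector and works on real Arrow vectors.

```python
import struct


class ArrowLikeIntVector:
    """Hypothetical model of a writable int32 column in an Arrow-style layout.

    Assumed layout (Arrow columnar format, fixed-width type):
      - validity: bitmap, one bit per slot, LSB-first, 1 = valid, 0 = null
      - data: contiguous little-endian int32 values, 4 bytes per slot
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = 0
        self.validity = bytearray()
        self.data = bytearray()
        self.reserve(capacity)

    def reserve(self, required_capacity: int) -> None:
        # Grow both buffers (zero-filled) so required_capacity slots fit;
        # existing contents are preserved.
        if required_capacity <= self.capacity:
            return
        needed_bitmap_bytes = (required_capacity + 7) // 8
        self.validity.extend(b"\x00" * (needed_bitmap_bytes - len(self.validity)))
        self.data.extend(b"\x00" * (4 * required_capacity - len(self.data)))
        self.capacity = required_capacity

    def _set_valid(self, row_id: int, valid: bool) -> None:
        # Set or clear the validity bit for row_id.
        byte, bit = divmod(row_id, 8)
        if valid:
            self.validity[byte] |= 1 << bit
        else:
            self.validity[byte] &= ~(1 << bit) & 0xFF

    def put_int(self, row_id: int, value: int) -> None:
        # Write the value into the data buffer and mark the slot as valid.
        struct.pack_into("<i", self.data, 4 * row_id, value)
        self._set_valid(row_id, True)

    def put_ints(self, row_id: int, count: int, value: int) -> None:
        # Write the same value into `count` consecutive slots.
        for i in range(row_id, row_id + count):
            self.put_int(i, value)

    def put_null(self, row_id: int) -> None:
        # Only the validity bit changes; the data slot is left untouched.
        self._set_valid(row_id, False)

    def is_null_at(self, row_id: int) -> bool:
        byte, bit = divmod(row_id, 8)
        return not (self.validity[byte] >> bit) & 1

    def get_int(self, row_id: int) -> int:
        return struct.unpack_from("<i", self.data, 4 * row_id)[0]


if __name__ == "__main__":
    # Write-then-read round trip, in the spirit of the UTs described above.
    vec = ArrowLikeIntVector(capacity=4)
    vec.put_int(0, 42)
    vec.put_null(1)
    vec.put_ints(2, 2, -7)
    print([None if vec.is_null_at(i) else vec.get_int(i) for i in range(4)])
    # prints [42, None, -7, -7]
```

Slots that were never written read as null because the bitmap starts zeroed. This matches Arrow's convention that a cleared validity bit means null. A real implementation would also need variable-width types such as strings and binary, which Arrow stores with an extra offsets buffer. This sketch omits them.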