Igor Suhorukov created SPARK-40325:
--------------------------------------

             Summary: Support of Columnar result(ColumnarBatch) in org.apache.spark.sql.Dataset flatMap, transform, etc
                 Key: SPARK-40325
                 URL: https://issues.apache.org/jira/browse/SPARK-40325
             Project: Spark
          Issue Type: New Feature
          Components: Java API, Spark Core
    Affects Versions: 3.3.0
            Reporter: Igor Suhorukov

Sometimes the result of a data transformation in a JVM program is available from native code in the Apache Arrow columnar data format. The current Dataset API requires an unnecessary conversion from the columnar format wrapper into rows, with additional allocation on the JVM heap.

With this feature I am asking for columnar data to be propagated through the Dataset API without the unnecessary InternalRow->Row->InternalRow conversion.

My current solution uses a [ColumnarBatch wrapper|https://github.com/igor-suhorukov/spark3/blob/master/src/main/java/com/github/igorsuhorukov/arrow/spark/ArrowDataIterator.java] on top of ArrowColumnVector, together with rowExpressionEncoder.createDeserializer(), to transform the data [into Row|https://github.com/igor-suhorukov/spark3/blob/c655d4b6058fdd4529aa59093edfe2333d96fb05/src/main/java/com/github/igorsuhorukov/arrow/spark/ArrowDataIterator.java#L53].
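
To make the allocation concern concrete, here is a minimal plain-Java sketch of the two access patterns. It is only a model: IntBatch, RowObject, toRows, sumViaRows and sumViaColumns are hypothetical names invented for illustration, and none of them are Spark or Arrow classes. The row path allocates one heap object per record before the data is consumed, while the columnar path reads the column arrays directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not Spark or Arrow classes.
final class ColumnarSketch {

    /** A batch stored column by column, loosely modelled on a columnar batch. */
    static final class IntBatch {
        final int[] ids;
        final int[] values;

        IntBatch(int[] ids, int[] values) {
            if (ids.length != values.length) {
                throw new IllegalArgumentException("column lengths differ");
            }
            this.ids = ids;
            this.values = values;
        }

        int numRows() {
            return ids.length;
        }
    }

    /** A row materialised on the JVM heap, one instance per record. */
    static final class RowObject {
        final int id;
        final int value;

        RowObject(int id, int value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Row path: copies every record out of the columns into a new object. */
    static List<RowObject> toRows(IntBatch batch) {
        List<RowObject> rows = new ArrayList<>(batch.numRows());
        for (int i = 0; i < batch.numRows(); i++) {
            rows.add(new RowObject(batch.ids[i], batch.values[i]));
        }
        return rows;
    }

    /** Sums the value column after converting the batch to row objects. */
    static long sumViaRows(IntBatch batch) {
        long total = 0;
        for (RowObject row : toRows(batch)) {
            total += row.value;
        }
        return total;
    }

    /** Columnar path: reads the value column in place, with no per-row allocation. */
    static long sumViaColumns(IntBatch batch) {
        long total = 0;
        for (int v : batch.values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        IntBatch batch = new IntBatch(new int[] {1, 2, 3}, new int[] {10, 20, 30});
        System.out.println(sumViaRows(batch));    // prints 60
        System.out.println(sumViaColumns(batch)); // prints 60
    }
}
```

Both methods return the same result; the difference is that sumViaRows creates numRows() temporary objects first, which is the kind of overhead this issue asks to avoid when the data already arrives in columnar form.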