[ https://issues.apache.org/jira/browse/SPARK-34863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446248#comment-17446248 ]
Apache Spark commented on SPARK-34863:
--------------------------------------
User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34659
> Support nested column in Spark Parquet vectorized readers
> ---------------------------------------------------------
>
> Key: SPARK-34863
> URL: https://issues.apache.org/jira/browse/SPARK-34863
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Cheng Su
> Assignee: Apache Spark
> Priority: Minor
>
> The task is to support nested column types in the Spark Parquet vectorized
> reader. Currently, the Parquet vectorized reader does not support nested
> column types (struct, array and map). We implemented a nested column
> vectorized reader for FB-ORC in our internal fork of Spark and are seeing
> performance improvements over the non-vectorized reader when reading nested
> columns. In addition, this can improve the performance of non-nested columns
> when non-nested and nested columns are read together in one query.
>
> Parquet:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L173]
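The limitation described above can be modelled with a small sketch. It assumes that the linked check in `ParquetFileFormat` chooses the batch (vectorized) path only when vectorization is enabled and every top-level column has an atomic type. The type classes and function names below (`Atomic`, `Array`, `Map`, `Struct`, `supports_vectorized_read`, `nested_columns`) are hypothetical stand-ins written in Python, not Spark's actual Scala API. The sketch also shows why nested support would help flat columns: under this assumed rule, a single nested column makes the whole scan, including its flat columns, fall back to the row-based reader.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, simplified stand-ins for Spark SQL data types;
# these are NOT the real org.apache.spark.sql.types classes.
@dataclass(frozen=True)
class Atomic:
    name: str  # a primitive type such as "long" or "string"

@dataclass(frozen=True)
class Array:
    element: object

@dataclass(frozen=True)
class Map:
    key: object
    value: object

@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, object], ...]  # (column name, data type) pairs


def is_nested(data_type: object) -> bool:
    """Struct, array and map are the nested types; everything else is atomic."""
    return isinstance(data_type, (Struct, Array, Map))


def supports_vectorized_read(schema: Struct, vectorized_enabled: bool = True) -> bool:
    """Assumed rule: use the batch reader only if all top-level columns are atomic."""
    return vectorized_enabled and all(not is_nested(t) for _, t in schema.fields)


def nested_columns(schema: Struct) -> List[str]:
    """Names of the top-level columns that force the row-based fallback."""
    return [name for name, t in schema.fields if is_nested(t)]


# Usage: a flat schema versus one that mixes flat and nested columns.
flat = Struct((("id", Atomic("long")), ("name", Atomic("string"))))
mixed = Struct((("id", Atomic("long")), ("tags", Array(Atomic("string")))))

print(supports_vectorized_read(flat))   # True
print(supports_vectorized_read(mixed))  # False: "id" is also read row by row
print(nested_columns(mixed))            # ['tags']
```

Checking the whole schema at once, rather than per column, mirrors the all-or-nothing choice between the batch and row-based readers that the description implies.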
--
This message was sent by Atlassian Jira
(v8.20.1#820001)