[ https://issues.apache.org/jira/browse/SPARK-37864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472563#comment-17472563 ]
Apache Spark commented on SPARK-37864:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/35163

> Support Parquet v2 data page RLE encoding (for Boolean Values) for the vectorized path
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-37864
>                 URL: https://issues.apache.org/jira/browse/SPARK-37864
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Yang Jie
>            Priority: Major
>
> Parquet v2 data pages write Boolean values using RLE encoding. Reading v2
> Boolean values currently throws the following exception:
>
> {code:java}
> Caused by: java.lang.UnsupportedOperationException: Unsupported encoding: RLE
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.getValuesReader(VectorizedColumnReader.java:305) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.initDataReader(VectorizedColumnReader.java:277) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPageV2(VectorizedColumnReader.java:344) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.access$100(VectorizedColumnReader.java:48) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader$1.visit(VectorizedColumnReader.java:250) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader$1.visit(VectorizedColumnReader.java:237) ~[classes/:?]
>     at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:192) ~[parquet-column-1.12.2.jar:1.12.2]
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPage(VectorizedColumnReader.java:237) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:173) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:311) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:209) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116) ~[classes/:?]
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:298) ~[classes/:?]
>     ... 19 more
> {code}
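
For readers unfamiliar with the encoding named in the exception, here is a minimal sketch of how Boolean values in Parquet's RLE/bit-packing hybrid format (bit width 1) could be decoded. It is an illustration only and not the code from the pull request or from parquet-mr: the class name `RleBooleanSketch` and the method `decode` are hypothetical, and the sketch assumes any length prefix in front of the encoded runs has already been stripped by the caller.

```java
import java.util.Arrays;

// Hypothetical illustration; not Spark's or parquet-mr's actual reader.
final class RleBooleanSketch {

    /**
     * Decodes `count` Boolean values from RLE/bit-packing hybrid runs at
     * bit width 1. Assumes `data` starts at the first run header.
     */
    static boolean[] decode(byte[] data, int count) {
        boolean[] out = new boolean[count];
        int pos = 0;     // read position in data
        int filled = 0;  // number of values decoded so far
        while (filled < count) {
            // Each run starts with an unsigned LEB128 varint header.
            int header = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                header |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);

            if ((header & 1) == 0) {
                // RLE run: (header >> 1) repetitions of one value,
                // stored in a single byte at bit width 1.
                int runLength = header >>> 1;
                boolean value = (data[pos++] & 1) != 0;
                int n = Math.min(runLength, count - filled);
                Arrays.fill(out, filled, filled + n, value);
                filled += n;
            } else {
                // Bit-packed run: (header >> 1) groups of 8 values,
                // one byte per group at bit width 1, least significant bit first.
                int groups = header >>> 1;
                for (int g = 0; g < groups; g++) {
                    int packed = data[pos++] & 0xFF;
                    for (int bit = 0; bit < 8 && filled < count; bit++) {
                        out[filled++] = ((packed >>> bit) & 1) != 0;
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 0x06: RLE header for a run of 3, followed by value byte 0x01 (true).
        // 0x03: bit-packed header for 1 group, followed by payload 0x05.
        byte[] encoded = {0x06, 0x01, 0x03, 0x05};
        System.out.println(Arrays.toString(decode(encoded, 11)));
    }
}
```

The sketch does no bounds checking on malformed input, which a production reader would need.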