[ https://issues.apache.org/jira/browse/SPARK-27913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856592#comment-16856592 ]
Andrey Zinovyev commented on SPARK-27913:
-----------------------------------------

A simple way to reproduce it:

{code:sql}
create external table test_broken_orc(a struct<f1:int>) stored as orc;
insert into table test_broken_orc select named_struct("f1", 1);
drop table test_broken_orc;
create external table test_broken_orc(a struct<f1:int, f2: int>) stored as orc;
select * from test_broken_orc;
{code}

The last statement fails with this exception:

{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.orc.mapred.OrcStruct.getFieldValue(OrcStruct.java:49)
	at org.apache.spark.sql.execution.datasources.orc.OrcDeserializer$$anonfun$org$apache$spark$sql$execution$datasources$orc$OrcDeserializer$$newWriter$14.apply(OrcDeserializer.scala:133)
	at org.apache.spark.sql.execution.datasources.orc.OrcDeserializer$$anonfun$org$apache$spark$sql$execution$datasources$orc$OrcDeserializer$$newWriter$14.apply(OrcDeserializer.scala:123)
	at org.apache.spark.sql.execution.datasources.orc.OrcDeserializer$$anonfun$2$$anonfun$apply$1.apply(OrcDeserializer.scala:51)
	at org.apache.spark.sql.execution.datasources.orc.OrcDeserializer$$anonfun$2$$anonfun$apply$1.apply(OrcDeserializer.scala:51)
	at org.apache.spark.sql.execution.datasources.orc.OrcDeserializer.deserialize(OrcDeserializer.scala:64)
	at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2$$anonfun$apply$7.apply(OrcFileFormat.scala:230)
	at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2$$anonfun$apply$7.apply(OrcFileFormat.scala:230)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:104)
{noformat}

You can also trigger it by removing a column, or by adding a column in the middle of a struct field. As far as I understand the current implementation, it supports by-name field resolution only at the top level of the ORC structure.
Everything deeper is resolved by index and is expected to match the reader schema exactly.

> Spark SQL's native ORC reader implements its own schema evolution
> -----------------------------------------------------------------
>
>                 Key: SPARK-27913
>                 URL: https://issues.apache.org/jira/browse/SPARK-27913
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.3
>            Reporter: Owen O'Malley
>            Priority: Major
>
> ORC's reader handles a wide range of schema evolution, but the Spark SQL
> native ORC bindings do not provide the desired schema to the ORC reader. This
> causes a regression when moving spark.sql.orc.impl from 'hive' to 'native'.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
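The by-index resolution of nested fields described in the comment can be illustrated with a small standalone Scala sketch (hypothetical names, no Spark or ORC dependencies): the file struct was written with one field, but the reader schema expects two, so asking for field index 1 throws, while a by-name lookup could return null for the missing field.

```scala
// Hypothetical sketch, NOT actual Spark/ORC code: models why by-index
// resolution of nested struct fields fails after schema evolution.
object SchemaEvolutionSketch {
  // File was written with struct<f1:int>: one field name, one value.
  val fileFieldNames: Seq[String] = Seq("f1")
  val fileFieldValues: Seq[Any] = Seq(1)

  // Reader schema is struct<f1:int, f2:int>: two fields expected.
  val readerFieldNames: Seq[String] = Seq("f1", "f2")

  // By-index resolution: asks the file struct for field i directly.
  // With i = 1 this throws, since the file only has one field value.
  def byIndex(i: Int): Any = fileFieldValues(i)

  // By-name resolution: looks the reader field up in the file schema,
  // returning null when the file never wrote that field.
  def byName(name: String): Any =
    fileFieldNames.indexOf(name) match {
      case -1 => null
      case i  => fileFieldValues(i)
    }

  def main(args: Array[String]): Unit = {
    println(byName("f1")) // present in the file: returns its value
    println(byName("f2")) // missing from the file: returns null
    try {
      byIndex(1) // reader field index 1 does not exist in the file struct
    } catch {
      case _: IndexOutOfBoundsException =>
        println("by-index lookup failed for evolved schema")
    }
  }
}
```

This mirrors the report: the top-level column `a` is found by name, but its nested fields are fetched by position from the file struct, which only works when the file and reader schemas match exactly.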