[ https://issues.apache.org/jira/browse/SPARK-25175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595994#comment-16595994 ]
Apache Spark commented on SPARK-25175:
--------------------------------------

User 'seancxmao' has created a pull request for this issue:
https://github.com/apache/spark/pull/22262

> Field resolution should fail if there's ambiguity for ORC native reader
> -----------------------------------------------------------------------
>
>                 Key: SPARK-25175
>                 URL: https://issues.apache.org/jira/browse/SPARK-25175
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Chenxiao Mao
>            Priority: Major
>
> SPARK-25132 adds support for case-insensitive field resolution when reading
> from Parquet files. We found that ORC files have similar, but not identical,
> issues. Spark has two OrcFileFormat implementations.
> * Since SPARK-2883, Spark supports ORC inside the sql/hive module with a Hive
> dependency. This hive OrcFileFormat always does case-insensitive field
> resolution regardless of case sensitivity mode. When there is ambiguity, hive
> OrcFileFormat always returns the first matched field, rather than failing the
> read operation.
> * SPARK-20682 adds a new ORC data source inside sql/core. This native
> OrcFileFormat supports case-insensitive field resolution, but it cannot
> handle duplicate fields.
> Besides data source tables, hive serde tables also have issues. If an ORC data
> file has more fields than the table schema, we just can't read hive serde
> tables. If the ORC data file does not have more fields, hive serde tables
> always do field resolution by ordinal, rather than by name.
> Both the ORC data source hive impl and hive serde tables rely on the hive orc
> InputFormat/SerDe to read tables. I'm not sure whether we can change the
> underlying hive classes to make all orc read behaviors consistent.
> This ticket aims to make the read behavior of the ORC data source native impl
> consistent with the Parquet data source.
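
The behavior the ticket asks for (fail on ambiguity instead of silently picking the first match) can be illustrated with a small sketch. This is plain Python, not Spark code: `resolve_field` and `AmbiguousFieldError` are hypothetical names invented for illustration, and the exact rules in the actual pull request may differ.

```python
from typing import Optional, Sequence


class AmbiguousFieldError(ValueError):
    """Hypothetical error: a requested name matches several file fields."""


def resolve_field(requested: str,
                  file_fields: Sequence[str],
                  case_sensitive: bool) -> Optional[int]:
    """Return the index of the file field matching `requested`, or None.

    Assumption for this sketch: more than one match is treated as an
    error rather than resolved by taking the first matched field.
    """
    if case_sensitive:
        matches = [i for i, name in enumerate(file_fields)
                   if name == requested]
    else:
        wanted = requested.lower()
        matches = [i for i, name in enumerate(file_fields)
                   if name.lower() == wanted]

    if not matches:
        # Field is absent from the file; the caller decides how to fill it.
        return None
    if len(matches) > 1:
        found = ", ".join(file_fields[i] for i in matches)
        raise AmbiguousFieldError(
            f"Found duplicate field(s) for '{requested}': {found}")
    return matches[0]
```

With file fields `["A", "a"]`, a case-sensitive lookup of `a` resolves to the second field, while a case-insensitive lookup raises the error instead of returning the first match.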