[ https://issues.apache.org/jira/browse/SPARK-27913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034875#comment-17034875 ]

Giri commented on SPARK-27913:
------------------------------

This issue doesn't exist in *Spark 3.0.0-preview2* and also not in *Spark 2.3*.
Will this fix be ported to the 2.4.x branch?

 

It appears that the issue is related to Spark not using the schema from the
metastore but the one from the ORC files; the resulting schema mismatch leads
to an out-of-bounds exception when OrcDeserializer accesses a field that
doesn't exist in the file.
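
The mechanism can be sketched with a toy example (plain Python, not Spark's
actual OrcDeserializer; the helper name and the schemas are hypothetical
simplifications): if the reader resolves a column by its ordinal in the
metastore's table schema but deserializes rows using only the file schema,
then reading a column added after an old file was written runs past the end
of that file's row.

```python
table_schema = ["f1", "f2"]   # schema from the metastore: the table now has f2
file_schema = ["f1"]          # an older ORC file, written before f2 was added

file_row = [1]  # one value per field actually present in the file

def read_field(row, table_schema, field_name):
    # Index by the table schema's ordinal, as a reader that ignores the
    # file schema would do.
    ordinal = table_schema.index(field_name)
    return row[ordinal]  # out of bounds when the file lacks the field

print(read_field(file_row, table_schema, "f1"))  # → 1
try:
    read_field(file_row, table_schema, "f2")
except IndexError:
    # Analogous to the ArrayIndexOutOfBoundsException reported here;
    # passing the desired (table) schema to the ORC reader lets ORC's own
    # schema evolution fill the missing field with null instead.
    print("IndexError: field f2 not in file")
```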

 

I see logs like this:

 

20/02/11 14:30:38 INFO RecordReaderImpl: Reader schema not provided -- using file schema struct<ar:struct<f1:int>>
20/02/11 14:30:38 INFO RecordReaderImpl: Reader schema not provided -- using file schema struct<a:struct<f1:int,f2:int>>

 

> Spark SQL's native ORC reader implements its own schema evolution
> -----------------------------------------------------------------
>
>                 Key: SPARK-27913
>                 URL: https://issues.apache.org/jira/browse/SPARK-27913
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.3
>            Reporter: Owen O'Malley
>            Priority: Major
>
> ORC's reader handles a wide range of schema evolution, but the Spark SQL 
> native ORC bindings do not provide the desired schema to the ORC reader. This 
> causes a regression when moving spark.sql.orc.impl from 'hive' to 'native'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
