[ https://issues.apache.org/jira/browse/SPARK-38536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506781#comment-17506781 ]

kalyan s commented on SPARK-38536:
----------------------------------

I suspect the bug is that, in
https://github.com/apache/spark/blob/branch-3.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L245
when `createHadoopRDD` is called, it does not pass the partition's input
format; it passes only the table's descriptor.

 

However, in the Spark 2.4 code, the partition's input format is passed (as
the third argument). Ref:
https://github.com/apache/spark/blob/branch-2.4/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L248
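
The difference can be sketched as follows. This is a minimal illustrative
model, not the actual Spark/Hive classes; the case classes and function names
here are hypothetical:

```scala
// Hypothetical simplified model of the input-format selection -- these
// case classes and method names are illustrative, not the real Spark API.
case class StorageDescriptor(inputFormatClass: String)
case class Table(sd: StorageDescriptor)
case class Partition(sd: Option[StorageDescriptor])

// branch-3.0 behavior: the HadoopRDD is built from the table's descriptor
// only, so every partition is read with the table's input format.
def inputFormatSpark3(table: Table, partition: Partition): String =
  table.sd.inputFormatClass

// branch-2.4 behavior: the partition's own input format is preferred,
// falling back to the table's when the partition has none.
def inputFormatSpark24(table: Table, partition: Partition): String =
  partition.sd.map(_.inputFormatClass).getOrElse(table.sd.inputFormatClass)

val table = Table(StorageDescriptor("org.apache.hadoop.mapred.TextInputFormat"))
val orcPartition = Partition(Some(StorageDescriptor(
  "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat")))

// Spark 3 would try to read the ORC partition with the table's text format.
println(inputFormatSpark3(table, orcPartition))
// Spark 2.4 correctly picks the partition's own ORC format.
println(inputFormatSpark24(table, orcPartition))
```

A partition written in a different format than the table (e.g. an ORC
partition under a text-format table) thus fails only on the 3.x code path.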

> Spark 3 can not read mixed format partitions
> --------------------------------------------
>
>                 Key: SPARK-38536
>                 URL: https://issues.apache.org/jira/browse/SPARK-38536
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.2.1
>            Reporter: Huicheng Song
>            Priority: Major
>
> Spark 3.x reads partitions with the table's input format, which fails when a
> partition has a different input format than the table.
> This is a regression introduced by SPARK-26630. Before that fix, Spark used
> the partition's InputFormat when creating the HadoopRDD. With that fix, Spark
> uses only the table's InputFormat when creating the HadoopRDD, causing
> failures.
> Reading mixed-format partitions is an important scenario, especially for
> format migration. It is also well supported in query engines such as Hive
> and Presto.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
