[ https://issues.apache.org/jira/browse/SPARK-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164191#comment-14164191 ]
Cody Koeninger commented on SPARK-3851:
---------------------------------------

So I have a couple of questions:

1. Does it make more sense for this to be an optional argument to unionAll, or a separate method that produces the equivalent of unionAll, just with a compatible schema?

2. What order should columns be resolved in? Given

   V1(a: String, b: Int, c: Option[Int])
   V2(a: String, b: Long, d: Option[String])

rddV1.mergeSchema(rddV2) should result in an RDD containing all rows, with a schema like

   (a: String, b: Long, c: Option[Int], d: Option[String])

where all the b's from V1 rows are upcast to Long, all the d's for V1 rows are null, and all the c's for V2 rows are null. Does that make sense? (A rough sketch of these semantics is at the end of this message.)

> Support for reading parquet files with different but compatible schema
> ----------------------------------------------------------------------
>
>                 Key: SPARK-3851
>                 URL: https://issues.apache.org/jira/browse/SPARK-3851
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Michael Armbrust
>
> Right now it is required that all of the parquet files have the same schema.
> It would be nice to support some safe subset of cases where the schemas of
> the files are different. For example:
> - Adding and removing nullable columns.
> - Widening types (e.g. a column that is Int in some files and Long in others).
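To make (2) concrete, here is a rough sketch of those semantics in Scala. It assumes a DataFrame-like API; mergeSchema and wider are hypothetical helpers, not existing methods, and the toy widening rule only covers the Int -> Long case from the example:

{code:scala}
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{DataType, IntegerType, LongType}

// Toy type-widening rule: covers only the Int -> Long case from the
// example. A real implementation would need a full promotion lattice.
def wider(a: DataType, b: DataType): DataType = (a, b) match {
  case (x, y) if x == y                                  => x
  case (IntegerType, LongType) | (LongType, IntegerType) => LongType
  case _ => sys.error(s"incompatible column types: $a vs $b")
}

// Hypothetical mergeSchema: union two DataFrames after reconciling
// their schemas. Columns missing on one side become nulls; columns
// present on both sides are cast to the wider of the two types.
def mergeSchema(left: DataFrame, right: DataFrame): DataFrame = {
  val lf = left.schema.fields.map(f => f.name -> f.dataType).toMap
  val rf = right.schema.fields.map(f => f.name -> f.dataType).toMap
  // Output column order: left's columns first, then right-only columns.
  val names: Seq[String] =
    left.schema.fieldNames.toSeq ++ right.schema.fieldNames.filterNot(lf.contains)

  def project(df: DataFrame, own: Map[String, DataType]): DataFrame = {
    val cols: Seq[Column] = names.map { n =>
      val target = (lf.get(n), rf.get(n)) match {
        case (Some(a), Some(b)) => wider(a, b)  // present on both sides
        case (a, b)             => a.orElse(b).get
      }
      if (own.contains(n)) col(n).cast(target).as(n)  // upcast, e.g. Int -> Long
      else lit(null).cast(target).as(n)               // missing column -> null
    }
    df.select(cols: _*)
  }

  project(left, lf).unionAll(project(right, rf))
}
{code}

With the V1/V2 example above, mergeSchema over the two datasets would yield the schema (a: String, b: Long, c: Int nullable, d: String nullable), with b upcast for V1 rows, d null for V1 rows, and c null for V2 rows.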