koert kuipers created SPARK-19536:
-------------------------------------

             Summary: Improve capability to merge SQL data types
                 Key: SPARK-19536
                 URL: https://issues.apache.org/jira/browse/SPARK-19536
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: koert kuipers
            Priority: Minor


Spark's merging of compatible types in a union is limited. It works for basic 
types in the top-level record, but it fails for nested records, maps, arrays, 
etc.

I would like to improve this.

For example, I get errors like this:
{noformat}
org.apache.spark.sql.AnalysisException: Union can only be performed on tables 
with the compatible column types. StructType(StructField(_1,StringType,true), 
StructField(_2,IntegerType,false)) <> 
StructType(StructField(_1,StringType,true), StructField(_2,LongType,false)) at 
the first column of the second table
{noformat}
Some examples that do work:
{noformat}
scala> Seq(1, 2, 3).toDF union Seq(1L, 2L, 3L).toDF
res2: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: bigint]

scala> Seq((1,"x"), (2,"x"), (3,"x")).toDF union Seq((1L,"x"), (2L,"x"), (3L,"x")).toDF
res3: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: bigint, _2: string]
{noformat}
What I would also expect to work, but currently doesn't:
{noformat}
scala> Seq((Seq(1),"x"), (Seq(2),"x"), (Seq(3),"x")).toDF union Seq((Seq(1L),"x"), (Seq(2L),"x"), (Seq(3L),"x")).toDF

scala> Seq((1,("x",1)), (2,("x",2)), (3,("x",3))).toDF union Seq((1L,("x",1L)), (2L,("x",2L)), (3L,("x",3L))).toDF
{noformat}
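To illustrate the kind of merging meant here, below is a minimal sketch in plain Scala. It assumes a toy model of data types (`DType`, `IntT`, `LongT`, `StringT`, `ArrayT`, `MapT`, `StructT` and `TypeMerge.merge` are all hypothetical names, not Spark's classes or API) and a single assumed widening rule (int and long merge to long). It only shows how the merge could recurse into arrays, maps and structs; it ignores nullability and is not how Spark implements it.

```scala
// Toy model of SQL data types. These are NOT Spark's classes.
sealed trait DType
case object IntT extends DType
case object LongT extends DType
case object StringT extends DType
final case class ArrayT(element: DType) extends DType
final case class MapT(key: DType, value: DType) extends DType
final case class StructT(fields: List[(String, DType)]) extends DType

object TypeMerge {
  // Returns the common type of a and b, or None if they are incompatible.
  def merge(a: DType, b: DType): Option[DType] = (a, b) match {
    case _ if a == b => Some(a)
    // Assumed widening rule for this sketch: int and long merge to long.
    case (IntT, LongT) | (LongT, IntT) => Some(LongT)
    // Recurse into the element type of arrays.
    case (ArrayT(x), ArrayT(y)) => merge(x, y).map(t => ArrayT(t))
    // Recurse into both key and value types of maps.
    case (MapT(k1, v1), MapT(k2, v2)) =>
      for (k <- merge(k1, k2); v <- merge(v1, v2)) yield MapT(k, v)
    // Structs merge field by field when the field names line up.
    case (StructT(fs1), StructT(fs2)) if fs1.map(_._1) == fs2.map(_._1) =>
      val merged = fs1.zip(fs2).map { case ((n, t1), (_, t2)) =>
        merge(t1, t2).map(t => (n, t))
      }
      if (merged.forall(_.isDefined)) Some(StructT(merged.flatten)) else None
    case _ => None
  }
}
```

With this model, the two failing cases above correspond to merging `StructT(List("_1" -> ArrayT(IntT), "_2" -> StringT))` with the same struct holding `ArrayT(LongT)`, and merging a struct whose second field is `StructT(List("_1" -> StringT, "_2" -> IntT))` with one holding `LongT` in the nested field. In both cases `TypeMerge.merge` returns the variant with `LongT`.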


