I'd also add that mismatches like this may be caused by the `nullable` flag in the schema: two fields with the same name and data type can still differ in whether they are marked nullable.
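
In case it helps, here is a rough sketch of how one might check whether two schemas differ only in nullability. It assumes `ds` and `ds1` are the two Datasets from the original question. `SchemaDiff`, `asNullable` and `differsOnlyInNullability` are names I made up for illustration, not Spark API. User-defined types are not handled.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): compares schemas while ignoring nullability.
object SchemaDiff {
  // Recursively mark every struct field, array element and map value as nullable.
  def asNullable(dt: DataType): DataType = dt match {
    case s: StructType =>
      StructType(s.fields.map(f => f.copy(dataType = asNullable(f.dataType), nullable = true)))
    case ArrayType(elementType, _) =>
      ArrayType(asNullable(elementType), containsNull = true)
    case MapType(keyType, valueType, _) =>
      MapType(asNullable(keyType), asNullable(valueType), valueContainsNull = true)
    case other => other
  }

  // True when the schemas are unequal as-is but equal once nullability is ignored.
  def differsOnlyInNullability(a: StructType, b: StructType): Boolean =
    a != b && asNullable(a) == asNullable(b)
}
```

You could then call `ds.schema.printTreeString()` and `ds1.schema.printTreeString()` to compare the two by eye, and `SchemaDiff.differsOnlyInNullability(ds.schema, ds1.schema)` to see whether nullability is the only difference. If it returns false while the schemas are still unequal, the difference is in the field names, order or types.
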

On Mon, May 8, 2017 at 12:25 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

> This is because RDD.union doesn't check the schema, so you won't see the
> problem unless you run RDD and hit the incompatible column problem. For
> RDD, You may not see any error if you don't use the incompatible column.
>
> Dataset.union requires compatible schema. You can print ds.schema and
> ds1.schema and check if they are same.
>
> On Mon, May 8, 2017 at 11:07 AM, Dirceu Semighini Filho <
> dirceu.semigh...@gmail.com> wrote:
>
>> Hello,
>> I've a very complex case class structure, with a lot of fields.
>> When I try to union two datasets of this class, it doesn't work with the
>> following error :
>> ds.union(ds1)
>> Exception in thread "main" org.apache.spark.sql.AnalysisException: Union
>> can only be performed on tables with the compatible column types
>>
>> But when use it's rdd, the union goes right:
>> ds.rdd.union(ds1.rdd)
>> res8: org.apache.spark.rdd.RDD[
>>
>> Is there any reason for this to happen (besides a bug ;) )