Re: Why does dataset.union fail but dataset.rdd.union execute correctly?

Dirceu Semighini Filho Mon, 08 May 2017 12:47:35 -0700

Hi Burak,
By nullability, do you mean that if both sides have exactly the same schema,
but one side allows nulls and the other doesn't, this exception (in
Dataset.union) will be thrown?
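
To make the comparison concrete, here is a minimal sketch of checking two
schemas field by field, nullability included. SchemaDiff and its diff method
are hypothetical helpers, not part of Spark, and the two example schemas are
made up for illustration. It only assumes spark-sql is on the classpath, and
it does not by itself establish which of these differences Dataset.union
rejects in a given Spark version.

```scala
import org.apache.spark.sql.types._

object SchemaDiff {
  // Hypothetical helper (not a Spark API): walks two schemas by position
  // and describes every top-level difference it finds.
  def diff(left: StructType, right: StructType): Seq[String] = {
    val countNote =
      if (left.length != right.length)
        Seq(s"field count differs: ${left.length} vs ${right.length}")
      else Seq.empty[String]

    val fieldNotes = left.fields.toSeq.zip(right.fields.toSeq).flatMap {
      case (l, r) =>
        val notes = Seq.newBuilder[String]
        if (l.name != r.name)
          notes += s"name differs: ${l.name} vs ${r.name}"
        // DataType equality is structural, so for nested structs it also
        // compares the nullable flag of the inner fields.
        if (l.dataType != r.dataType)
          notes += s"${l.name}: type differs: ${l.dataType} vs ${r.dataType}"
        if (l.nullable != r.nullable)
          notes += s"${l.name}: nullable differs: ${l.nullable} vs ${r.nullable}"
        notes.result()
    }
    countNote ++ fieldNotes
  }

  def main(args: Array[String]): Unit = {
    // Made-up schemas: same names and types, different nullability on "id".
    val a = StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("name", StringType, nullable = true)))
    val b = StructType(Seq(
      StructField("id", LongType, nullable = true),
      StructField("name", StringType, nullable = true)))

    diff(a, b).foreach(println)
  }
}
```

With the datasets from this thread, the call would be
SchemaDiff.diff(ds.schema, ds1.schema).foreach(println), and
ds.schema.printTreeString() shows the full tree, including the nullable flag
of nested fields.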




2017-05-08 16:41 GMT-03:00 Burak Yavuz <brk...@gmail.com>:

> I also want to add that errors like this are often caused by the
> nullability (the `nullable` flag) of the fields in the schema.
>
> On Mon, May 8, 2017 at 12:25 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> This is because RDD.union doesn't check the schema, so you won't see the
>> problem unless you actually run the RDD and hit the incompatible column.
>> With RDDs, you may not see any error at all if you never use the
>> incompatible column.
>>
>> Dataset.union requires compatible schemas. You can print ds.schema and
>> ds1.schema and check whether they are the same.
>>
>> On Mon, May 8, 2017 at 11:07 AM, Dirceu Semighini Filho <
>> dirceu.semigh...@gmail.com> wrote:
>>
>>> Hello,
>>> I have a very complex case class structure with a lot of fields.
>>> When I try to union two datasets of this class, it fails with the
>>> following error:
>>> ds.union(ds1)
>>> Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>> Union can only be performed on tables with the compatible column types
>>>
>>> But when I use their RDDs, the union works:
>>> ds.rdd.union(ds1.rdd)
>>> res8: org.apache.spark.rdd.RDD[
>>>
>>> Is there any reason for this to happen (besides a bug ;) )?
>>>
>>>
>>>
>>
>
