Re: Redundant common columns of natural full outer join

2016-01-20 Thread Michael Armbrust
If you use the join overload that takes USING columns, it should automatically
coalesce (take the non-null value from) the left and right columns:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L405
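A minimal sketch of both options, assuming the `join(right, usingColumns: Seq[String], joinType: String)` overload (present since Spark 1.6, to my knowledge) and written against the Spark 2.x `SparkSession` entry point. The tables, the column names `id`, `l`, `r`, and the object/method names are hypothetical. The second join shows the manual equivalent: an explicit join condition followed by `functions.coalesce` on the two key columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.coalesce

object FullOuterUsingJoin {
  // Hypothetical helper: builds two small tables sharing the column "id" and
  // returns (USING-column join, explicit-condition join with coalesce).
  def joins(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._

    val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((2, "x"), (3, "y")).toDF("id", "r")

    // USING-column overload: the output has a single "id" column whose
    // value is taken from whichever side is non-null.
    val viaUsing = left.join(right, Seq("id"), "outer")

    // Manual equivalent: join on an explicit condition, then keep one
    // coalesced key instead of both left("id") and right("id").
    val viaCoalesce = left
      .join(right, left("id") === right("id"), "outer")
      .select(coalesce(left("id"), right("id")).as("id"), left("l"), right("r"))

    (viaUsing, viaCoalesce)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; real cluster settings will differ.
    val spark = SparkSession.builder()
      .appName("full-outer-using-join")
      .master("local[*]")
      .getOrCreate()
    val (viaUsing, viaCoalesce) = joins(spark)
    viaUsing.orderBy("id").show()
    viaCoalesce.orderBy("id").show()
    spark.stop()
  }
}
```

Both results should contain one `id` column with ids 1, 2 and 3, with a null in `l` or `r` wherever one side had no matching row. The USING form is shorter; the explicit form is useful when the key columns have different names on each side.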

On Tue, Jan 19, 2016 at 10:51 PM, Zhong Wang wrote:

> Hi all,
>
> I am joining two tables with common columns using a full outer join.
> However, the current DataFrame API doesn't support natural joins, so the
> output contains redundant common columns from both tables.
>
> Is there any way to remove these redundant columns for a "natural" full
> outer join? For a left or right outer join, I can select just the common
> columns from the left or the right table. However, for a full outer join
> this seems quite difficult, because both the left and the right common
> columns contain null values.
>
>
> Thanks,
> Zhong
>


Redundant common columns of natural full outer join

2016-01-19 Thread Zhong Wang
Hi all,

I am joining two tables with common columns using a full outer join. However,
the current DataFrame API doesn't support natural joins, so the output
contains redundant common columns from both tables.

Is there any way to remove these redundant columns for a "natural" full
outer join? For a left or right outer join, I can select just the common
columns from the left or the right table. However, for a full outer join
this seems quite difficult, because both the left and the right common
columns contain null values.
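A plain-Scala model of the problem (no Spark involved), assuming SQL nulls are represented as `None`; the names `Joined`, `fullOuter` and `coalesceKey` are hypothetical. Keeping only the left key drops the key of right-only rows, while coalescing the two keys keeps every row's id.

```scala
object FullOuterKeys {
  // One output row of a full outer join on a single key column:
  // each side's key is None when that side had no matching row.
  final case class Joined(leftId: Option[Int], rightId: Option[Int])

  // Full outer join on two key sets, modelling SQL nulls as None.
  def fullOuter(left: Set[Int], right: Set[Int]): List[Joined] =
    (left ++ right).toList.sorted.map { k =>
      Joined(if (left(k)) Some(k) else None, if (right(k)) Some(k) else None)
    }

  // Coalesce: take the left key if present, otherwise the right key.
  def coalesceKey(r: Joined): Option[Int] = r.leftId.orElse(r.rightId)

  def main(args: Array[String]): Unit = {
    val rows = fullOuter(Set(1, 2), Set(2, 3))
    // Left key only: the right-only row (id 3) ends up with no key.
    println(rows.map(_.leftId))
    // Coalesced key: every row keeps its id.
    println(rows.map(coalesceKey))
  }
}
```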


Thanks,
Zhong