Re: type-safe join in the new DataSet API?

2016-11-26 Thread Koert Kuipers
although this is correct, KeyValueGroupedDataset.cogroup requires one to
implement their own join logic with Iterator functions. it's fun to do that,
and i appreciate the flexibility it gives, but i would not consider it a
good solution for someone that just wants to do a typed join
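
to make the point concrete, here is a rough sketch of what a typed inner join via groupByKey and cogroup can look like. the row types RowA and RowB, the helper names innerJoinGroup and typedJoin, and the local SparkSession are all assumptions made up for illustration; none of them come from the thread or from Spark itself. treat it as one possible approach, not a recommended implementation.

```scala
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

object TypedJoinSketch {
  // Hypothetical row types; the thread does not give the real schemas.
  final case class RowA(col1: Int, col2: String)
  final case class RowB(col3: Int, col4: String)

  // Hand-written join logic for the rows that share one key: pair every
  // left row with every right row. The right side is materialized first
  // because an Iterator can only be traversed once.
  def innerJoinGroup[L, R](ls: Iterator[L], rs: Iterator[R]): Iterator[(L, R)] = {
    val rightRows = rs.toVector
    ls.flatMap(l => rightRows.iterator.map(r => (l, r)))
  }

  // Hypothetical helper: both key functions must return the same type K,
  // so joining on mismatched key types is rejected by the compiler.
  def typedJoin[L, R, K](left: Dataset[L], right: Dataset[R])(leftKey: L => K)(rightKey: R => K)(
      implicit kEnc: Encoder[K], outEnc: Encoder[(L, R)]): Dataset[(L, R)] =
    left.groupByKey(leftKey).cogroup(right.groupByKey(rightKey)) {
      (_, ls, rs) => innerJoinGroup(ls, rs)
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-join-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds1 = Seq(RowA(1, "a"), RowA(2, "b")).toDS()
    val ds2 = Seq(RowB(1, "x"), RowB(3, "y")).toDS()

    // Both keys are Int, so this compiles.
    val joined: Dataset[(RowA, RowB)] = typedJoin(ds1, ds2)(_.col1)(_.col3)
    joined.show()

    // typedJoin(ds1, ds2)(_.col1)(_.col4) // rejected at compile time: String is not Int

    spark.stop()
  }
}
```

the type check works, but all of the pairing logic lives in innerJoinGroup, and outer joins would each need their own variant of it. the join is also expressed as opaque functions rather than a column expression, which is the price of the compile-time check.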

On Thu, Nov 10, 2016 at 2:18 PM, Michael Armbrust wrote:

> You can groupByKey and then cogroup.
>
> On Thu, Nov 10, 2016 at 10:44 AM, Yang wrote:
>
>> the new DataSet API is supposed to provide type safety and type checks at
>> compile time
>> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations
>>
>> It does this indeed for a lot of places, but I found it still doesn't
>> have a type safe join:
>>
>> val ds1 = hc.sql("select col1, col2 from mytable")
>>
>> val ds2 = hc.sql("select col3 , col4 from mytable2")
>>
>> val ds3 = ds1.joinWith(ds2, ds1.col("col1") === ds2.col("col2"))
>>
>> here spark has no way to make sure (at compile time) that the two columns
>> being joined together, "col1" and "col2", are of matching types. This is in
>> contrast to rdd join, where it would be detected at compile time.
>>
>> am I missing something?
>>
>> thanks
>>
>>
>


Re: type-safe join in the new DataSet API?

2016-11-10 Thread Michael Armbrust
You can groupByKey and then cogroup.

On Thu, Nov 10, 2016 at 10:44 AM, Yang wrote:

> the new DataSet API is supposed to provide type safety and type checks at
> compile time
> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations
>
> It does this indeed for a lot of places, but I found it still doesn't have
> a type safe join:
>
> val ds1 = hc.sql("select col1, col2 from mytable")
>
> val ds2 = hc.sql("select col3 , col4 from mytable2")
>
> val ds3 = ds1.joinWith(ds2, ds1.col("col1") === ds2.col("col2"))
>
> here spark has no way to make sure (at compile time) that the two columns
> being joined together, "col1" and "col2", are of matching types. This is in
> contrast to rdd join, where it would be detected at compile time.
>
> am I missing something?
>
> thanks
>
>