Although this is correct, KeyValueGroupedDataset.cogroup requires one to implement their own join logic over Iterators. It's fun to do, and I appreciate the flexibility it gives, but I would not consider it a good solution for someone who just wants a typed join.
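For concreteness, here is a minimal sketch of what the groupByKey + cogroup approach looks like; the case classes A and B, the key field names, and the local Seq data are hypothetical, invented just for illustration (assumes a SparkSession `spark` is in scope):

```scala
import org.apache.spark.sql.Dataset
import spark.implicits._

// Hypothetical record types for illustration.
case class A(k: Int, v: String)
case class B(k: Int, w: Double)

val ds1: Dataset[A] = Seq(A(1, "x"), A(2, "y")).toDS()
val ds2: Dataset[B] = Seq(B(1, 1.0), B(3, 3.0)).toDS()

// Keys are extracted with typed functions, so a key-type mismatch
// between the two sides fails to compile rather than at runtime.
val joined: Dataset[(A, B)] =
  ds1.groupByKey(_.k).cogroup(ds2.groupByKey(_.k)) { (key, as, bs) =>
    // Hand-written inner-join logic: materialize one side first,
    // since each iterator can only be consumed once.
    val bList = bs.toList
    as.flatMap(a => bList.map(b => (a, b)))
  }
```

This illustrates the point above: you get compile-time key-type checking, but only because you write the join semantics (inner/outer, handling of empty iterators) yourself inside the cogroup function.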
On Thu, Nov 10, 2016 at 2:18 PM, Michael Armbrust <mich...@databricks.com> wrote:
> You can groupByKey and then cogroup.
>
> On Thu, Nov 10, 2016 at 10:44 AM, Yang <teddyyyy...@gmail.com> wrote:
>
>> The new Dataset API is supposed to provide type safety and type checks at
>> compile time: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations
>>
>> It does this indeed in a lot of places, but I found it still doesn't
>> have a type-safe join:
>>
>> val ds1 = hc.sql("select col1, col2 from mytable")
>>
>> val ds2 = hc.sql("select col3, col4 from mytable2")
>>
>> val ds3 = ds1.joinWith(ds2, ds1.col("col1") === ds2.col("col2"))
>>
>> Here Spark has no way to make sure (at compile time) that the two columns
>> being joined, "col1" and "col2", are of matching types. This is in contrast
>> to an RDD join, where a mismatch would be detected at compile time.
>>
>> Am I missing something?
>>
>> thanks