That's neat

On Jun 7, 2016 4:34 PM, "Jacek Laskowski" <ja...@japila.pl> wrote:
> Hi,
>
> What about this?
>
> scala> final case class Person(name: String, age: Int)
> warning: there was one unchecked warning; re-run with -unchecked for details
> defined class Person
>
> scala> val ds = Seq(Person("foo", 42), Person("bar", 24)).toDS
> ds: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]
>
> scala> ds.as("a").joinWith(ds.as("b"), $"a.name" === $"b.name").show(false)
> +--------+--------+
> |_1      |_2      |
> +--------+--------+
> |[foo,42]|[foo,42]|
> |[bar,24]|[bar,24]|
> +--------+--------+
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jun 7, 2016 at 9:30 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > for some operators on Dataset, like joinWith, one needs to use an expression
> > which means referring to columns by name.
> >
> > how can i set the column names for a Dataset before doing a joinWith?
> >
> > currently i am aware of:
> > df.toDF("k", "v").as[(K, V)]
> >
> > but that seems inefficient/anti-pattern? i shouldn't have to go to a
> > DataFrame and back to set the column names?
> >
> > or if this is the only way to set names, and column names really shouldn't
> > be used in Datasets, can i perhaps refer to the columns by their position?
> >
> > thanks, koert
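[Editor's note: the joinWith output above is pairs of whole objects, not flattened rows. As a plain-Scala sketch of that semantics (no Spark required, so the `joinWith` helper here is a hypothetical stand-in, not Spark's API), an inner join by name over two collections produces the same pair structure:]

```scala
final case class Person(name: String, age: Int)

// Plain-Scala sketch of what Dataset.joinWith does on the data above:
// an inner join on name, yielding pairs (_1, _2) of the joined objects
// rather than a flattened row of columns.
def joinWith(a: Seq[Person], b: Seq[Person]): Seq[(Person, Person)] =
  for {
    x <- a
    y <- b
    if x.name == y.name   // join condition, analogous to $"a.name" === $"b.name"
  } yield (x, y)

val people = Seq(Person("foo", 42), Person("bar", 24))
joinWith(people, people).foreach(println)
// (Person(foo,42),Person(foo,42))
// (Person(bar,24),Person(bar,24))
```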