That's neat
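
Worth spelling out why it's neat: joinWith keeps the element types, so
the result below is a Dataset[(Person, Person)] and the matched pairs
can be consumed with plain Scala, no Row access needed. A minimal sketch
continuing the example (the pairs and lefts names are mine; this assumes
the usual spark-shell implicits are in scope):

    // joinWith is type-preserving: each row is a (Person, Person) pair
    val pairs = ds.as("a").joinWith(ds.as("b"), $"a.name" === $"b.name")

    // so the result can be consumed with ordinary pattern matching,
    // e.g. keeping only the left side of each matched pair
    val lefts = pairs.map { case (a, _) => a }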
On Jun 7, 2016 4:34 PM, "Jacek Laskowski" <ja...@japila.pl> wrote:

> Hi,
>
> What about this?
>
> scala> final case class Person(name: String, age: Int)
> warning: there was one unchecked warning; re-run with -unchecked for
> details
> defined class Person
>
> scala> val ds = Seq(Person("foo", 42), Person("bar", 24)).toDS
> ds: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]
>
> scala> ds.as("a").joinWith(ds.as("b"), $"a.name" === $"b.name").show(false)
> +--------+--------+
> |_1      |_2      |
> +--------+--------+
> |[foo,42]|[foo,42]|
> |[bar,24]|[bar,24]|
> +--------+--------+
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, Jun 7, 2016 at 9:30 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > For some operators on Dataset, like joinWith, one needs to use an
> > expression, which means referring to columns by name.
> >
> > How can I set the column names for a Dataset before doing a joinWith?
> >
> > Currently I am aware of:
> > df.toDF("k", "v").as[(K, V)]
> >
> > but that seems inefficient, and like an anti-pattern: I shouldn't have
> > to go to a DataFrame and back just to set the column names.
> >
> > Or, if this is the only way to set names, and column names really
> > shouldn't be used in Datasets, can I perhaps refer to the columns by
> > their position?
> >
> > thanks, koert
>

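On the renaming question below: assuming Spark 2.x, where DataFrame is
just an alias for Dataset[Row], the toDF(...).as[...] round trip is lazy.
It only adds a rename to the logical plan and doesn't materialize or copy
anything, so it is less of an anti-pattern than it looks. A rough sketch
(the ds1/ds2 names here are made up):

    // rename both sides up front, then join on the now-distinct names
    val left  = ds1.toDF("k", "v").as[(String, Int)]
    val right = ds2.toDF("k2", "v2").as[(String, Int)]
    val joined = left.joinWith(right, $"k" === $"k2")

    // or refer to columns positionally: Dataset.columns returns the
    // column names in order, so no name has to be hard-coded
    val byPos = ds1.joinWith(ds2, ds1(ds1.columns(0)) === ds2(ds2.columns(0)))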