oh, the map on the DataFrame actually uses a RowEncoder. i left it out
because it wasn't important.

so this doesn't compile:

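import org.apache.spark.sql.{Dataset, Encoder}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
def f[T]: Dataset[T] => Dataset[T] = dataset => {
  val df = dataset.toDF
  // as[T] needs an implicit Encoder[T], and none is in scope here
  df.map(row => row)(RowEncoder(df.schema)).as[T]
}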


now this does compile. but i don't like it, since the assumption of an
implicit encoder isn't always true, and i don't want to start passing
encoders around when they are already embedded in the datasets:

// the context bound adds an implicit Encoder[T] parameter, which as[T] picks up
def f[T: Encoder]: Dataset[T] => Dataset[T] = dataset => {
  val df = dataset.toDF
  df.map(row => row)(RowEncoder(df.schema)).as[T]
}
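
to make the complaint concrete, here is a rough sketch of what a caller has to do when T has no implicit encoder, using a kryo encoder. it assumes the Spark 2.x API used above (RowEncoder(schema)); Token, KryoExample and the local SparkSession settings are made-up names for illustration only:

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

// hypothetical type: not a case class, so no implicit encoder is available for it
class Token(val value: String) extends Serializable

object KryoExample {
  // same f as above: the context bound is sugar for an implicit Encoder[T] parameter
  def f[T: Encoder]: Dataset[T] => Dataset[T] = dataset => {
    val df = dataset.toDF
    df.map(row => row)(RowEncoder(df.schema)).as[T]
  }

  def main(args: Array[String]): Unit = {
    // assumption: a local session, just for the sketch
    val spark = SparkSession.builder().master("local[1]").appName("kryo-example").getOrCreate()

    // the encoder is built by hand once, to create the dataset
    val tokenEncoder: Encoder[Token] = Encoders.kryo[Token]
    val tokens: Dataset[Token] =
      spark.createDataset(Seq(new Token("a"), new Token("b")))(tokenEncoder)

    // and then has to be handed over again, although the dataset already carries it
    val result: Dataset[Token] = f[Token](tokenEncoder)(tokens)
    result.collect().foreach(t => println(t.value))

    spark.stop()
  }
}
```

the second use of tokenEncoder is the part that feels redundant: every function shaped like f forces the caller to keep the encoder around next to the dataset.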

On Thu, Jan 26, 2017 at 10:50 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Koert,
>
> map will take the value that has an implicit Encoder to any value that
> may or may not have an encoder in scope. That's why I'm asking about
> the map function to see what it does.
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Jan 26, 2017 at 4:18 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > the map operation works on DataFrame so it doesn't need an encoder. It
> > could have been any operation on DataFrame. the issue is at the end going
> > back to Dataset[T] using as[T]. this requires an encoder for T which i
> > know i already have since i started with a Dataset[T].
> >
> > i could add an implicit encoder but that assumes T has an implicit
> > encoder which isn't always true. for example i could be using a kryo
> > encoder. but anyhow i shouldn't have to be guessing this Encoder[T] since
> > it's part of my dataset already
> >
> > On Jan 26, 2017 05:18, "Jacek Laskowski" <ja...@japila.pl> wrote:
> >
> > Hi,
> >
> > Can you show the code from map to reproduce the issue? You can create
> > encoders using Encoders object (I'm using it all over the place for
> > schema generation).
> >
> > Jacek
> >
> > On 25 Jan 2017 10:19 p.m., "Koert Kuipers" <ko...@tresata.com> wrote:
> >>
> >> i often run into problems like this:
> >>
> >> i need to write a Dataset[T] => Dataset[T], and inside i need to switch
> >> to DataFrame for a particular operation.
> >>
> >> but if i do:
> >> dataset.toDF.map(...).as[T] i get error:
> >> Unable to find encoder for type stored in a Dataset.
> >>
> >> i know it has an encoder, because i started with Dataset[T]
> >>
> >> so i would like to do:
> >> dataset.toDF.map(...).as[T](dataset.encoder)
> >>
> >
>
