Re: [2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Sun Rui
import org.apache.spark.sql.catalyst.encoders.RowEncoder
implicit val encoder = RowEncoder(df.schema)
df.mapPartitions(_.take(1))
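
For context, a complete, runnable sketch of this fix (assuming Spark 2.0.x; the session setup and names here are illustrative, not from the original message):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

val spark = SparkSession.builder().master("local[*]").appName("row-encoder").getOrCreate()
import spark.implicits._

val df: DataFrame = Seq((1, "one"), (2, "two")).toDF("id", "name")

// Row only knows its schema at runtime, so Spark cannot derive an implicit
// Encoder[Row] the way it does for case classes; we build one explicitly
// from the DataFrame's own schema. RowEncoder(schema) returns an
// ExpressionEncoder[Row], which satisfies the Encoder[Row] that
// mapPartitions requires.
implicit val encoder = RowEncoder(df.schema)

// With the implicit encoder in scope, this now compiles and runs.
df.mapPartitions(_.take(1)).show()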

> On Aug 3, 2016, at 04:55, Dragisa Krsmanovic wrote:
> 
> I am trying to use mapPartitions on DataFrame.
> 
> Example:
> 
> import spark.implicits._
> val df: DataFrame = Seq((1,"one"), (2, "two")).toDF("id", "name")
> df.mapPartitions(_.take(1))
> 
> I am getting:
> 
> Unable to find encoder for type stored in a Dataset.  Primitive types (Int, 
> String, etc) and Product types (case classes) are supported by importing 
> spark.implicits._  Support for serializing other types will be added in 
> future releases.
> 
> Since DataFrame is Dataset[Row], I was expecting encoder for Row to be there.
> 
> What's wrong with my code?
>  
> 
> -- 
> Dragiša Krsmanović | Platform Engineer | Ticketfly
> 
> dragi...@ticketfly.com 
> @ticketfly | ticketfly.com/blog | facebook.com/ticketfly


Re: [2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Dragisa Krsmanovic
You are converting the DataFrame to a Dataset[Entry] with .as[Entry].

DataFrame is Dataset[Row].

mapPartitions works fine on a plain Dataset, just not on a DataFrame.
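
To make the contrast concrete, here is a minimal sketch (spark-shell against Spark 2.0.0; the Entry case class and sample data are illustrative):

import spark.implicits._

case class Entry(id: Int, name: String)

// Dataset[Entry]: spark.implicits._ derives an Encoder[Entry] from the case
// class, so mapPartitions finds its implicit encoder and compiles.
val ds = Seq(Entry(1, "one"), Entry(2, "two")).toDS()
ds.mapPartitions(_.take(1))

// DataFrame = Dataset[Row]: there is no implicit Encoder[Row], so the same
// call fails to compile with "Unable to find encoder for type stored in a
// Dataset" unless an encoder is supplied explicitly (e.g. RowEncoder(df.schema)).
val df = ds.toDF()
// df.mapPartitions(_.take(1))   // does not compile without an implicit Encoder[Row]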



On Tue, Aug 2, 2016 at 4:50 PM, Ted Yu wrote:

> Using spark-shell of master branch:
>
> scala> case class Entry(id: Integer, name: String)
> defined class Entry
>
> scala> val df  = Seq((1,"one"), (2, "two")).toDF("id", "name").as[Entry]
> 16/08/02 16:47:01 DEBUG package$ExpressionCanonicalizer:
> === Result of Batch CleanExpressions ===
> !assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)._1 AS _1#10   assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)._1
> !+- assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)._1          +- assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)
> !   +- assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)          +- input[0, scala.Tuple2, true]
> !      +- input[0, scala.Tuple2, true]
> ...
>
> scala> df.mapPartitions(_.take(1))


-- 

Dragiša Krsmanović | Platform Engineer | Ticketfly

dragi...@ticketfly.com

@ticketfly | ticketfly.com/blog | facebook.com/ticketfly


Re: [2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Ted Yu
Using spark-shell of master branch:

scala> case class Entry(id: Integer, name: String)
defined class Entry

scala> val df  = Seq((1,"one"), (2, "two")).toDF("id", "name").as[Entry]
16/08/02 16:47:01 DEBUG package$ExpressionCanonicalizer:
=== Result of Batch CleanExpressions ===
!assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)._1 AS _1#10   assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)._1
!+- assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)._1          +- assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)
!   +- assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)          +- input[0, scala.Tuple2, true]
!      +- input[0, scala.Tuple2, true]
...

scala> df.mapPartitions(_.take(1))
