Re: [2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Dragisa Krsmanovic
You are converting the DataFrame to a Dataset[Entry] with .as[Entry].

A DataFrame is a Dataset[Row].

mapPartitions works fine with a typed Dataset. Just not with a DataFrame.
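
For completeness, here is a sketch of a workaround (assuming Spark 2.0 and a
spark-shell style session where a SparkSession named spark exists):
spark.implicits._ provides no implicit Encoder[Row], but you can build one
from the frame's own schema with RowEncoder and pass it explicitly:

import spark.implicits._
import org.apache.spark.sql.catalyst.encoders.RowEncoder

val df = Seq((1, "one"), (2, "two")).toDF("id", "name")

// Pass an explicit Encoder[Row] derived from the DataFrame's own schema,
// since none is available implicitly.
val result = df.mapPartitions(_.take(1))(RowEncoder(df.schema))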



On Tue, Aug 2, 2016 at 4:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Using spark-shell of master branch:
>
> scala> case class Entry(id: Integer, name: String)
> defined class Entry
>
> scala> val df  = Seq((1,"one"), (2, "two")).toDF("id", "name").as[Entry]
> 16/08/02 16:47:01 DEBUG package$ExpressionCanonicalizer:
> === Result of Batch CleanExpressions ===
> !assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)._1 AS _1#10   assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)._1
> !+- assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)._1          +- assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)
> !   +- assertnotnull(input[0, scala.Tuple2, true], top level non-flat input object)          +- input[0, scala.Tuple2, true]
> !      +- input[0, scala.Tuple2, true]
> ...
>
> scala> df.mapPartitions(_.take(1))
>
> On Tue, Aug 2, 2016 at 1:55 PM, Dragisa Krsmanovic <dragi...@ticketfly.com
> > wrote:
>
>> I am trying to use mapPartitions on a DataFrame.
>>
>> Example:
>>
>> import spark.implicits._
>> val df: DataFrame = Seq((1,"one"), (2, "two")).toDF("id", "name")
>> df.mapPartitions(_.take(1))
>>
>> I am getting:
>>
>> Unable to find encoder for type stored in a Dataset.  Primitive types
>> (Int, String, etc) and Product types (case classes) are supported by
>> importing spark.implicits._  Support for serializing other types will be
>> added in future releases.
>>
>> Since a DataFrame is a Dataset[Row], I was expecting an encoder for Row
>> to be available.
>>
>> What's wrong with my code?
>>
>>
>> --
>>
>> Dragiša Krsmanović | Platform Engineer | Ticketfly
>>
>> dragi...@ticketfly.com
>>
>> @ticketfly <https://twitter.com/ticketfly> | ticketfly.com/blog |
>> facebook.com/ticketfly
>>
>
>


-- 

Dragiša Krsmanović | Platform Engineer | Ticketfly

dragi...@ticketfly.com

@ticketfly <https://twitter.com/ticketfly> | ticketfly.com/blog |
facebook.com/ticketfly


[2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Dragisa Krsmanovic
I am trying to use mapPartitions on a DataFrame.

Example:

import spark.implicits._
val df: DataFrame = Seq((1,"one"), (2, "two")).toDF("id", "name")
df.mapPartitions(_.take(1))

I am getting:

Unable to find encoder for type stored in a Dataset.  Primitive types (Int,
String, etc) and Product types (case classes) are supported by importing
spark.implicits._  Support for serializing other types will be added in
future releases.

Since a DataFrame is a Dataset[Row], I was expecting an encoder for Row to
be available.

What's wrong with my code?


-- 

Dragiša Krsmanović | Platform Engineer | Ticketfly

dragi...@ticketfly.com

@ticketfly  | ticketfly.com/blog |
facebook.com/ticketfly


Re: Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc")

2016-07-06 Thread Dragisa Krsmanovic
Yes, I had the save() at the end. I truncated the example to highlight the
difference and forgot to put the save() back.

It would be great to have the same behavior (and the same code path) for
both jdbc() and format("jdbc").

Thank you.
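
For anyone else hitting this, the two forms side by side (a sketch; dbUrl
and the credentials are placeholders, and note the trailing save() that my
earlier snippet dropped):

import java.util.Properties
import org.apache.spark.sql.SaveMode

// Form 1: the jdbc() shortcut. Creates the table and writes the rows.
val props = new Properties()
props.setProperty("user", "me")          // placeholder credentials
props.setProperty("password", "secret")
dataFrame.write.mode(SaveMode.Overwrite).jdbc(dbUrl, "my_table", props)

// Form 2: the generic format("jdbc") path. Until the PR above lands, this
// fails with "... does not allow create table as select".
dataFrame.write
  .mode(SaveMode.Overwrite)
  .format("jdbc")
  .option("url", dbUrl)
  .option("dbtable", "my_table")
  .option("user", "me")
  .option("password", "secret")
  .save()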

On Wed, Jul 6, 2016 at 10:21 AM, Xiao Li <gatorsm...@gmail.com> wrote:

> Hi, Dragisa,
>
> Your second way is incomplete, right? To get the error you showed, you
> need to put save() there.
>
> Yeah, we can implement the trait CreatableRelationProvider for JDBC. Then,
> you will not see that error.
>
> Will submit a PR for that.
>
> Thanks,
>
> Xiao
>
>
> 2016-07-06 10:05 GMT-07:00 Dragisa Krsmanovic <dragi...@ticketfly.com>:
>
>> I was expecting to get the same results with both:
>>
>> dataFrame.write.mode(SaveMode.Overwrite).jdbc(dbUrl, "my_table", props)
>>
>> and
>>
>> dataFrame.write.mode(SaveMode.Overwrite).format("jdbc").options(opts).option("dbtable",
>> "my_table")
>>
>>
>> In the first example, it behaves as expected. It creates a new table and
>> populates it with the rows from DataFrame.
>>
>> In the second case, I get an exception:
>> org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not
>> allow create table as select.
>>
>> Looking at the Spark source, there appear to be completely separate
>> implementations for format("jdbc") and for jdbc(...).
>>
>> I find that confusing. Unfortunately, the documentation is rather sparse,
>> and one discovers this discrepancy only through trial and error.
>>
>> Is there a plan to deprecate one of the forms, or to allow the same
>> functionality for both?
>>
>> I tried both 1.6 and 2.0-preview.
>> --
>>
>> Dragiša Krsmanović | Platform Engineer | Ticketfly
>>
>> dragi...@ticketfly.com
>>
>> @ticketfly <https://twitter.com/ticketfly> | ticketfly.com/blog |
>> facebook.com/ticketfly
>>
>
>


-- 

Dragiša Krsmanović | Platform Engineer | Ticketfly

dragi...@ticketfly.com

@ticketfly <https://twitter.com/ticketfly> | ticketfly.com/blog |
facebook.com/ticketfly


Difference between DataFrame.write.jdbc and DataFrame.write.format("jdbc")

2016-07-06 Thread Dragisa Krsmanovic
I was expecting to get the same results with both:

dataFrame.write.mode(SaveMode.Overwrite).jdbc(dbUrl, "my_table", props)

and

dataFrame.write.mode(SaveMode.Overwrite).format("jdbc").options(opts).option("dbtable",
"my_table")


In the first example, it behaves as expected. It creates a new table and
populates it with the rows from DataFrame.

In the second case, I get an exception:
org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not
allow create table as select.

Looking at the Spark source, there appear to be completely separate
implementations for format("jdbc") and for jdbc(...).

I find that confusing. Unfortunately, the documentation is rather sparse, and
one discovers this discrepancy only through trial and error.

Is there a plan to deprecate one of the forms, or to allow the same
functionality for both?

I tried both 1.6 and 2.0-preview.
-- 

Dragiša Krsmanović | Platform Engineer | Ticketfly

dragi...@ticketfly.com

@ticketfly  | ticketfly.com/blog |
facebook.com/ticketfly