Re: getting encoder implicits to be more accurate

2016-11-14 Thread Michael Armbrust
I would definitely like to open up APIs for people to write their own
encoders.  The challenge thus far has been that Encoders use internal APIs
that have not been stable for translating the data into the tungsten
format.  We also make use of the analyzer to figure out the mapping from
columns to fields (also not a stable API).  This is the only "magic" that is
happening.

If someone wants to propose a stable / fast API here it would be great to
start the discussion.  It's an often-requested feature.
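
To make that concrete, a minimal sketch (all names hypothetical, this is not
an existing API) of the kind of stable, user-implementable encoder interface
being asked for here, expressed in public types only so Spark can own the
internal tungsten translation:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType

// hypothetical: a user supplies the schema and the conversions in terms of
// the public Row type; the internal (tungsten) format never leaks out
trait UserDefinedEncoder[T] extends Serializable {
  def schema: StructType    // shape of the serialized form
  def toRow(value: T): Row  // external object -> public Row
  def fromRow(row: Row): T  // public Row -> external object
}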

On Mon, Nov 14, 2016 at 1:32 PM, Sam Goodwin <sam.goodwi...@gmail.com>
wrote:

> I wouldn't recommend using a Tuple as you end up with column names like
> "_1" and "_2", but it will still work :)
>
> ExpressionEncoder can do the same thing but it doesn't support custom
> types, and as far as I can tell, does not support custom implementations.
> I.e. is it possible for me to write my own Encoder logic and completely
> bypass ExpressionEncoder? The trait definition has no useful methods so it
> doesn't seem straightforward. If the Encoder trait was opened up so
> people could provide their own implementations then I don't see this as an
> issue anymore. It would allow for external Encoder libraries like mine
> while not neglecting Java (non-scala) developers. Is there "magic" happening
> behind the scenes stopping us from doing this?
>
> On Mon, 14 Nov 2016 at 12:31 Koert Kuipers <ko...@tresata.com> wrote:
>
>> just taking it for a quick spin it looks great, with correct support for
>> nested rows and using option for nullability.
>>
>> scala> val format = implicitly[RowFormat[(String, Seq[(String,
>> Option[Int])])]]
>> format: com.github.upio.spark.sql.RowFormat[(String, Seq[(String,
>> Option[Int])])] = com.github.upio.spark.sql.
>> FamilyFormats$$anon$3@2c0961e2
>>
>> scala> format.schema
>> res12: org.apache.spark.sql.types.StructType = 
>> StructType(StructField(_1,StringType,false),
>> StructField(_2,ArrayType(StructType(StructField(_1,StringType,false),
>> StructField(_2,IntegerType,true)),true),false))
>>
>> scala> val x = format.read(Row("a", Seq(Row("a", 5))))
>> x: (String, Seq[(String, Option[Int])]) = (a,List((a,Some(5))))
>>
>> scala> format.write(x)
>> res13: org.apache.spark.sql.Row = [a,List([a,5])]
>>
>>
>>
>> On Mon, Nov 14, 2016 at 3:10 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> agreed on your point that this can be done without macros
>>
>> On Wed, Nov 2, 2016 at 12:15 AM, Sam Goodwin <sam.goodwi...@gmail.com>
>> wrote:
>>
>> You don't need compile-time macros for this, you can do it quite easily
>> using shapeless. I've been playing with a project which borrows ideas from
>> spray-json and spray-json-shapeless to implement Row marshalling for
>> arbitrary case classes. It's checked and generated at compile time,
>> supports arbitrary/nested case classes, and allows custom types. It is also
>> entirely pluggable, meaning you can bypass the default implementations and
>> provide your own, just like any type class.
>>
>> https://github.com/upio/spark-sql-formats
>>
>>
>> *From:* Michael Armbrust <mich...@databricks.com>
>> *Date:* October 26, 2016 at 12:50:23 PM PDT
>> *To:* Koert Kuipers <ko...@tresata.com>
>> *Cc:* Ryan Blue <rb...@netflix.com>, "dev@spark.apache.org" <
>> dev@spark.apache.org>
>> *Subject:* *Re: getting encoder implicits to be more accurate*
>>
>> Sorry, I realize that set is only one example here, but I don't think
>> that making the type of the implicit more narrow to include only ProductN
>> or something eliminates the issue.  Even with that change, we will fail to
>> generate an encoder with the same error if you, for example, have a field
>> of your case class that is an unsupported type.
>>
>>
>>
>> Short of changing this to compile-time macros, I think we are stuck with
>> this class of errors at runtime.  The simplest solution seems to be to
>> expand the set of things we can handle as much as possible and allow users
>> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
>> make this the default though, as behavior would change with each release
>> that adds support for more types.  I would be very supportive of making
>> this fallback a built-in option though.
>>
>>
>>
>> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <ko...@tresata.com>
>> wrote:
>>
>> yup, it doesn't really solve the underlying issue.
>>
>> we fixed it internally by having our own typeclass that produces encoders
>> and that does check the contents of the products, but we did this by simply
>> supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
>> Product, since we don't have a need for case classes

Re: getting encoder implicits to be more accurate

2016-11-14 Thread Sam Goodwin
I wouldn't recommend using a Tuple as you end up with column names like
"_1" and "_2", but it will still work :)

ExpressionEncoder can do the same thing but it doesn't support custom
types, and as far as I can tell, does not support custom implementations.
I.e. is it possible for me to write my own Encoder logic and completely
bypass ExpressionEncoder? The trait definition has no useful methods so it
doesn't seem straightforward. If the Encoder trait was opened up so people
could provide their own implementations then I don't see this as an issue
anymore. It would allow for external Encoder libraries like mine while not
neglecting Java (non-scala) developers. Is there "magic" happening behind
the scenes stopping us from doing this?
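
For reference, the public Encoder trait in Spark 2.x exposes only a schema
and a class tag (simplified below); all of the actual serialization logic
lives in ExpressionEncoder and internal APIs, which is why it cannot usefully
be implemented from the outside:

import scala.reflect.ClassTag
import org.apache.spark.sql.types.StructType

// the shape of the public trait: nothing here controls serialization,
// so there is no meaningful way to provide your own implementation
trait Encoder[T] extends Serializable {
  def schema: StructType   // schema of the encoded form
  def clsTag: ClassTag[T]  // runtime class of T
}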

On Mon, 14 Nov 2016 at 12:31 Koert Kuipers <ko...@tresata.com> wrote:

> just taking it for a quick spin it looks great, with correct support for
> nested rows and using option for nullability.
>
> scala> val format = implicitly[RowFormat[(String, Seq[(String,
> Option[Int])])]]
> format: com.github.upio.spark.sql.RowFormat[(String, Seq[(String,
> Option[Int])])] = com.github.upio.spark.sql.FamilyFormats$$anon$3@2c0961e2
>
> scala> format.schema
> res12: org.apache.spark.sql.types.StructType =
> StructType(StructField(_1,StringType,false),
> StructField(_2,ArrayType(StructType(StructField(_1,StringType,false),
> StructField(_2,IntegerType,true)),true),false))
>
> scala> val x = format.read(Row("a", Seq(Row("a", 5))))
> x: (String, Seq[(String, Option[Int])]) = (a,List((a,Some(5))))
>
> scala> format.write(x)
> res13: org.apache.spark.sql.Row = [a,List([a,5])]
>
>
>
> On Mon, Nov 14, 2016 at 3:10 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
> agreed on your point that this can be done without macros
>
> On Wed, Nov 2, 2016 at 12:15 AM, Sam Goodwin <sam.goodwi...@gmail.com>
> wrote:
>
> You don't need compile-time macros for this, you can do it quite easily
> using shapeless. I've been playing with a project which borrows ideas from
> spray-json and spray-json-shapeless to implement Row marshalling for
> arbitrary case classes. It's checked and generated at compile time,
> supports arbitrary/nested case classes, and allows custom types. It is also
> entirely pluggable, meaning you can bypass the default implementations and
> provide your own, just like any type class.
>
> https://github.com/upio/spark-sql-formats
>
>
> *From:* Michael Armbrust <mich...@databricks.com>
> *Date:* October 26, 2016 at 12:50:23 PM PDT
> *To:* Koert Kuipers <ko...@tresata.com>
> *Cc:* Ryan Blue <rb...@netflix.com>, "dev@spark.apache.org" <
> dev@spark.apache.org>
> *Subject:* *Re: getting encoder implicits to be more accurate*
>
> Sorry, I realize that set is only one example here, but I don't think that
> making the type of the implicit more narrow to include only ProductN or
> something eliminates the issue.  Even with that change, we will fail to
> generate an encoder with the same error if you, for example, have a field
> of your case class that is an unsupported type.
>
>
>
> Short of changing this to compile-time macros, I think we are stuck with
> this class of errors at runtime.  The simplest solution seems to be to
> expand the set of things we can handle as much as possible and allow users
> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
> make this the default though, as behavior would change with each release
> that adds support for more types.  I would be very supportive of making
> this fallback a built-in option though.
>
>
>
> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
> yup, it doesn't really solve the underlying issue.
>
> we fixed it internally by having our own typeclass that produces encoders
> and that does check the contents of the products, but we did this by simply
> supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
> Product, since we don't have a need for case classes
>
> if case classes extended ProductN (which they will, I think, in scala 2.12?)
> then we could drop Product and support Product1 - Product22 and Option
> explicitly while checking the classes they contain. that would be the
> cleanest.
>
>
>
> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue <rb...@netflix.com> wrote:
>
> Isn't the problem that Option is a Product and the class it contains isn't
> checked? Adding support for Set fixes the example, but the problem would
> happen with any class there isn't an encoder for, right?
>
>
>
> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
> Hmm, that is unfortunate.  Maybe the best solution is to add support for
> sets?  I don't think that would be super hard.

Re: getting encoder implicits to be more accurate

2016-11-14 Thread Koert Kuipers
sorry, this message by me was confusing. I was frustrated about how hard it
is to use the Encoder machinery myself directly on Row objects; this is
unrelated to the question of whether a shapeless-based approach like Sam
suggests would be a better way to do encoders in general
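
For context, the round trip being referred to looks roughly like this in
Spark 2.x (a sketch; note that resolveAndBind is required first, and that
both methods speak InternalRow rather than the public Row):

import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

val enc = ExpressionEncoder[(String, Int)]().resolveAndBind()

val internal = enc.toRow(("a", 1))   // yields a catalyst InternalRow
val back     = enc.fromRow(internal) // back to ("a", 1)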

On Mon, Nov 14, 2016 at 3:03 PM, Koert Kuipers <ko...@tresata.com> wrote:

> that makes sense. we have something like that in-house as well, but not as
> nice and not using shapeless (we simply relied on sbt-boilerplate to handle
> all tuples and do not support case classes).
>
> however the frustrating part is that spark sql already has this more or
> less. look for example at ExpressionEncoder.fromRow and
> ExpressionEncoder.toRow. but these methods use InternalRow while the rows
> exposed to me as a user are not that.
>
> at this point I am more tempted to simply open up InternalRow at a few
> places strategically than to maintain another in-house row marshalling
> class. once I have InternalRows, lots of good stuff is available for me to
> use.
>
>
>
>
> On Wed, Nov 2, 2016 at 12:15 AM, Sam Goodwin <sam.goodwi...@gmail.com>
> wrote:
>
>> You don't need compile-time macros for this, you can do it quite easily
>> using shapeless. I've been playing with a project which borrows ideas from
>> spray-json and spray-json-shapeless to implement Row marshalling for
>> arbitrary case classes. It's checked and generated at compile time,
>> supports arbitrary/nested case classes, and allows custom types. It is also
>> entirely pluggable, meaning you can bypass the default implementations and
>> provide your own, just like any type class.
>>
>> https://github.com/upio/spark-sql-formats
>>
>>
>> *From:* Michael Armbrust <mich...@databricks.com>
>> *Date:* October 26, 2016 at 12:50:23 PM PDT
>> *To:* Koert Kuipers <ko...@tresata.com>
>> *Cc:* Ryan Blue <rb...@netflix.com>, "dev@spark.apache.org" <
>> dev@spark.apache.org>
>> *Subject:* *Re: getting encoder implicits to be more accurate*
>>
>> Sorry, I realize that set is only one example here, but I don't think
>> that making the type of the implicit more narrow to include only ProductN
>> or something eliminates the issue.  Even with that change, we will fail to
>> generate an encoder with the same error if you, for example, have a field
>> of your case class that is an unsupported type.
>>
>>
>>
>> Short of changing this to compile-time macros, I think we are stuck with
>> this class of errors at runtime.  The simplest solution seems to be to
>> expand the set of things we can handle as much as possible and allow users
>> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
>> make this the default though, as behavior would change with each release
>> that adds support for more types.  I would be very supportive of making
>> this fallback a built-in option though.
>>
>>
>>
>> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <ko...@tresata.com>
>> wrote:
>>
>> yup, it doesn't really solve the underlying issue.
>>
>> we fixed it internally by having our own typeclass that produces encoders
>> and that does check the contents of the products, but we did this by simply
>> supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
>> Product, since we don't have a need for case classes
>>
>> if case classes extended ProductN (which they will, I think, in scala
>> 2.12?) then we could drop Product and support Product1 - Product22 and
>> Option explicitly while checking the classes they contain. that would be
>> the cleanest.
>>
>>
>>
>> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue <rb...@netflix.com> wrote:
>>
>> Isn't the problem that Option is a Product and the class it contains
>> isn't checked? Adding support for Set fixes the example, but the problem
>> would happen with any class there isn't an encoder for, right?
>>
>>
>>
>> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>> Hmm, that is unfortunate.  Maybe the best solution is to add support for
>> sets?  I don't think that would be super hard.
>>
>>
>>
>> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> I am trying to use encoders as a typeclass where if it fails to find an
>> ExpressionEncoder it falls back to KryoEncoder.
>>
>> the issue seems to be that ExpressionEncoder claims a little more than it
>> can handle here:
>>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
>> Encoders.product[T]
>>
>> this "claims" to handle for example Option[Set[Int]], but it really
>> cannot handle Set so it leads to a runtime exception.
>>
>> would it be useful to make this a little more specific? I guess the
>> challenge is going to be case classes which unfortunately don't extend
>> Product1, Product2, etc.
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Ryan Blue
>>
>> Software Engineer
>>
>> Netflix
>>
>>
>>
>>
>


Re: getting encoder implicits to be more accurate

2016-11-14 Thread Koert Kuipers
just taking it for a quick spin it looks great, with correct support for
nested rows and using option for nullability.

scala> val format = implicitly[RowFormat[(String, Seq[(String,
Option[Int])])]]
format: com.github.upio.spark.sql.RowFormat[(String, Seq[(String,
Option[Int])])] = com.github.upio.spark.sql.FamilyFormats$$anon$3@2c0961e2

scala> format.schema
res12: org.apache.spark.sql.types.StructType =
StructType(StructField(_1,StringType,false),
StructField(_2,ArrayType(StructType(StructField(_1,StringType,false),
StructField(_2,IntegerType,true)),true),false))

scala> val x = format.read(Row("a", Seq(Row("a", 5))))
x: (String, Seq[(String, Option[Int])]) = (a,List((a,Some(5))))

scala> format.write(x)
res13: org.apache.spark.sql.Row = [a,List([a,5])]
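
Since the library is pluggable, a custom format for a type Spark cannot
encode could plausibly look like this (a self-contained sketch with a
simplified stand-in trait; the real RowFormat's members will differ):

import java.util.UUID

trait SimpleFormat[T] {      // hypothetical stand-in for RowFormat
  def read(value: Any): T    // column value -> T
  def write(value: T): Any   // T -> column value
}

// store UUIDs as strings by supplying our own instance, just like any
// other typeclass
implicit val uuidFormat: SimpleFormat[UUID] = new SimpleFormat[UUID] {
  def read(value: Any): UUID = UUID.fromString(value.asInstanceOf[String])
  def write(value: UUID): Any = value.toString
}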



On Mon, Nov 14, 2016 at 3:10 PM, Koert Kuipers <ko...@tresata.com> wrote:

> agreed on your point that this can be done without macros
>
> On Wed, Nov 2, 2016 at 12:15 AM, Sam Goodwin <sam.goodwi...@gmail.com>
> wrote:
>
>> You don't need compile-time macros for this, you can do it quite easily
>> using shapeless. I've been playing with a project which borrows ideas from
>> spray-json and spray-json-shapeless to implement Row marshalling for
>> arbitrary case classes. It's checked and generated at compile time,
>> supports arbitrary/nested case classes, and allows custom types. It is also
>> entirely pluggable, meaning you can bypass the default implementations and
>> provide your own, just like any type class.
>>
>> https://github.com/upio/spark-sql-formats
>>
>>
>> *From:* Michael Armbrust <mich...@databricks.com>
>> *Date:* October 26, 2016 at 12:50:23 PM PDT
>> *To:* Koert Kuipers <ko...@tresata.com>
>> *Cc:* Ryan Blue <rb...@netflix.com>, "dev@spark.apache.org" <
>> dev@spark.apache.org>
>> *Subject:* *Re: getting encoder implicits to be more accurate*
>>
>> Sorry, I realize that set is only one example here, but I don't think
>> that making the type of the implicit more narrow to include only ProductN
>> or something eliminates the issue.  Even with that change, we will fail to
>> generate an encoder with the same error if you, for example, have a field
>> of your case class that is an unsupported type.
>>
>>
>>
>> Short of changing this to compile-time macros, I think we are stuck with
>> this class of errors at runtime.  The simplest solution seems to be to
>> expand the set of things we can handle as much as possible and allow users
>> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
>> make this the default though, as behavior would change with each release
>> that adds support for more types.  I would be very supportive of making
>> this fallback a built-in option though.
>>
>>
>>
>> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <ko...@tresata.com>
>> wrote:
>>
>> yup, it doesn't really solve the underlying issue.
>>
>> we fixed it internally by having our own typeclass that produces encoders
>> and that does check the contents of the products, but we did this by simply
>> supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
>> Product, since we don't have a need for case classes
>>
>> if case classes extended ProductN (which they will, I think, in scala
>> 2.12?) then we could drop Product and support Product1 - Product22 and
>> Option explicitly while checking the classes they contain. that would be
>> the cleanest.
>>
>>
>>
>> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue <rb...@netflix.com> wrote:
>>
>> Isn't the problem that Option is a Product and the class it contains
>> isn't checked? Adding support for Set fixes the example, but the problem
>> would happen with any class there isn't an encoder for, right?
>>
>>
>>
>> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>> Hmm, that is unfortunate.  Maybe the best solution is to add support for
>> sets?  I don't think that would be super hard.
>>
>>
>>
>> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> I am trying to use encoders as a typeclass where if it fails to find an
>> ExpressionEncoder it falls back to KryoEncoder.
>>
>> the issue seems to be that ExpressionEncoder claims a little more than it
>> can handle here:
>>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
>> Encoders.product[T]
>>
>> this "claims" to handle for example Option[Set[Int]], but it really
>> cannot handle Set so it leads to a runtime exception.
>>
>> would it be useful to make this a little more specific? I guess the
>> challenge is going to be case classes which unfortunately don't extend
>> Product1, Product2, etc.
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Ryan Blue
>>
>> Software Engineer
>>
>> Netflix
>>
>>
>>
>>
>


Re: getting encoder implicits to be more accurate

2016-11-14 Thread Koert Kuipers
agreed on your point that this can be done without macros

On Wed, Nov 2, 2016 at 12:15 AM, Sam Goodwin <sam.goodwi...@gmail.com>
wrote:

> You don't need compile-time macros for this, you can do it quite easily
> using shapeless. I've been playing with a project which borrows ideas from
> spray-json and spray-json-shapeless to implement Row marshalling for
> arbitrary case classes. It's checked and generated at compile time,
> supports arbitrary/nested case classes, and allows custom types. It is also
> entirely pluggable, meaning you can bypass the default implementations and
> provide your own, just like any type class.
>
> https://github.com/upio/spark-sql-formats
>
>
> *From:* Michael Armbrust <mich...@databricks.com>
> *Date:* October 26, 2016 at 12:50:23 PM PDT
> *To:* Koert Kuipers <ko...@tresata.com>
> *Cc:* Ryan Blue <rb...@netflix.com>, "dev@spark.apache.org" <
> dev@spark.apache.org>
> *Subject:* *Re: getting encoder implicits to be more accurate*
>
> Sorry, I realize that set is only one example here, but I don't think that
> making the type of the implicit more narrow to include only ProductN or
> something eliminates the issue.  Even with that change, we will fail to
> generate an encoder with the same error if you, for example, have a field
> of your case class that is an unsupported type.
>
>
>
> Short of changing this to compile-time macros, I think we are stuck with
> this class of errors at runtime.  The simplest solution seems to be to
> expand the set of things we can handle as much as possible and allow users
> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
> make this the default though, as behavior would change with each release
> that adds support for more types.  I would be very supportive of making
> this fallback a built-in option though.
>
>
>
> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
> yup, it doesn't really solve the underlying issue.
>
> we fixed it internally by having our own typeclass that produces encoders
> and that does check the contents of the products, but we did this by simply
> supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
> Product, since we don't have a need for case classes
>
> if case classes extended ProductN (which they will, I think, in scala 2.12?)
> then we could drop Product and support Product1 - Product22 and Option
> explicitly while checking the classes they contain. that would be the
> cleanest.
>
>
>
> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue <rb...@netflix.com> wrote:
>
> Isn't the problem that Option is a Product and the class it contains isn't
> checked? Adding support for Set fixes the example, but the problem would
> happen with any class there isn't an encoder for, right?
>
>
>
> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
> Hmm, that is unfortunate.  Maybe the best solution is to add support for
> sets?  I don't think that would be super hard.
>
>
>
> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
> I am trying to use encoders as a typeclass where if it fails to find an
> ExpressionEncoder it falls back to KryoEncoder.
>
> the issue seems to be that ExpressionEncoder claims a little more than it
> can handle here:
>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
> Encoders.product[T]
>
> this "claims" to handle for example Option[Set[Int]], but it really cannot
> handle Set so it leads to a runtime exception.
>
> would it be useful to make this a little more specific? I guess the
> challenge is going to be case classes which unfortunately don't extend
> Product1, Product2, etc.
>
>
>
>
>
>
>
> --
>
> Ryan Blue
>
> Software Engineer
>
> Netflix
>
>
>
>


Re: getting encoder implicits to be more accurate

2016-11-01 Thread Sam Goodwin
You don't need compile-time macros for this, you can do it quite easily
using shapeless. I've been playing with a project which borrows ideas from
spray-json and spray-json-shapeless to implement Row marshalling for
arbitrary case classes. It's checked and generated at compile time,
supports arbitrary/nested case classes, and allows custom types. It is also
entirely pluggable, meaning you can bypass the default implementations and
provide your own, just like any type class.

https://github.com/upio/spark-sql-formats
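
For readers unfamiliar with the technique: the derivation works by induction
over a case class's generic (HList) representation. A minimal, self-contained
sketch of the idea (a simplified stand-in typeclass, not the library's actual
code):

import shapeless._

// derive a schema description for any case class from instances for its
// fields, entirely at compile time
trait SchemaFor[T] { def schema: List[String] }

object SchemaFor {
  def instance[T](s: List[String]): SchemaFor[T] =
    new SchemaFor[T] { val schema = s }

  implicit val stringSchema: SchemaFor[String] = instance(List("StringType"))
  implicit val intSchema: SchemaFor[Int]       = instance(List("IntegerType"))

  // induction over the HList representation
  implicit val hnilSchema: SchemaFor[HNil] = instance(Nil)
  implicit def hconsSchema[H, T <: HList](
      implicit head: SchemaFor[H], tail: SchemaFor[T]): SchemaFor[H :: T] =
    instance(head.schema ++ tail.schema)

  // any case class gets an instance via Generic; a field with no instance
  // is a compile-time error rather than a runtime one
  implicit def genericSchema[A, R](
      implicit gen: Generic.Aux[A, R], repr: Lazy[SchemaFor[R]]): SchemaFor[A] =
    instance(repr.value.schema)
}

case class Person(name: String, age: Int)
// implicitly[SchemaFor[Person]].schema == List("StringType", "IntegerType")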


*From:* Michael Armbrust <mich...@databricks.com>
*Date:* October 26, 2016 at 12:50:23 PM PDT
*To:* Koert Kuipers <ko...@tresata.com>
*Cc:* Ryan Blue <rb...@netflix.com>, "dev@spark.apache.org" <
dev@spark.apache.org>
*Subject:* *Re: getting encoder implicits to be more accurate*

Sorry, I realize that set is only one example here, but I don't think that
making the type of the implicit more narrow to include only ProductN or
something eliminates the issue.  Even with that change, we will fail to
generate an encoder with the same error if you, for example, have a field
of your case class that is an unsupported type.



Short of changing this to compile-time macros, I think we are stuck with
this class of errors at runtime.  The simplest solution seems to be to
expand the set of things we can handle as much as possible and allow users
to turn on a kryo fallback for expression encoders.  I'd be hesitant to
make this the default though, as behavior would change with each release
that adds support for more types.  I would be very supportive of making
this fallback a built-in option though.



On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <ko...@tresata.com> wrote:

yup, it doesn't really solve the underlying issue.

we fixed it internally by having our own typeclass that produces encoders
and that does check the contents of the products, but we did this by simply
supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
Product, since we don't have a need for case classes

if case classes extended ProductN (which they will, I think, in scala 2.12?)
then we could drop Product and support Product1 - Product22 and Option
explicitly while checking the classes they contain. that would be the
cleanest.



On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue <rb...@netflix.com> wrote:

Isn't the problem that Option is a Product and the class it contains isn't
checked? Adding support for Set fixes the example, but the problem would
happen with any class there isn't an encoder for, right?



On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <mich...@databricks.com>
wrote:

Hmm, that is unfortunate.  Maybe the best solution is to add support for
sets?  I don't think that would be super hard.



On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers <ko...@tresata.com> wrote:

I am trying to use encoders as a typeclass where if it fails to find an
ExpressionEncoder it falls back to KryoEncoder.

the issue seems to be that ExpressionEncoder claims a little more than it
can handle here:
  implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
Encoders.product[T]

this "claims" to handle for example Option[Set[Int]], but it really cannot
handle Set so it leads to a runtime exception.

would it be useful to make this a little more specific? I guess the
challenge is going to be case classes which unfortunately don't extend
Product1, Product2, etc.







--

Ryan Blue

Software Engineer

Netflix


Re: getting encoder implicits to be more accurate

2016-10-26 Thread Michael Armbrust
Awesome, this is a great idea.  I opened SPARK-18122.

On Wed, Oct 26, 2016 at 2:11 PM, Koert Kuipers  wrote:

> if kryo could transparently be used for subtrees without narrowing the
> implicit, that would be great
>
> On Wed, Oct 26, 2016 at 5:10 PM, Koert Kuipers  wrote:
>
>> i use kryo for the whole thing currently
>>
>> it would be better to use it for the subtree
>>
>>> On Wed, Oct 26, 2016 at 5:06 PM, Michael Armbrust wrote:
>>
>>> You use kryo encoder for the whole thing?  Or just the subtree that we
>>> don't have specific encoders for?
>>>
>>> Also, I'm saying I like the idea of having a kryo fallback.  I don't see
>>> the point of narrowing the definition of the implicit.
>>>
>>> On Wed, Oct 26, 2016 at 1:07 PM, Koert Kuipers 
>>> wrote:
>>>
 for example (the log shows when it creates a kryo encoder):

 scala> implicitly[EncoderEvidence[Option[Seq[String]]]].encoder
 res5: org.apache.spark.sql.Encoder[Option[Seq[String]]] =
 class[value[0]: array<string>]

 scala> implicitly[EncoderEvidence[Option[Set[String]]]].encoder
 dataframe.EncoderEvidence$: using kryo encoder for
 scala.Option[Set[String]]
 res6: org.apache.spark.sql.Encoder[Option[Set[String]]] =
 class[value[0]: binary]




 On Wed, Oct 26, 2016 at 4:00 PM, Koert Kuipers 
 wrote:

> why would generating implicits for ProductN where you also require the
> elements in the Product to have an expression encoder not work?
>
> we do this. and then we have a generic fallback where it produces a
> kryo encoder.
>
> for us the result is that say an implicit for Seq[(Int, Seq[(String,
> Int)])] will create a new ExpressionEncoder(), while an implicit for
> Seq[(Int, Set[(String, Int)])] produces an Encoders.kryoEncoder()
>
> On Wed, Oct 26, 2016 at 3:50 PM, Michael Armbrust <
> mich...@databricks.com> wrote:
>
>> Sorry, I realize that set is only one example here, but I don't think
>> that making the type of the implicit more narrow to include only ProductN
>> or something eliminates the issue.  Even with that change, we will fail to
>> generate an encoder with the same error if you, for example, have a field
>> of your case class that is an unsupported type.
>>
>> Short of changing this to compile-time macros, I think we are stuck
>> with this class of errors at runtime.  The simplest solution seems to be to
>> expand the set of things we can handle as much as possible and allow users
>> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
>> make this the default though, as behavior would change with each release
>> that adds support for more types.  I would be very supportive of making
>> this fallback a built-in option though.
>>
>> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers 
>> wrote:
>>
>>> yup, it doesn't really solve the underlying issue.
>>>
>>> we fixed it internally by having our own typeclass that produces
>>> encoders and that does check the contents of the products, but we did
>>> this by simply supporting Tuple1 - Tuple22 and Option explicitly, and not
>>> supporting Product, since we don't have a need for case classes
>>>
>>> if case classes extended ProductN (which they will, I think, in scala
>>> 2.12?) then we could drop Product and support Product1 - Product22 and
>>> Option explicitly while checking the classes they contain. that would be
>>> the cleanest.
>>>
>>>
>>> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue 
>>> wrote:
>>>
 Isn't the problem that Option is a Product and the class it
 contains isn't checked? Adding support for Set fixes the example, but the
 problem would happen with any class there isn't an encoder for, right?

 On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
 mich...@databricks.com> wrote:

> Hmm, that is unfortunate.  Maybe the best solution is to add
> support for sets?  I don't think that would be super hard.
>
> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers 
> wrote:
>
>> I am trying to use encoders as a typeclass where if it fails to
>> find an ExpressionEncoder it falls back to KryoEncoder.
>>
>> the issue seems to be that ExpressionEncoder claims a little more
>> than it can handle here:
>>   implicit def newProductEncoder[T <: Product : TypeTag]:
>> Encoder[T] = Encoders.product[T]
>>
>> this "claims" to handle for example Option[Set[Int]], but it
>> really cannot handle Set so it leads to a runtime exception.

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
if kryo could transparently be used for subtrees without narrowing the
implicit, that would be great

On Wed, Oct 26, 2016 at 5:10 PM, Koert Kuipers  wrote:

> i use kryo for the whole thing currently
>
> it would be better to use it for the subtree
>
> On Wed, Oct 26, 2016 at 5:06 PM, Michael Armbrust 
> wrote:
>
>> You use kryo encoder for the whole thing?  Or just the subtree that we
>> don't have specific encoders for?
>>
>> Also, I'm saying I like the idea of having a kryo fallback.  I don't see
>> the point of narrowing the definition of the implicit.
>>
>> On Wed, Oct 26, 2016 at 1:07 PM, Koert Kuipers  wrote:
>>
>>> for example (the log shows when it creates a kryo encoder):
>>>
>>> scala> implicitly[EncoderEvidence[Option[Seq[String]]]].encoder
>>> res5: org.apache.spark.sql.Encoder[Option[Seq[String]]] =
>>> class[value[0]: array<string>]
>>>
>>> scala> implicitly[EncoderEvidence[Option[Set[String]]]].encoder
>>> dataframe.EncoderEvidence$: using kryo encoder for
>>> scala.Option[Set[String]]
>>> res6: org.apache.spark.sql.Encoder[Option[Set[String]]] =
>>> class[value[0]: binary]
>>>
>>>
>>>
>>>
>>> On Wed, Oct 26, 2016 at 4:00 PM, Koert Kuipers 
>>> wrote:
>>>
 why would generating implicits for ProductN where you also require the
 elements in the Product to have an expression encoder not work?

 we do this. and then we have a generic fallback where it produces a
 kryo encoder.

 for us the result is that say an implicit for Seq[(Int, Seq[(String,
 Int)])] will create a new ExpressionEncoder(), while an implicit for
 Seq[(Int, Set[(String, Int)])] produces an Encoders.kryoEncoder()

 On Wed, Oct 26, 2016 at 3:50 PM, Michael Armbrust <
 mich...@databricks.com> wrote:

> Sorry, I realize that set is only one example here, but I don't think
> that making the type of the implicit more narrow to include only ProductN
> or something eliminates the issue.  Even with that change, we will fail to
> generate an encoder with the same error if you, for example, have a field
> of your case class that is an unsupported type.
>
> Short of changing this to compile-time macros, I think we are stuck
> with this class of errors at runtime.  The simplest solution seems to be to
> expand the set of things we can handle as much as possible and allow users
> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
> make this the default though, as behavior would change with each release
> that adds support for more types.  I would be very supportive of making
> this fallback a built-in option though.
>
> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers 
> wrote:
>
>> yup, it doesn't really solve the underlying issue.
>>
>> we fixed it internally by having our own typeclass that produces
>> encoders and that does check the contents of the products, but we did
>> this by simply supporting Tuple1 - Tuple22 and Option explicitly, and not
>> supporting Product, since we don't have a need for case classes
>>
>> if case classes extended ProductN (which they will, I think, in scala
>> 2.12?) then we could drop Product and support Product1 - Product22 and
>> Option explicitly while checking the classes they contain. that would be
>> the cleanest.
>>
>>
>> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue  wrote:
>>
>>> Isn't the problem that Option is a Product and the class it contains
>>> isn't checked? Adding support for Set fixes the example, but the problem
>>> would happen with any class there isn't an encoder for, right?
>>>
>>> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
 Hmm, that is unfortunate.  Maybe the best solution is to add
 support for sets?  I don't think that would be super hard.

 On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers 
 wrote:

> I am trying to use encoders as a typeclass where if it fails to
> find an ExpressionEncoder it falls back to KryoEncoder.
>
> the issue seems to be that ExpressionEncoder claims a little more
> than it can handle here:
>   implicit def newProductEncoder[T <: Product : TypeTag]:
> Encoder[T] = Encoders.product[T]
>
> this "claims" to handle for example Option[Set[Int]], but it
> really cannot handle Set so it leads to a runtime exception.
>
> would it be useful to make this a little more specific? I guess the
> challenge is going to be case classes which unfortunately don't extend
> Product1, Product2, etc.
>


>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
I use kryo for the whole thing currently

it would be better to use it for the subtree
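
The difference, sketched with the public Encoders factory (illustrative
only):

import org.apache.spark.sql.{Encoder, Encoders}

// whole-object kryo: the entire value becomes one opaque binary column
val whole: Encoder[(String, Set[Int])] = Encoders.kryo[(String, Set[Int])]

// subtree fallback, the behavior being wished for: keep the tuple structure
// as real columns and kryo-encode only the unsupported Set[Int] element
val subtree: Encoder[(String, Set[Int])] =
  Encoders.tuple(Encoders.STRING, Encoders.kryo[Set[Int]])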

On Wed, Oct 26, 2016 at 5:06 PM, Michael Armbrust 
wrote:

> You use kryo encoder for the whole thing?  Or just the subtree that we
> don't have specific encoders for?
>
> Also, I'm saying I like the idea of having a kryo fallback.  I don't see
> the point of narrowing the definition of the implicit.
>
> On Wed, Oct 26, 2016 at 1:07 PM, Koert Kuipers  wrote:
>
>> for example (the log shows when it creates a kryo encoder):
>>
>> scala> implicitly[EncoderEvidence[Option[Seq[String]]]].encoder
>> res5: org.apache.spark.sql.Encoder[Option[Seq[String]]] =
>> class[value[0]: array<string>]
>>
>> scala> implicitly[EncoderEvidence[Option[Set[String]]]].encoder
>> dataframe.EncoderEvidence$: using kryo encoder for
>> scala.Option[Set[String]]
>> res6: org.apache.spark.sql.Encoder[Option[Set[String]]] =
>> class[value[0]: binary]
>>
>>
>>
>>
>> On Wed, Oct 26, 2016 at 4:00 PM, Koert Kuipers  wrote:
>>
>>> why would generating implicits for ProductN where you also require the
>>> elements in the Product to have an expression encoder not work?
>>>
>>> we do this. and then we have a generic fallback where it produces a kryo
>>> encoder.
>>>
>>> for us the result is that say an implicit for Seq[(Int, Seq[(String,
>>> Int)])] will create a new ExpressionEncoder(), while an implicit for
>>> Seq[(Int, Set[(String, Int)])] produces an Encoders.kryoEncoder()
>>>
>>> On Wed, Oct 26, 2016 at 3:50 PM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
 Sorry, I realize that set is only one example here, but I don't think
 that making the type of the implicit more narrow to include only ProductN
 or something eliminates the issue.  Even with that change, we will fail to
 generate an encoder with the same error if you, for example, have a field
 of your case class that is an unsupported type.

 Short of changing this to compile-time macros, I think we are stuck
 with this class of errors at runtime.  The simplest solution seems to be to
 expand the set of things we can handle as much as possible and allow users
 to turn on a kryo fallback for expression encoders.  I'd be hesitant to
 make this the default though, as behavior would change with each release
 that adds support for more types.  I would be very supportive of making
 this fallback a built-in option though.

 On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers 
 wrote:

> yup, it doesn't really solve the underlying issue.
>
> we fixed it internally by having our own typeclass that produces
> encoders and that does check the contents of the products, but we did this
> by simply supporting Tuple1 - Tuple22 and Option explicitly, and not
> supporting Product, since we don't have a need for case classes
>
> if case classes extended ProductN (which they will, I think, in scala
> 2.12?) then we could drop Product and support Product1 - Product22 and
> Option explicitly while checking the classes they contain. that would be
> the cleanest.
>
>
> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue  wrote:
>
>> Isn't the problem that Option is a Product and the class it contains
>> isn't checked? Adding support for Set fixes the example, but the problem
>> would happen with any class there isn't an encoder for, right?
>>
>> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>>> Hmm, that is unfortunate.  Maybe the best solution is to add support
>>> for sets?  I don't think that would be super hard.
>>>
>>> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers 
>>> wrote:
>>>
 I am trying to use encoders as a typeclass where if it fails to
 find an ExpressionEncoder it falls back to KryoEncoder.

 the issue seems to be that ExpressionEncoder claims a little more
 than it can handle here:
   implicit def newProductEncoder[T <: Product : TypeTag]:
 Encoder[T] = Encoders.product[T]

 this "claims" to handle for example Option[Set[Int]], but it really
 cannot handle Set so it leads to a runtime exception.

 would it be useful to make this a little more specific? I guess the
 challenge is going to be case classes which unfortunately don't extend
 Product1, Product2, etc.

>>>
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>

>>>
>>
>


Re: getting encoder implicits to be more accurate

2016-10-26 Thread Michael Armbrust
You use kryo encoder for the whole thing?  Or just the subtree that we
don't have specific encoders for?

Also, I'm saying I like the idea of having a kryo fallback.  I don't see
the point of narrowing the definition of the implicit.

On Wed, Oct 26, 2016 at 1:07 PM, Koert Kuipers  wrote:

> for example (the log shows when it creates a kryo encoder):
>
> scala> implicitly[EncoderEvidence[Option[Seq[String]]]].encoder
> res5: org.apache.spark.sql.Encoder[Option[Seq[String]]] = class[value[0]:
> array<string>]
>
> scala> implicitly[EncoderEvidence[Option[Set[String]]]].encoder
> dataframe.EncoderEvidence$: using kryo encoder for
> scala.Option[Set[String]]
> res6: org.apache.spark.sql.Encoder[Option[Set[String]]] = class[value[0]:
> binary]
>
>
>
>
> On Wed, Oct 26, 2016 at 4:00 PM, Koert Kuipers  wrote:
>
>> why would generating implicits for ProductN where you also require the
>> elements in the Product to have an expression encoder not work?
>>
>> we do this. and then we have a generic fallback where it produces a kryo
>> encoder.
>>
>> for us the result is that say an implicit for Seq[(Int, Seq[(String,
>> Int)])] will create a new ExpressionEncoder(), while an implicit for
>> Seq[(Int, Set[(String, Int)])] produces an Encoders.kryoEncoder()
>>
>> On Wed, Oct 26, 2016 at 3:50 PM, Michael Armbrust wrote:
>>
>>> Sorry, I realize that set is only one example here, but I don't think
>>> that making the type of the implicit more narrow to include only ProductN
>>> or something eliminates the issue.  Even with that change, we will fail to
>>> generate an encoder with the same error if you, for example, have a field
>>> of your case class that is an unsupported type.
>>>
>>> Short of changing this to compile-time macros, I think we are stuck with
>>> this class of errors at runtime.  The simplest solution seems to be to
>>> expand the set of things we can handle as much as possible and allow users
>>> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
>>> make this the default though, as behavior would change with each release
>>> that adds support for more types.  I would be very supportive of making
>>> this fallback a built-in option though.
>>>
>>> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers 
>>> wrote:
>>>
 yup, it doesn't really solve the underlying issue.

 we fixed it internally by having our own typeclass that produces
 encoders and that does check the contents of the products, but we did this
 by simply supporting Tuple1 - Tuple22 and Option explicitly, and not
 supporting Product, since we don't have a need for case classes

 if case classes extended ProductN (which they will, I think, in scala
 2.12?) then we could drop Product and support Product1 - Product22 and
 Option explicitly while checking the classes they contain. that would be
 the cleanest.


 On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue  wrote:

> Isn't the problem that Option is a Product and the class it contains
> isn't checked? Adding support for Set fixes the example, but the problem
> would happen with any class there isn't an encoder for, right?
>
> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
> mich...@databricks.com> wrote:
>
>> Hmm, that is unfortunate.  Maybe the best solution is to add support
>> for sets?  I don't think that would be super hard.
>>
>> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers 
>> wrote:
>>
>>> I am trying to use encoders as a typeclass where if it fails to find
>>> an ExpressionEncoder it falls back to KryoEncoder.
>>>
>>> the issue seems to be that ExpressionEncoder claims a little more
>>> than it can handle here:
>>>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T]
>>> = Encoders.product[T]
>>>
>>> this "claims" to handle for example Option[Set[Int]], but it really
>>> cannot handle Set so it leads to a runtime exception.
>>>
>>> would it be useful to make this a little more specific? I guess the
>>> challenge is going to be case classes which unfortunately don't extend
>>> Product1, Product2, etc.
>>>
>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


>>>
>>
>


Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
for example (the log shows when it creates a kryo encoder):

scala> implicitly[EncoderEvidence[Option[Seq[String]]]].encoder
res5: org.apache.spark.sql.Encoder[Option[Seq[String]]] = class[value[0]:
array<string>]

scala> implicitly[EncoderEvidence[Option[Set[String]]]].encoder
dataframe.EncoderEvidence$: using kryo encoder for scala.Option[Set[String]]
res6: org.apache.spark.sql.Encoder[Option[Set[String]]] = class[value[0]:
binary]




On Wed, Oct 26, 2016 at 4:00 PM, Koert Kuipers  wrote:

> why would generating implicits for ProductN where you also require the
> elements in the Product to have an expression encoder not work?
>
> we do this. and then we have a generic fallback where it produces a kryo
> encoder.
>
> for us the result is that say an implicit for Seq[(Int, Seq[(String,
> Int)])] will create a new ExpressionEncoder(), while an implicit for
> Seq[(Int, Set[(String, Int)])] produces an Encoders.kryoEncoder()
>
> On Wed, Oct 26, 2016 at 3:50 PM, Michael Armbrust 
> wrote:
>
>> Sorry, I realize that set is only one example here, but I don't think
>> that making the type of the implicit more narrow to include only ProductN
>> or something eliminates the issue.  Even with that change, we will fail to
>> generate an encoder with the same error if you, for example, have a field
>> of your case class that is an unsupported type.
>>
>> Short of changing this to compile-time macros, I think we are stuck with
>> this class of errors at runtime.  The simplest solution seems to be to
>> expand the set of things we can handle as much as possible and allow users
>> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
>> make this the default though, as behavior would change with each release
>> that adds support for more types.  I would be very supportive of making
>> this fallback a built-in option though.
>>
>> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers 
>> wrote:
>>
>>> yup, it doesn't really solve the underlying issue.
>>>
>>> we fixed it internally by having our own typeclass that produces
>>> encoders and that does check the contents of the products, but we did this
>>> by simply supporting Tuple1 - Tuple22 and Option explicitly, and not
>>> supporting Product, since we don't have a need for case classes
>>>
>>> if case classes extended ProductN (which they will, I think, in scala
>>> 2.12?) then we could drop Product and support Product1 - Product22 and
>>> Option explicitly while checking the classes they contain. that would be
>>> the cleanest.
>>>
>>>
>>> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue  wrote:
>>>
 Isn't the problem that Option is a Product and the class it contains
 isn't checked? Adding support for Set fixes the example, but the problem
 would happen with any class there isn't an encoder for, right?

 On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
 mich...@databricks.com> wrote:

> Hmm, that is unfortunate.  Maybe the best solution is to add support
> for sets?  I don't think that would be super hard.
>
> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers 
> wrote:
>
>> I am trying to use encoders as a typeclass where if it fails to find
>> an ExpressionEncoder it falls back to KryoEncoder.
>>
>> the issue seems to be that ExpressionEncoder claims a little more
>> than it can handle here:
>>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T]
>> = Encoders.product[T]
>>
>> this "claims" to handle for example Option[Set[Int]], but it really
>> cannot handle Set so it leads to a runtime exception.
>>
>> would it be useful to make this a little more specific? I guess the
>> challenge is going to be case classes which unfortunately don't extend
>> Product1, Product2, etc.
>>
>
>


 --
 Ryan Blue
 Software Engineer
 Netflix

>>>
>>>
>>
>


Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
why would generating implicits for ProductN where you also require the
elements in the Product to have an expression encoder not work?

we do this. and then we have a generic fallback where it produces a kryo
encoder.

for us the result is that say an implicit for Seq[(Int, Seq[(String,
Int)])] will create a new ExpressionEncoder(), while an implicit for
Seq[(Int, Set[(String, Int)])] produces an Encoders.kryoEncoder()
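
A minimal, self-contained sketch of this fallback pattern (the
EncoderEvidence name comes from the log output shown earlier in the thread;
the members and instances below are assumptions, not the actual in-house
code). Standard low-priority implicits make the compiler prefer
expression-backed encoders and reach for kryo only when nothing more
specific is in scope:

import scala.reflect.ClassTag
import org.apache.spark.sql.{Encoder, Encoders}

trait EncoderEvidence[T] { def encoder: Encoder[T] }

trait LowPriorityEncoderEvidence {
  // fallback: anything with a ClassTag can at least be kryo-serialized
  implicit def kryoEvidence[T](implicit ct: ClassTag[T]): EncoderEvidence[T] =
    new EncoderEvidence[T] { def encoder = Encoders.kryo[T](ct) }
}

object EncoderEvidence extends LowPriorityEncoderEvidence {
  // specific instances win over the kryo fallback
  implicit val intEvidence: EncoderEvidence[Int] =
    new EncoderEvidence[Int] { def encoder = Encoders.scalaInt }
  implicit val stringEvidence: EncoderEvidence[String] =
    new EncoderEvidence[String] { def encoder = Encoders.STRING }

  // tuples are supported only when every element has evidence itself
  implicit def tuple2Evidence[A, B](implicit a: EncoderEvidence[A],
      b: EncoderEvidence[B]): EncoderEvidence[(A, B)] =
    new EncoderEvidence[(A, B)] {
      def encoder = Encoders.tuple(a.encoder, b.encoder)
    }
}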

On Wed, Oct 26, 2016 at 3:50 PM, Michael Armbrust 
wrote:

> Sorry, I realize that set is only one example here, but I don't think that
> making the type of the implicit more narrow to include only ProductN or
> something eliminates the issue.  Even with that change, we will fail to
> generate an encoder with the same error if you, for example, have a field
> of your case class that is an unsupported type.
>
> Short of changing this to compile-time macros, I think we are stuck with
> this class of errors at runtime.  The simplest solution seems to be to
> expand the set of things we can handle as much as possible and allow users
> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
> make this the default though, as behavior would change with each release
> that adds support for more types.  I would be very supportive of making
> this fallback a built-in option though.
>
> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers  wrote:
>
>> yup, it doesn't really solve the underlying issue.
>>
>> we fixed it internally by having our own typeclass that produces encoders
>> and that does check the contents of the products, but we did this by simply
>> supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
>> Product, since we dont have a need for case classes
>>
>> if case classes extended ProductN (which they will, I think, in scala
>> 2.12?) then we could drop Product and support Product1 - Product22 and
>> Option explicitly while checking the classes they contain. that would be
>> the cleanest.
>>
>>
>> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue  wrote:
>>
>>> Isn't the problem that Option is a Product and the class it contains
>>> isn't checked? Adding support for Set fixes the example, but the problem
>>> would happen with any class there isn't an encoder for, right?
>>>
>>> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
 Hmm, that is unfortunate.  Maybe the best solution is to add support
 for sets?  I don't think that would be super hard.

 On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers 
 wrote:

> I am trying to use encoders as a typeclass where if it fails to find
> an ExpressionEncoder it falls back to KryoEncoder.
>
> the issue seems to be that ExpressionEncoder claims a little more than
> it can handle here:
>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
> Encoders.product[T]
>
> this "claims" to handle for example Option[Set[Int]], but it really
> cannot handle Set so it leads to a runtime exception.
>
> would it be useful to make this a little more specific? I guess the
> challenge is going to be case classes which unfortunately don't extend
> Product1, Product2, etc.
>


>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>


Re: getting encoder implicits to be more accurate

2016-10-26 Thread Michael Armbrust
Sorry, I realize that set is only one example here, but I don't think that
making the type of the implicit more narrow to include only ProductN or
something eliminates the issue.  Even with that change, we will fail to
generate an encoder with the same error if you, for example, have a field
of your case class that is an unsupported type.

Short of changing this to compile-time macros, I think we are stuck with
this class of errors at runtime.  The simplest solution seems to be to
expand the set of things we can handle as much as possible and allow users
to turn on a kryo fallback for expression encoders.  I'd be hesitant to
make this the default though, as behavior would change with each release
that adds support for more types.  I would be very supportive of making
this fallback a built-in option though.
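
To illustrate (an assumed example, not from the thread): narrowing the
implicit to ProductN would not catch this, because the unsupported type is
in a field rather than at the top level:

import org.apache.spark.sql.SparkSession

case class Foo(s: Set[Int])  // a clean Product, but Set[Int] has no encoder

val spark = SparkSession.builder.master("local").getOrCreate()
import spark.implicits._

// compiles fine via newProductEncoder, then fails at runtime when the
// encoder for the Set[Int] field is constructed
Seq(Foo(Set(1, 2))).toDS()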

On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers  wrote:

> yup, it doesn't really solve the underlying issue.
>
> we fixed it internally by having our own typeclass that produces encoders
> and that does check the contents of the products, but we did this by simply
> supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
> Product, since we don't have a need for case classes
>
> if case classes extended ProductN (which they will, I think, in scala 2.12?)
> then we could drop Product and support Product1 - Product22 and Option
> explicitly while checking the classes they contain. that would be the
> cleanest.
>
>
> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue  wrote:
>
>> Isn't the problem that Option is a Product and the class it contains
>> isn't checked? Adding support for Set fixes the example, but the problem
>> would happen with any class there isn't an encoder for, right?
>>
>> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>>> Hmm, that is unfortunate.  Maybe the best solution is to add support for
>>> sets?  I don't think that would be super hard.
>>>
>>> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers 
>>> wrote:
>>>
 I am trying to use encoders as a typeclass where if it fails to find an
 ExpressionEncoder it falls back to KryoEncoder.

 the issue seems to be that ExpressionEncoder claims a little more than
 it can handle here:
   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
 Encoders.product[T]

 this "claims" to handle for example Option[Set[Int]], but it really
 cannot handle Set so it leads to a runtime exception.

 would it be useful to make this a little more specific? I guess the
 challenge is going to be case classes which unfortunately don't extend
 Product1, Product2, etc.

>>>
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>


Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
yup, it doesn't really solve the underlying issue.

we fixed it internally by having our own typeclass that produces encoders
and that does check the contents of the products, but we did this by simply
supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
Product, since we don't have a need for case classes

if case classes extended ProductN (which they will, I think, in scala 2.12?)
then we could drop Product and support Product1 - Product22 and Option
explicitly while checking the classes they contain. that would be the
cleanest.
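
A sketch of the narrowing being described (illustrative only): tuples
already extend ProductN, so their encoders can be composed element-wise,
while a case class extends only Product and would not match such an
implicit:

import org.apache.spark.sql.{Encoder, Encoders}

object NarrowEncoders {
  // an element without an encoder is now a compile error, not a runtime one
  implicit def pairEncoder[A, B](
      implicit a: Encoder[A], b: Encoder[B]): Encoder[(A, B)] =
    Encoders.tuple(a, b)
}

// case class Foo(a: Int, b: String) extends Product but not Product2,
// so pairEncoder-style implicits cannot see through it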


On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue  wrote:

> Isn't the problem that Option is a Product and the class it contains isn't
> checked? Adding support for Set fixes the example, but the problem would
> happen with any class there isn't an encoder for, right?
>
> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust wrote:
>
>> Hmm, that is unfortunate.  Maybe the best solution is to add support for
>> sets?  I don't think that would be super hard.
>>
>> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers  wrote:
>>
>>> I am trying to use encoders as a typeclass where if it fails to find an
>>> ExpressionEncoder it falls back to KryoEncoder.
>>>
>>> the issue seems to be that ExpressionEncoder claims a little more than
>>> it can handle here:
>>>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
>>> Encoders.product[T]
>>>
>>> this "claims" to handle for example Option[Set[Int]], but it really
>>> cannot handle Set so it leads to a runtime exception.
>>>
>>> would it be useful to make this a little more specific? I guess the
>>> challenge is going to be case classes which unfortunately don't extend
>>> Product1, Product2, etc.
>>>
>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: getting encoder implicits to be more accurate

2016-10-26 Thread Ryan Blue
Isn't the problem that Option is a Product and the class it contains isn't
checked? Adding support for Set fixes the example, but the problem would
happen with any class there isn't an encoder for, right?

On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust 
wrote:

> Hmm, that is unfortunate.  Maybe the best solution is to add support for
> sets?  I don't think that would be super hard.
>
> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers  wrote:
>
>> I am trying to use encoders as a typeclass where if it fails to find an
>> ExpressionEncoder it falls back to KryoEncoder.
>>
>> the issue seems to be that ExpressionEncoder claims a little more than it
>> can handle here:
>>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
>> Encoders.product[T]
>>
>> this "claims" to handle for example Option[Set[Int]], but it really
>> cannot handle Set so it leads to a runtime exception.
>>
>> would it be useful to make this a little more specific? I guess the
>> challenge is going to be case classes which unfortunately don't extend
>> Product1, Product2, etc.
>>
>
>


-- 
Ryan Blue
Software Engineer
Netflix


Re: getting encoder implicits to be more accurate

2016-10-26 Thread Michael Armbrust
Hmm, that is unfortunate.  Maybe the best solution is to add support for
sets?  I don't think that would be super hard.

On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers  wrote:

> I am trying to use encoders as a typeclass where if it fails to find an
> ExpressionEncoder it falls back to KryoEncoder.
>
> the issue seems to be that ExpressionEncoder claims a little more than it
> can handle here:
>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
> Encoders.product[T]
>
> this "claims" to handle for example Option[Set[Int]], but it really cannot
> handle Set so it leads to a runtime exception.
>
> would it be useful to make this a little more specific? I guess the
> challenge is going to be case classes which unfortunately don't extend
> Product1, Product2, etc.
>