I would definitely like to open up APIs for people to write their own
encoders.  The challenge thus far has been that encoders use internal APIs,
which have not been stable, for translating the data into the Tungsten
format.  We also make use of the analyzer to figure out the mapping from
columns to fields (also not a stable API).  This is the only "magic" that is
happening.

If someone wants to propose a stable, fast API here, it would be great to
start the discussion.  It's an often-requested feature.
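To make the discussion concrete, here is a minimal sketch of the rough shape such a user-implementable encoder interface might take. This is purely hypothetical (`UserEncoder`, `PairEncoder`, and the `Seq[Any]`/`String` stand-ins are illustrative, not a real or proposed Spark API): a schema description plus two total conversion functions.

```scala
// Hypothetical sketch only -- NOT a real or proposed Spark API.
// Roughly the shape a stable, user-implementable encoder might take:
// a schema plus conversions to and from a row representation.
trait UserEncoder[T] {
  def schema: String              // stand-in for a real StructType
  def toRow(value: T): Seq[Any]   // stand-in for the internal row format
  def fromRow(row: Seq[Any]): T
}

// A trivial implementation for (String, Int) pairs:
object PairEncoder extends UserEncoder[(String, Int)] {
  def schema = "struct<_1:string,_2:int>"
  def toRow(v: (String, Int)) = Seq(v._1, v._2)
  def fromRow(r: Seq[Any]) = (r(0).asInstanceOf[String], r(1).asInstanceOf[Int])
}
```

The point is only that round-tripping (`fromRow(toRow(x)) == x`) and a schema are the essential contract; the hard part, as noted above, is doing this against the unstable internal Tungsten representation rather than a `Seq[Any]`.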

On Mon, Nov 14, 2016 at 1:32 PM, Sam Goodwin <sam.goodwi...@gmail.com>
wrote:

> I wouldn't recommend using a Tuple, as you end up with column names like
> "_1" and "_2", but it will still work :)
>
> ExpressionEncoder can do the same thing, but it doesn't support custom
> types and, as far as I can tell, does not support custom implementations.
> I.e., is it possible for me to write my own Encoder logic and completely
> bypass ExpressionEncoder? The trait definition has no useful methods, so it
> doesn't seem straightforward. If the Encoder trait were opened up so
> people could provide their own implementations, then I wouldn't see this
> as an issue anymore. It would allow for external Encoder libraries like mine
> while not neglecting Java (non-Scala) developers. Is there "magic" happening
> behind the scenes stopping us from doing this?
>
> On Mon, 14 Nov 2016 at 12:31 Koert Kuipers <ko...@tresata.com> wrote:
>
>> Just taking it for a quick spin, it looks great, with correct support for
>> nested rows and using Option for nullability.
>>
>> scala> val format = implicitly[RowFormat[(String, Seq[(String,
>> Option[Int])])]]
>> format: com.github.upio.spark.sql.RowFormat[(String, Seq[(String,
>> Option[Int])])] = com.github.upio.spark.sql.
>> FamilyFormats$$anon$3@2c0961e2
>>
>> scala> format.schema
>> res12: org.apache.spark.sql.types.StructType = 
>> StructType(StructField(_1,StringType,false),
>> StructField(_2,ArrayType(StructType(StructField(_1,StringType,false),
>> StructField(_2,IntegerType,true)),true),false))
>>
>> scala> val x = format.read(Row("a", Seq(Row("a", 5))))
>> x: (String, Seq[(String, Option[Int])]) = (a,List((a,Some(5))))
>>
>> scala> format.write(x)
>> res13: org.apache.spark.sql.Row = [a,List([a,5])]
>>
>>
>>
>> On Mon, Nov 14, 2016 at 3:10 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> agreed on your point that this can be done without macros
>>
>> On Wed, Nov 2, 2016 at 12:15 AM, Sam Goodwin <sam.goodwi...@gmail.com>
>> wrote:
>>
>> You don't need compile-time macros for this; you can do it quite easily
>> using shapeless. I've been playing with a project which borrows ideas from
>> spray-json and spray-json-shapeless to implement Row marshalling for
>> arbitrary case classes. It's checked and generated at compile time,
>> supports arbitrary/nested case classes, and allows custom types. It is also
>> entirely pluggable, meaning you can bypass the default implementations and
>> provide your own, just like any type class.
>>
>> https://github.com/upio/spark-sql-formats
>>
>>
>> *From:* Michael Armbrust <mich...@databricks.com>
>> *Date:* October 26, 2016 at 12:50:23 PM PDT
>> *To:* Koert Kuipers <ko...@tresata.com>
>> *Cc:* Ryan Blue <rb...@netflix.com>, "dev@spark.apache.org" <
>> dev@spark.apache.org>
>> *Subject:* *Re: getting encoder implicits to be more accurate*
>>
>> Sorry, I realize that Set is only one example here, but I don't think
>> that making the type of the implicit more narrow to include only ProductN
>> or something eliminates the issue.  Even with that change, we will fail to
>> generate an encoder with the same error if, for example, a field of your
>> case class has an unsupported type.
>>
>>
>>
>> Short of changing this to compile-time macros, I think we are stuck with
>> this class of errors at runtime.  The simplest solution seems to be to
>> expand the set of things we can handle as much as possible and allow users
>> to turn on a Kryo fallback for expression encoders.  I'd be hesitant to
>> make this the default, though, as behavior would change with each release
>> that adds support for more types.  I would be very supportive of making
>> this fallback a built-in option, though.
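The opt-in fallback being discussed is essentially the standard Scala low-priority-implicit pattern. A plain-Scala sketch (hypothetical names, no Spark): specific encoders live in the companion object and win; a generic Kryo-like fallback is inherited from a lower-priority trait and applies only when nothing more specific resolves.

```scala
// Plain-Scala sketch (hypothetical, no Spark) of opt-in fallback via
// implicit priority: specific encoders win; a generic fallback applies
// only when nothing more specific is in scope.
sealed trait MyEncoder[T] { def name: String }

trait LowPriorityEncoders {
  // Generic catch-all, analogous to a Kryo-based encoder.
  implicit def fallback[T]: MyEncoder[T] =
    new MyEncoder[T] { val name = "kryo" }
}

object MyEncoder extends LowPriorityEncoders {
  // Specific encoder, analogous to an expression encoder.
  implicit val forInt: MyEncoder[Int] =
    new MyEncoder[Int] { val name = "expression" }
}
```

Because implicits defined in the companion object outrank those inherited from a parent trait, `implicitly[MyEncoder[Int]]` picks the specific instance, while an unsupported type like `Set[Int]` silently gets the fallback, which is exactly why making it the default would change behavior as more specific encoders are added.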
>>
>>
>>
>> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <ko...@tresata.com>
>> wrote:
>>
>> Yup, it doesn't really solve the underlying issue.
>>
>> We fixed it internally by having our own typeclass that produces encoders
>> and that does check the contents of the products, but we did this by simply
>> supporting Tuple1 - Tuple22 and Option explicitly, and not supporting
>> Product, since we don't have a need for case classes.
>>
>> If case classes extended ProductN (which I think they will in Scala
>> 2.12?) then we could drop Product and support Product1 - Product22 and
>> Option explicitly while checking the classes they contain. That would be
>> the cleanest.
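The explicit-instances approach described here can be sketched in a few lines of plain Scala (hypothetical names, no Spark): instances exist only for supported leaf types, `Option`, and tuples, so resolution for an unsupported element type like `Set` fails at compile time instead of at runtime.

```scala
// Sketch of the typeclass approach described above (hypothetical, no
// Spark): encoders exist only for explicitly supported types, so
// unsupported contents fail at compile time rather than at runtime.
trait SafeEncoder[T]

object SafeEncoder {
  implicit val intEncoder: SafeEncoder[Int] = new SafeEncoder[Int] {}
  implicit val stringEncoder: SafeEncoder[String] = new SafeEncoder[String] {}

  // Option and tuples are encodable only if their contents are:
  implicit def optionEncoder[A: SafeEncoder]: SafeEncoder[Option[A]] =
    new SafeEncoder[Option[A]] {}
  implicit def tuple2Encoder[A: SafeEncoder, B: SafeEncoder]: SafeEncoder[(A, B)] =
    new SafeEncoder[(A, B)] {}
  // ... analogous instances up through Tuple22 in the real version.
}

// implicitly[SafeEncoder[(String, Option[Int])]]  // compiles
// implicitly[SafeEncoder[Option[Set[Int]]]]       // does not compile: no Set instance
```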
>>
>>
>>
>> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue <rb...@netflix.com> wrote:
>>
>> Isn't the problem that Option is a Product and the class it contains
>> isn't checked? Adding support for Set fixes the example, but the problem
>> would happen with any class there isn't an encoder for, right?
>>
>>
>>
>> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>> Hmm, that is unfortunate.  Maybe the best solution is to add support for
>> sets?  I don't think that would be super hard.
>>
>>
>>
>> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> I am trying to use encoders as a typeclass where, if it fails to find an
>> ExpressionEncoder, it falls back to KryoEncoder.
>>
>> The issue seems to be that ExpressionEncoder claims a little more than it
>> can handle here:
>>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
>> Encoders.product[T]
>>
>> This "claims" to handle, for example, Option[Set[Int]], but it really
>> cannot handle Set, so it leads to a runtime exception.
>>
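A small plain-Scala analogue of that mismatch (hypothetical names, no Spark): an implicit bounded only by `T <: Product` resolves for `Option[Set[Int]]` because `Option` is a `Product`, yet the contents are never checked, so the failure surfaces only at use time.

```scala
// Plain-Scala analogue (hypothetical, no Spark) of an implicit that
// "claims" more than it can handle: it resolves for any Product, but
// only some element classes actually work, so misuse fails at runtime.
trait Enc[T] {
  def check(elementClass: Class[_]): Unit
}

object Enc {
  implicit def productEnc[T <: Product]: Enc[T] = new Enc[T] {
    def check(elementClass: Class[_]): Unit =
      if (classOf[Set[_]].isAssignableFrom(elementClass))
        sys.error("Set is not supported")  // analogous to the runtime exception
  }
}

// Resolution succeeds, because Option is a Product...
//   val enc = implicitly[Enc[Option[Set[Int]]]]
// ...but actually exercising it with Set contents blows up:
//   enc.check(classOf[Set[_]])  // throws RuntimeException
```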
>> Would it be useful to make this a little more specific? I guess the
>> challenge is going to be case classes, which unfortunately don't extend
>> Product1, Product2, etc.
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Ryan Blue
>>
>> Software Engineer
>>
>> Netflix
>>
>>
>>
>>
>>
>>
