[jira] [Commented] (SPARK-22351) Support user-created custom Encoders for Datasets

2017-12-12 Thread Adamos Loizou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287735#comment-16287735
 ] 

Adamos Loizou commented on SPARK-22351:
---

Hello, I've run into this problem once more, this time with ADT/sealed 
hierarchy examples.
For reference, others are already facing this issue ([stack overflow 
link|https://stackoverflow.com/questions/41030073/encode-an-adt-sealed-trait-hierarchy-into-spark-dataset-column]).
Here is an example:

{code:java}
sealed trait Fruit
case object Apple extends Fruit
case object Orange extends Fruit

case class Bag(quantity: Int, fruit: Fruit)

Seq(Bag(1, Apple), Bag(3, Orange)).toDS // <- fails: no encoder found for Fruit
{code}
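
In the meantime, the usual workaround (a sketch under my own naming, not 
something this ticket proposes; {{BagRow}}, {{toRow}} and {{fromRow}} are 
hypothetical helpers) is a surrogate row type whose fields Spark can already 
encode, converting at the boundary:

{code:java}
sealed trait Fruit
case object Apple extends Fruit
case object Orange extends Fruit

object Fruit {
  // Hypothetical helper (not a Spark API): recover the ADT from the stored string.
  def fromString(s: String): Fruit = s match {
    case "Apple"  => Apple
    case "Orange" => Orange
  }
}

// Surrogate row containing only types Spark already knows how to encode.
case class BagRow(quantity: Int, fruit: String)
case class Bag(quantity: Int, fruit: Fruit)

// Convert at the boundary: Bag <-> BagRow.
def toRow(b: Bag): BagRow = BagRow(b.quantity, b.fruit.toString)
def fromRow(r: BagRow): Bag = Bag(r.quantity, Fruit.fromString(r.fruit))

// Seq(Bag(1, Apple), Bag(3, Orange)).map(toRow).toDS // fruit stored as a String column
{code}

This works, but it is exactly the model pollution described below: the point 
of a custom encoder would be to make the conversion invisible.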

Ideally, I'd like to be able to create my own encoder and tell it, for 
example, to map the type to a String column using the case objects' toString 
method.

How feasible would it be to expose an API for creating custom encoders?
Not having one significantly limits how generalised and typesafe our models 
can be.

Thank you.

> Support user-created custom Encoders for Datasets
> -
>
> Key: SPARK-22351
> URL: https://issues.apache.org/jira/browse/SPARK-22351
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Adamos Loizou
>Priority: Minor
>
> It would be very helpful if we could easily support creating custom encoders 
> for classes in Spark SQL.
> This is to allow a user to properly define a business model using types of 
> their choice. They can then map them to Spark SQL types without being forced 
> to pollute their model with the built-in mappable types (e.g. 
> {{java.sql.Timestamp}}).
> Specifically in our case, we tend to use either the Java 8 time API or the 
> joda time API for dates instead of {{java.sql.Timestamp}} whose API is quite 
> limited compared to the others.
> Ideally we would like to be able to have a dataset of such a class:
> {code:java}
> case class Person(name: String, dateOfBirth: org.joda.time.LocalDate)
> implicit def localDateTimeEncoder: Encoder[LocalDate] = ??? // we define 
> something that maps to Spark SQL TimestampType
> ...
> // read csv and map it to model
> val people: Dataset[Person] = spark.read.csv("/my/path/file.csv").as[Person]
> {code}
> While this was possible in Spark 1.6, it's no longer the case in Spark 2.x.
> It's also not straightforward to support this using an 
> {{ExpressionEncoder}} (any tips would be much appreciated).
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22351) Support user-created custom Encoders for Datasets

2017-10-30 Thread Adamos Loizou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224715#comment-16224715
 ] 

Adamos Loizou commented on SPARK-22351:
---

Hi [~hyukjin.kwon], in Spark 1.6 I managed to add support for custom types by 
defining subclasses of {{org.apache.spark.sql.types.UserDefinedType}}.

e.g.

{code:java}
class JodaLocalDateType extends UserDefinedType[org.joda.time.LocalDate] {
  override def sqlType: DataType = TimestampType
  override def serialize(p: org.joda.time.LocalDate) = ???
  override def deserialize(datum: Any): org.joda.time.LocalDate = ??? 
  ...
}
{code}


This abstract class has been made private in Spark 2.x.
Unfortunately, there doesn't seem to be an easy alternative.
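The only non-invasive alternative I know of on 2.x is, again, a surrogate case 
class (a sketch; {{PersonRow}}, {{toRow}} and {{fromRow}} are my own names, not 
Spark API) that stores {{java.sql.Timestamp}} and converts to joda at the edges:

{code:java}
import java.sql.Timestamp
import org.joda.time.LocalDate

// Persisted model: only types with built-in encoders.
case class PersonRow(name: String, dateOfBirth: Timestamp)
// Business model: the type we actually want to work with.
case class Person(name: String, dateOfBirth: LocalDate)

def toRow(p: Person): PersonRow =
  PersonRow(p.name, new Timestamp(p.dateOfBirth.toDate.getTime))

def fromRow(r: PersonRow): Person =
  Person(r.name, LocalDate.fromDateFields(r.dateOfBirth))
{code}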




[jira] [Commented] (SPARK-22351) Support user-created custom Encoders for Datasets

2017-10-28 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223470#comment-16223470
 ] 

Hyukjin Kwon commented on SPARK-22351:
--

{quote}
While this was possible in Spark 1.6 it's not longer the case in Spark 2.x.
{quote}

Would you mind sharing the code? I can't seem to reproduce this in 1.6.






[jira] [Commented] (SPARK-22351) Support user-created custom Encoders for Datasets

2017-10-25 Thread Adamos Loizou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218352#comment-16218352
 ] 

Adamos Loizou commented on SPARK-22351:
---

I have also looked at the {{Encoders}} API. The available methods do not allow 
creating an encoder for a custom type, such as 
{{Encoder\[org.joda.time.LocalDate\]}}, that maps to a Spark SQL type of my 
choosing ({{TimestampType}}). The only option {{Encoders}} offers is the kryo 
one, which will not map it to {{TimestampType}}.
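
To be concrete, the only generic factory methods {{Encoders}} exposes for an 
arbitrary class are the serializer-backed ones, and both encode the object into 
a single binary column rather than a {{TimestampType}} one:

{code:java}
import org.apache.spark.sql.{Encoder, Encoders}

// Both produce one BinaryType column; the schema carries no TimestampType,
// so SQL date functions and filters can't see the value.
val kryoEnc: Encoder[org.joda.time.LocalDate] =
  Encoders.kryo[org.joda.time.LocalDate]
val javaEnc: Encoder[org.joda.time.LocalDate] =
  Encoders.javaSerialization[org.joda.time.LocalDate]
{code}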




[jira] [Commented] (SPARK-22351) Support user-created custom Encoders for Datasets

2017-10-25 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218326#comment-16218326
 ] 

Sean Owen commented on SPARK-22351:
---

You are looking for {{org.apache.spark.sql.Encoder}}, right? It's a trait.




[jira] [Commented] (SPARK-22351) Support user-created custom Encoders for Datasets

2017-10-25 Thread Adamos Loizou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218323#comment-16218323
 ] 

Adamos Loizou commented on SPARK-22351:
---

Hi [~srowen], yes. The problem is creating the encoder in the first place, 
i.e. how can you create an {{Encoder}} in Spark 2.x that serializes a joda 
{{LocalDate}} to a {{Timestamp}}?
I had a look at {{ExpressionEncoder}}, which is the _only_ implementation 
allowed, and it was non-trivial.
If you believe it's relatively easy, please share.

Thanks.




[jira] [Commented] (SPARK-22351) Support user-created custom Encoders for Datasets

2017-10-25 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218318#comment-16218318
 ] 

Sean Owen commented on SPARK-22351:
---

In Dataset, you can see many APIs accepting an Encoder. Have you tried those?
