[ https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418691#comment-16418691 ]
Furcy Pin edited comment on SPARK-20384 at 3/29/18 10:10 AM:
-------------------------------------------------------------

+1 on this issue.

I think the generic use case is that the Spark Encoder magic that automatically transforms a DataFrame into a case class currently only works for base types.

This is great if you have a
{code:java}
case class Table(id: Long, attribute: String)
{code}
with simple attributes. BUT, if you want to wrap your attribute into another simple class like this
{code:java}
case class Attribute(value: String) {
  // some specific methods...
}

case class Table(id: Long, attribute: Attribute)
{code}
then this won't work automatically, unless the "attribute" column in your DataFrame is a struct itself.

The problem is that there currently doesn't seem to be any simple way to achieve this, which really limits the usefulness of the whole Encoder magic. And if a nice, simple way does exist, please document it, as I did not find it.

EDIT: after giving it some thought, I tried this:
{code:java}
implicit class Attribute(value: String)

case class Table(id: Long, attribute: Attribute)
{code}
but it does not work either. If something like this were possible, it would be a nice way to do it.
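One workaround is to decode to the flat tuple type and wrap each row by hand. A minimal sketch (assuming a local SparkSession; the object name is just for illustration):
{code:java}
import org.apache.spark.sql.SparkSession

case class Attribute(value: String)
case class Table(id: Long, attribute: Attribute)

object NestedEncoderWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // A flat DataFrame: "attribute" is a plain StringType column, not a struct,
    // so df.as[Table] would fail to resolve the nested Attribute field.
    val df = Seq((1L, "a"), (2L, "b")).toDF("id", "attribute")

    // Decode to the flat tuple type first, then wrap each row manually.
    val tables = df.as[(Long, String)].map { case (id, v) => Table(id, Attribute(v)) }
    tables.show()
  }
}
{code}
The downside is that every read site has to repeat this wrapping, which is exactly the boilerplate the Encoder magic is supposed to remove.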
> supporting value classes over primitives in DataSets
> ----------------------------------------------------
>
>                 Key: SPARK-20384
>                 URL: https://issues.apache.org/jira/browse/SPARK-20384
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Daniel Davis
>            Priority: Minor
>
> As a Spark user who uses value classes in Scala for modelling domain objects,
> I would also like to make use of them in Datasets.
> For example, I would like to use the {{User}} case class, which uses a
> value class for its {{id}}, as the type for a Dataset:
> - the underlying primitive should be mapped to the value-class column
> - functions on the column (for example comparison) should only work if
>   defined on the value class, and use that implementation
> - show() should pick up the toString method of the value class
> {code}
> case class Id(value: Long) extends AnyVal {
>   override def toString: String = value.toHexString
> }
> case class User(id: Id, name: String)
>
> val ds = spark.sparkContext
>   .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
>   .withColumnRenamed("_1", "id")
>   .withColumnRenamed("_2", "name")
>
> // mapping should work
> val usrs = ds.as[User]
>
> // show should use toString
> usrs.show()
>
> // comparison with a Long should throw an exception, as it is not defined on Id
> usrs.col("id") > 0L
> {code}
> For example, {{.show()}} should use the toString of the {{Id}} value class:
> {noformat}
> +---+-------+
> | id|   name|
> +---+-------+
> |  0| name-0|
> |  1| name-1|
> |  2| name-2|
> |  3| name-3|
> |  4| name-4|
> |  5| name-5|
> |  6| name-6|
> |  7| name-7|
> |  8| name-8|
> |  9| name-9|
> |  A|name-10|
> |  B|name-11|
> |  C|name-12|
> +---+-------+
> {noformat}
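> For contrast, a rough sketch of the closest manual equivalent available today, using the built-in {{hex}} function (10 renders as "A", matching the output above):
> {code}
> import org.apache.spark.sql.functions.{col, hex}
>
> // Format the id by hand instead of relying on the value class's toString.
> ds.withColumn("id", hex(col("id"))).show()
> {code}
> This reproduces the desired output, but the formatting logic then lives at every call site instead of on {{Id}} itself.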