[ https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580498#comment-16580498 ]
Minh Thai commented on SPARK-20384:
-----------------------------------

_(from my comment in SPARK-17368)_ I think the main problem is that, even today, there is no way to create an encoder specifically for value classes. However, I think we can make a [universal trait|https://docs.scala-lang.org/overviews/core/value-classes.html] called {{OpaqueValue}}^1^ to be used as an upper type bound in the encoder. This means:
- Any user-defined value class has to mix in {{OpaqueValue}}
- An encoder can then be created to target those value classes.

{code:java}
trait OpaqueValue extends Any

implicit def newValueClassEncoder[T <: Product with OpaqueValue : TypeTag]: Encoder[T] = ???

case class Id(value: Int) extends AnyVal with OpaqueValue
{code}

This doesn't clash with the existing encoder for case classes, since the type constraint is more specific:

{code:java}
implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] = Encoders.product[T]
{code}

_I'm experimenting with this on my fork and will open a PR if it works well._

_(1) the name is inspired by the [Opaque Types|https://docs.scala-lang.org/sips/opaque-types.html] feature of Scala 3_

> supporting value classes over primitives in DataSets
> ----------------------------------------------------
>
>                 Key: SPARK-20384
>                 URL: https://issues.apache.org/jira/browse/SPARK-20384
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Daniel Davis
>            Priority: Minor
>
> As a Spark user who uses value classes in Scala for modelling domain objects, I would also like to make use of them in Datasets.
> For example, I would like to use the {{User}} case class, which uses a value class for its {{id}}, as the type of a Dataset:
> - the underlying primitive should be mapped to the value-class column
> - functions on the column (for example, comparisons) should only work if defined on the value class, and should use that implementation
> - {{show()}} should pick up the {{toString}} method of the value class
>
> {code}
> case class Id(value: Long) extends AnyVal {
>   override def toString: String = value.toHexString
> }
>
> case class User(id: Id, name: String)
>
> val ds = spark.sparkContext
>   .parallelize(0L to 12L).map(i => (i, s"name-$i")).toDS()
>   .withColumnRenamed("_1", "id")
>   .withColumnRenamed("_2", "name")
>
> // mapping should work
> val usrs = ds.as[User]
>
> // show should use toString
> usrs.show()
>
> // comparison with Long should throw an exception, as it is not defined on Id
> usrs.col("id") > 0L
> {code}
>
> For example, {{show()}} should use the {{toString}} of the {{Id}} value class:
>
> {noformat}
> +---+-------+
> | id|   name|
> +---+-------+
> |  0| name-0|
> |  1| name-1|
> |  2| name-2|
> |  3| name-3|
> |  4| name-4|
> |  5| name-5|
> |  6| name-6|
> |  7| name-7|
> |  8| name-8|
> |  9| name-9|
> |  a|name-10|
> |  b|name-11|
> |  c|name-12|
> +---+-------+
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
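As a minimal sketch of the specificity claim in the comment above: the {{Encoder}}/{{Encoders}} names below are stand-ins, not Spark's real API, and the {{TypeTag}} bound from the proposal is elided for brevity. It only demonstrates that Scala's implicit resolution prefers the more tightly bounded encoder, so the proposed {{newValueClassEncoder}} would not clash with the existing product encoder.

```scala
// Hypothetical marker trait from the proposal; a universal trait, so value
// classes are allowed to mix it in.
trait OpaqueValue extends Any

// Stand-in for Spark's Encoder, just to observe which implicit is selected.
trait Encoder[T] { def name: String }

object Encoders {
  // Mirrors the shape of Spark's existing encoder for any Product
  // (case classes and tuples).
  implicit def newProductEncoder[T <: Product]: Encoder[T] =
    new Encoder[T] { val name = "product" }

  // Proposed encoder, constrained to value classes mixing in OpaqueValue.
  // The bound `Product with OpaqueValue` is strictly more specific than
  // `Product`, so overload resolution picks it for such types.
  implicit def newValueClassEncoder[T <: Product with OpaqueValue]: Encoder[T] =
    new Encoder[T] { val name = "valueClass" }
}

// A value class (case classes always extend Product, which is itself a
// universal trait, so this compiles for a value class too).
case class Id(value: Int) extends AnyVal with OpaqueValue
case class User(id: Id, name: String)

object Demo {
  import Encoders._
  // Id matches both implicits, but the more specific one wins; User only
  // matches the product encoder.
  val idEncoder: Encoder[Id] = implicitly[Encoder[Id]]
  val userEncoder: Encoder[User] = implicitly[Encoder[User]]

  def main(args: Array[String]): Unit = {
    println(idEncoder.name)   // valueClass
    println(userEncoder.name) // product
  }
}
```

Note this only models the implicit-resolution side of the proposal; actually serializing the underlying primitive would still require unwrapping the value class inside the encoder itself.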