[ 
https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974404#comment-15974404
 ] 

Daniel Davis commented on SPARK-20384:
--------------------------------------

SPARK-19741 and SPARK-17368 cover specific errors that occur when a Dataset 
contains value classes. This ticket asks for full support for value classes 
that wrap primitive-typed columns: mapping primitives to value classes, and 
using the methods defined on those value classes for operations such as 
{{ds.show()}} or {{ds.col("id") > Id(5)}}.

Currently, running the example from the description raises an AnalysisException:
{noformat}
org.apache.spark.sql.AnalysisException: Can't extract value from id#463L;
  at 
org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$resolveExpression$1.applyOrElse(Analyzer.scala:704)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$resolveExpression$1.applyOrElse(Analyzer.scala:699)
  ...
{noformat}
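
To illustrate the semantics being asked for, here is a minimal plain-Scala 
sketch with no Spark involved. It only models the intended behaviour on an 
ordinary collection; it is not a workaround for the exception above. The 
{{>}} method on {{Id}}, the {{toUsers}} helper and the {{ValueClassSketch}} 
object are hypothetical names introduced for this illustration.
{code}
// Plain Scala, no Spark: models the behaviour requested for an Id column.
final case class Id(value: Long) extends AnyVal {
  // The rendering that show() would ideally pick up.
  override def toString: String = value.toHexString
  // Comparison defined on the value class itself (hypothetical helper).
  def >(other: Id): Boolean = value > other.value
}

final case class User(id: Id, name: String)

object ValueClassSketch {
  // Wrap raw (Long, String) rows, as .as[User] would ideally do.
  def toUsers(rows: Seq[(Long, String)]): Seq[User] =
    rows.map { case (id, name) => User(Id(id), name) }

  def main(args: Array[String]): Unit = {
    val users = toUsers((0L to 12L).map(i => (i, s"name-$i")))
    // Filter through Id's own `>`, analogous to ds.col("id") > Id(5).
    val filtered = users.filter(_.id > Id(5))
    // String interpolation calls Id.toString, so ids are rendered in hex.
    filtered.foreach(u => println(s"${u.id} ${u.name}"))
  }
}
{code}
The request is essentially for Dataset operations to respect the same 
wrapping, comparison and {{toString}} behaviour as this collection-based 
version.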

> supporting value classes over primitives in DataSets
> ----------------------------------------------------
>
>                 Key: SPARK-20384
>                 URL: https://issues.apache.org/jira/browse/SPARK-20384
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Daniel Davis
>            Priority: Minor
>
> As a Spark user who models domain objects with Scala value classes, 
> I would also like to use them in Datasets. 
> For example, I would like to use the {{User}} case class, which uses a 
> value class for its {{id}}, as the type of a Dataset:
> - the underlying primitive should be mapped to the value-class column
> - functions on the column (for example, comparison) should only work if 
> defined on the value class, and should use those implementations
> - show() should pick up the toString method of the value class
> {code}
> case class Id(value: Long) extends AnyVal {
>   // `override` is required: toString is already defined on Any
>   override def toString: String = value.toHexString
> }
> case class User(id: Id, name: String)
> spark.sparkContext
>   .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
>   .withColumnRenamed("_1", "id")
>   .withColumnRenamed("_2", "name")
>   .as[User].show()
> {code}
> expected output:
> {noformat}
> +---+-------+
> | id|   name|
> +---+-------+
> |  0| name-0|
> |  1| name-1|
> |  2| name-2|
> |  3| name-3|
> |  4| name-4|
> |  5| name-5|
> |  6| name-6|
> |  7| name-7|
> |  8| name-8|
> |  9| name-9|
> |  a|name-10|
> |  b|name-11|
> |  c|name-12|
> +---+-------+
> {noformat}


