[ https://issues.apache.org/jira/browse/SPARK-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903245#comment-15903245 ]

Lee Dongjin commented on SPARK-17248:
-------------------------------------

[~pdxleif] // Although this question may be stale, let me answer it.

There are two ways of implementing enum types in Scala. For more information, 
see: http://alvinalexander.com/scala/how-to-use-scala-enums-enumeration-examples
The 'enumeration objects' mentioned in the tuning guide seem to refer to the 
second way, that is, sealed traits with case objects.
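
For reference, here is a minimal sketch of both approaches (the names below 
are illustrative, not taken from the tuning guide):

{code}
// Way 1: scala.Enumeration
object Weekday extends Enumeration {
  type Weekday = Value
  val Monday, Tuesday, Wednesday = Value
}

// Way 2: a sealed trait with case objects
sealed trait Suit
case object Hearts extends Suit
case object Spades extends Suit
case object Clubs extends Suit
case object Diamonds extends Suit
{code}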

However, the advice in the tuning guide, "Consider using numeric IDs or 
enumeration objects instead of strings for keys.", does not seem to apply to 
Dataset, as in your case. To the best of my knowledge, Dataset supports only 
case classes with primitive-typed fields, not fields of some other class or 
object type - in this case, the enum objects.

If you want a workaround, please check out this example: 
https://github.com/dongjinleekr/spark-dataset/blob/master/src/main/scala/com/github/dongjinleekr/spark/dataset/Titanic.scala
It shows a Dataset built from one of the public datasets. Pay attention to how 
I matched the Passenger case class to its corresponding schema type for the 
age, pClass, sex, and embarked fields.
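
In short, the idea is to keep a primitive id in the case class and convert to 
and from the enum at the boundaries. Here is a rough, self-contained sketch of 
that pattern - the Passenger fields are simplified, not the exact ones in the 
linked file, and it assumes Spark 2.x with a SparkSession named 'spark':

{code}
object Sex extends Enumeration {
  type Sex = Value
  val Male, Female = Value
}

// The Dataset only ever sees the Int id, which the built-in encoders support.
case class Passenger(id: Int, age: Int, sex: Int)

import spark.implicits._

val ds = Seq(
  Passenger(1, 22, Sex.Male.id),
  Passenger(2, 38, Sex.Female.id)).toDS()

// Convert the id back to the enum value outside the Dataset.
val sexes = ds.collect().map(p => Sex(p.sex))
{code}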

[~srowen] // It seems like a lot of users are running into similar problems. 
How about changing this issue into one about providing more examples and 
explanations in the official documentation? If needed, I would like to take 
the issue.

> Add native Scala enum support to Dataset Encoders
> -------------------------------------------------
>
>                 Key: SPARK-17248
>                 URL: https://issues.apache.org/jira/browse/SPARK-17248
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Silvio Fiorito
>
> Enable support for Scala enums in Encoders. Ideally, users should be able to 
> use enums as part of case classes automatically.
> Currently, this code...
> {code}
> object MyEnum extends Enumeration {
>   type MyEnum = Value
>   val EnumVal1, EnumVal2 = Value
> }
> case class MyClass(col: MyEnum.MyEnum)
> val data = Seq(MyClass(MyEnum.EnumVal1), MyClass(MyEnum.EnumVal2)).toDS()
> {code}
> ...results in this stacktrace:
> {code}
> java.lang.UnsupportedOperationException: No Encoder found for MyEnum.MyEnum
> - field (class: "scala.Enumeration.Value", name: "col")
> - root class: "line550c9f34c5144aa1a1e76bcac863244717.$read.$iwC.$iwC.$iwC.$iwC.MyClass"
>       at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:598)
>       at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:592)
>       at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:583)
>       at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at scala.collection.immutable.List.foreach(List.scala:318)
>       at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>       at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
>       at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:583)
>       at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:425)
>       at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:61)
>       at org.apache.spark.sql.Encoders$.product(Encoders.scala:274)
>       at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:47)
> {code}


