[jira] [Commented] (SPARK-17248) Add native Scala enum support to Dataset Encoders
[ https://issues.apache.org/jira/browse/SPARK-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519214#comment-16519214 ] Angel Cervera Claudio commented on SPARK-17248:
---
[~dongjin]'s suggestion is really good. More examples clarifying the use of enums and the alternatives would be great. Also, a link to these examples from the tuning page.

> Add native Scala enum support to Dataset Encoders
> -
>
> Key: SPARK-17248
> URL: https://issues.apache.org/jira/browse/SPARK-17248
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Silvio Fiorito
> Priority: Major
>
> Enable support for Scala enums in Encoders. Ideally, users should be able to
> use enums as part of case classes automatically.
> Currently, this code...
> {code}
> object MyEnum extends Enumeration {
>   type MyEnum = Value
>   val EnumVal1, EnumVal2 = Value
> }
> case class MyClass(col: MyEnum.MyEnum)
> val data = Seq(MyClass(MyEnum.EnumVal1), MyClass(MyEnum.EnumVal2)).toDS()
> {code}
> ...results in this stacktrace:
> {code}
> java.lang.UnsupportedOperationException: No Encoder found for MyEnum.MyEnum
> - field (class: "scala.Enumeration.Value", name: "col")
> - root class: "line550c9f34c5144aa1a1e76bcac863244717.$read.$iwC.$iwC.$iwC.$iwC.MyClass"
>   at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:598)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:592)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:583)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>   at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:583)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:425)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:61)
>   at org.apache.spark.sql.Encoders$.product(Encoders.scala:274)
>   at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:47)
> {code}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17248) Add native Scala enum support to Dataset Encoders
[ https://issues.apache.org/jira/browse/SPARK-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903245#comment-15903245 ] Lee Dongjin commented on SPARK-17248:
-
[~pdxleif] // Although it may be an expired question, let me answer. There are two ways of implementing enum types in Scala. For more information, see: http://alvinalexander.com/scala/how-to-use-scala-enums-enumeration-examples
The "enumeration object" in the tuning guide seems to point to the second way, that is, using sealed traits and case objects. However, the sentence "Consider using numeric IDs or enumeration objects instead of strings for keys." in the tuning guide does not seem to apply to Dataset, as in your case. To my humble knowledge, Dataset supports only case classes with primitive types, not case classes containing other case classes or objects - in this case, the enum objects.
If you want a workaround, please check out this example: https://github.com/dongjinleekr/spark-dataset/blob/master/src/main/scala/com/github/dongjinleekr/spark/dataset/Titanic.scala
It shows a Dataset built from one of the public datasets. Please pay attention to how I matched the Passenger case class and its corresponding SchemaType - age, pClass, sex and embarked.
[~srowen] // It seems like lots of users are experiencing similar problems. How about changing this issue into providing more examples and explanations in the official documentation? If needed, I would like to take the issue.
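The workaround pattern described in the comment above can be sketched as follows (hypothetical names, not taken verbatim from the linked Titanic example): keep the field that Spark encodes a primitive, and convert to and from the Enumeration value at the boundaries.

{code}
// Sketch: store the enum's numeric id in the encoded case class.
object Sex extends Enumeration {
  val Male, Female = Value
}

// Spark can encode Int natively; the enum value is recovered on demand.
case class Passenger(name: String, sexId: Int) {
  def sex: Sex.Value = Sex(sexId)  // Enumeration.apply looks a Value up by its id
}

// Usage (inside a SparkSession, with spark.implicits._ imported):
// val ds = Seq(Passenger("A", Sex.Male.id), Passenger("B", Sex.Female.id)).toDS()
{code}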
[jira] [Commented] (SPARK-17248) Add native Scala enum support to Dataset Encoders
[ https://issues.apache.org/jira/browse/SPARK-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836374#comment-15836374 ] Leif Warner commented on SPARK-17248:
-
The Spark documentation itself says "Consider using numeric IDs or enumeration objects instead of strings for keys." - https://spark.apache.org/docs/latest/tuning.html
What is meant by "enumeration objects" there?
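The tuning guide does not define "enumeration objects"; one common reading (an assumption on this editor's part, not from the docs) is singleton objects used as keys, e.g. a sealed trait with case objects, the usual Scala alternative to Enumeration:

{code}
// Hypothetical example: each case object is a singleton, so using it as a
// key compares by reference instead of hashing string contents per lookup.
sealed trait Color
case object Red extends Color
case object Green extends Color
case object Blue extends Color
{code}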
[jira] [Commented] (SPARK-17248) Add native Scala enum support to Dataset Encoders
[ https://issues.apache.org/jira/browse/SPARK-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832163#comment-15832163 ] Leif Warner commented on SPARK-17248:
-
Much more efficient encodings than strings are possible with enums. That's a big reason I was looking to encode things as enums (or the Scala sum-type equivalent: a sealed abstract class with a few case objects that extend it). Consider:
{code}
object Bool extends Enumeration {
  val True, False = Value
}
{code}
It would be a lot more efficient to encode a 1 or 0 instead of the strings "True" or "False" every time.
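The size argument above can be made concrete with the enum's built-in numeric id (a sketch of the standard Enumeration API round trip, not of Spark encoder behavior):

{code}
object Bool extends Enumeration {
  val True, False = Value
}

val asInt: Int = Bool.True.id              // small, fixed-width
val asString: String = Bool.True.toString  // "True" -- repeated per row
val back: Bool.Value = Bool(asInt)         // Enumeration.apply recovers the Value
{code}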
[jira] [Commented] (SPARK-17248) Add native Scala enum support to Dataset Encoders
[ https://issues.apache.org/jira/browse/SPARK-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438595#comment-15438595 ] Sean Owen commented on SPARK-17248:
---
As a workaround, just encode these as Strings. I imagine that's how it would be implemented anyway.
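The String workaround suggested above might look like this (a sketch; MyClassStr is a hypothetical variant of the MyClass from the issue description):

{code}
object MyEnum extends Enumeration {
  type MyEnum = Value
  val EnumVal1, EnumVal2 = Value
}

// Store the value's name; Spark has a native String encoder.
case class MyClassStr(col: String) {
  def colEnum: MyEnum.MyEnum = MyEnum.withName(col)  // inverse of Value.toString
}

// val data = Seq(MyClassStr(MyEnum.EnumVal1.toString)).toDS()
{code}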