[jira] [Issue Comment Deleted] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Minh Thai updated SPARK-17368:
------------------------------
    Comment: was deleted

(was: [~jodersky] I know that this is an old ticket, but I still want to give some comments on making an encoder for value classes. Even today, there is no way to have a type constraint that targets value classes. However, I think we can make a [universal trait|https://docs.scala-lang.org/overviews/core/value-classes.html] called {{OpaqueValue}}^1^ to be used as an upper type bound in the encoder. This means:
- Any user-defined value class has to mix in {{OpaqueValue}}.
- An encoder can be created to target those value classes.

{code:java}
trait OpaqueValue extends Any

implicit def newValueClassEncoder[T <: Product with OpaqueValue : TypeTag] = ???

case class Id(value: Int) extends AnyVal with OpaqueValue
{code}

Tested on my machine using Spark 2.1.0 and Scala 2.11.12, this doesn't clash with the existing encoder for case classes:

{code:java}
implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] = Encoders.product[T]
{code}

If this is possible to implement, I think it can also solve SPARK-20384.

_(1) the name is inspired by the [Opaque Type|https://docs.scala-lang.org/sips/opaque-types.html] feature of Scala 3_)

> Scala value classes create encoder problems and break at runtime
> ----------------------------------------------------------------
>
>                 Key: SPARK-17368
>                 URL: https://issues.apache.org/jira/browse/SPARK-17368
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2, 2.0.0
>         Environment: JDK 8 on MacOS
>                      Scala 2.11.8
>                      Spark 2.0.0
>            Reporter: Aris Vlasakakis
>            Assignee: Jakob Odersky
>            Priority: Major
>             Fix For: 2.1.0
>
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code will compile, but
> will break at runtime with the error below. The value class is of course
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding:
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>    +- assertnotnull(input[0, int, true], top level non-flat input object)
>       +- input[0, int, true]
>         at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
>         at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
>         at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
>
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>
>   def main(args: Array[String]): Unit = {
>     val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
>     val spark = SparkSession.builder.getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("warn")
>     val ds: Dataset[FeatureId] = spark.createDataset(seq)
>     println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
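The "Couldn't find v on int" failure reported above comes from how Scala compiles value classes: in most positions an {{AnyVal}} wrapper is erased to its underlying primitive, so the deserializer Spark generates for the {{Product}} shape of {{FeatureId}} ends up asking a bare {{int}} for a field named {{v}}. The following plain-Scala sketch (no Spark needed; object and method names are illustrative, not from the ticket) makes the erasure visible via reflection:

```scala
// Plain-Scala sketch of value-class erasure (illustrative names).
object ErasureSketch {
  case class FeatureId(v: Int) extends AnyVal

  // In bytecode this method receives an unboxed Int, not a FeatureId --
  // the same raw primitive that Spark's generated code sees at runtime.
  def takesId(id: FeatureId): Int = id.v

  def main(args: Array[String]): Unit = {
    ErasureSketch.getClass.getDeclaredMethods
      .filter(_.getName == "takesId")
      .foreach(m => println(m.getParameterTypes.toList))
    // Prints the erased parameter list (the primitive, not the wrapper class).
  }
}
```

Because the wrapper class only exists at compile time, any encoder derivation that reflects over the runtime value has no {{v}} field to find, which matches the stack trace in the description.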
[jira] [Commented] (SPARK-20384) supporting value classes over primitives in DataSets
[ https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580498#comment-16580498 ]

Minh Thai commented on SPARK-20384:
-----------------------------------

_(from my comment in SPARK-17368)_

I think the main problem is that, even today, there is no way to create an encoder specifically for value classes. However, I think we can make a [universal trait|https://docs.scala-lang.org/overviews/core/value-classes.html] called {{OpaqueValue}}^1^ to be used as an upper type bound in the encoder. This means:
- Any user-defined value class has to mix in {{OpaqueValue}}.
- An encoder can be created to target those value classes.

{code:java}
trait OpaqueValue extends Any

implicit def newValueClassEncoder[T <: Product with OpaqueValue : TypeTag] = ???

case class Id(value: Int) extends AnyVal with OpaqueValue
{code}

This doesn't clash with the existing encoder for case classes, since the type constraint is more specific:

{code:java}
implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] = Encoders.product[T]
{code}

_I'm experimenting with this on my fork and will make a PR if it works well._

_(1) the name is inspired by the [Opaque Type|https://docs.scala-lang.org/sips/opaque-types.html] feature of Scala 3_

> supporting value classes over primitives in DataSets
> ----------------------------------------------------
>
>                 Key: SPARK-20384
>                 URL: https://issues.apache.org/jira/browse/SPARK-20384
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Daniel Davis
>            Priority: Minor
>
> As a Spark user who uses value classes in Scala for modelling domain objects,
> I would also like to make use of them for Datasets.
> For example, I would like to use the {{User}} case class, which uses a
> value class for its {{id}}, as the type for a Dataset:
> - the underlying primitive should be mapped to the value-class column
> - functions on the column (for example comparison) should only work if
>   defined on the value class, and use that implementation
> - {{show()}} should pick up the {{toString}} method of the value class
> {code}
> case class Id(value: Long) extends AnyVal {
>   override def toString: String = value.toHexString
> }
>
> case class User(id: Id, name: String)
>
> val ds = spark.sparkContext
>   .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
>   .withColumnRenamed("_1", "id")
>   .withColumnRenamed("_2", "name")
>
> // mapping should work
> val usrs = ds.as[User]
>
> // show should use toString
> usrs.show()
>
> // comparison with long should throw an exception, as it is not defined on Id
> usrs.col("id") > 0L
> {code}
> For example, {{show()}} should use the {{toString}} of the {{Id}} value class:
> {noformat}
> +---+-------+
> | id|   name|
> +---+-------+
> |  0| name-0|
> |  1| name-1|
> |  2| name-2|
> |  3| name-3|
> |  4| name-4|
> |  5| name-5|
> |  6| name-6|
> |  7| name-7|
> |  8| name-8|
> |  9| name-9|
> |  A|name-10|
> |  B|name-11|
> |  C|name-12|
> +---+-------+
> {noformat}
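The "doesn't clash" claim in the comment above can be imitated in plain Scala without Spark. One standard way to guarantee two overlapping implicits never conflict is the low-priority-trait pattern; the sketch below is illustrative ({{Enc}}, {{encoderOf}}, and the object names are made up for the demo, not Spark API):

```scala
// Illustrative sketch: a value-class implicit coexisting with a general
// Product implicit, ordered by the low-priority-trait pattern.
trait OpaqueValue extends Any

// Stand-in for Spark's Encoder, just to observe which implicit is chosen.
trait Enc[T] { def name: String }

trait LowPriorityEncs {
  // Analogue of newProductEncoder: applies to any case class.
  implicit def productEnc[T <: Product]: Enc[T] =
    new Enc[T] { val name = "product" }
}

object Encs extends LowPriorityEncs {
  // Analogue of the proposed newValueClassEncoder: only for value classes
  // mixing in OpaqueValue. Being defined in the subclass, it takes priority.
  implicit def valueClassEnc[T <: Product with OpaqueValue]: Enc[T] =
    new Enc[T] { val name = "valueClass" }
}

object PriorityDemo {
  import Encs._

  case class Plain(x: Int)
  case class Id(value: Int) extends AnyVal with OpaqueValue

  def encoderOf[T](implicit e: Enc[T]): String = e.name

  def main(args: Array[String]): Unit = {
    println(encoderOf[Plain]) // resolved by productEnc (valueClassEnc's bound fails)
    println(encoderOf[Id])    // resolved by valueClassEnc (higher priority)
  }
}
```

For {{Plain}}, only {{productEnc}} is eligible; for {{Id}}, both are, and the one defined in the subclass wins, so adding the value-class encoder never makes existing case-class code ambiguous.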
[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime
[ https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577449#comment-16577449 ]

Minh Thai commented on SPARK-17368:
-----------------------------------

[~jodersky] I know that this is an old ticket, but I still want to give some comments on making an encoder for value classes. Even today, there is no way to have a type constraint that targets value classes. However, I think we can make a [universal trait|https://docs.scala-lang.org/overviews/core/value-classes.html] called {{OpaqueValue}}^1^ to be used as an upper type bound in the encoder. This means:
- Any user-defined value class has to mix in {{OpaqueValue}}.
- An encoder can be created to target those value classes.

{code}
trait OpaqueValue extends Any

implicit def newValueClassEncoder[T <: Product with OpaqueValue : TypeTag] = ???

case class Id(value: Int) extends AnyVal with OpaqueValue
{code}

Tested on my machine using Spark 2.1.0 and Scala 2.11.12, this doesn't clash with the existing encoder for case classes:

{code}
implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] = Encoders.product[T]
{code}

_(1) the name is inspired by the [Opaque Type|https://docs.scala-lang.org/sips/opaque-types.html] feature of Scala 3_
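Until value-class encoders exist, a common workaround (a sketch under my own assumptions, not something proposed in these tickets) is to encode the underlying primitive and wrap/unwrap the value class only at the boundary:

```scala
// Workaround sketch: keep the Dataset typed on the primitive, re-wrap on the
// driver. Object and variable names are illustrative.
import org.apache.spark.sql.{Dataset, SparkSession}

object WorkaroundSketch {
  case class FeatureId(v: Int) extends AnyVal

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    import spark.implicits._

    val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))

    // Encode the primitive Int instead of the value class, avoiding the
    // "Couldn't find v on int" failure from the broken Product encoder.
    val ds: Dataset[Int] = spark.createDataset(seq.map(_.v))

    // Re-wrap only where the typed view is needed.
    val back: Seq[FeatureId] = ds.collect().toSeq.map(FeatureId(_))
    println(back.mkString(", "))
  }
}
```

This trades type safety inside the Dataset for a working encoder; the wrapper type is preserved everywhere outside Spark.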