Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226142827

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

Take the class `User` above as an example. After compilation, the field `id` of type `Id` becomes an `Int`, so when constructing `User` we need `id` to be an `Int`.

As for why we need `NewInstance` when `Id` is itself the schema: `Id` may remain an `Id` if it is treated as another type (following the [allocation rule](https://docs.scala-lang.org/overviews/core/value-classes.html#allocation-details)). For example, in the method [encodeDecodeTest](https://github.com/apache/spark/blob/a40806d2bd84e9a0308165f0d6c97e9cf00aa4a3/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala#L373), if we pass an instance of `Id` as input, it is not converted to `Int`. In the other case, when the required type is explicitly `Id`, both the input and the result returned from deserialization become `Int`.
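
The two cases can be observed outside Spark with plain reflection. The following is only a minimal sketch, not code from this PR: the definitions of `Id` and `User` are assumed (the thread says only that `User` has a field `id` of type `Id` that compiles to `Int`), and `ValueClassAllocationDemo`, `userIdFieldType` and `runtimeClassOf` are hypothetical names made up for illustration.

```scala
// Assumed definitions: the thread only states that `User.id` has type `Id`
// and that `Id` wraps an `Int`.
final case class Id(value: Int) extends AnyVal
final case class User(id: Id, name: String)

// Hypothetical helper object, not part of Spark.
object ValueClassAllocationDemo {

  // Nested case: inside `User`, the `id` field is stored as the underlying
  // type, so reflection reports the primitive `int` rather than `Id`.
  def userIdFieldType: Class[_] =
    classOf[User].getDeclaredField("id").getType

  // Top-level case: when an `Id` is passed where a generic `T` is expected,
  // it is "treated as another type" and a real `Id` instance is allocated.
  def runtimeClassOf[T](x: T): Class[_] =
    x.asInstanceOf[AnyRef].getClass

  def main(args: Array[String]): Unit = {
    println(s"User.id field type: $userIdFieldType")
    println(s"Id passed as a generic T: ${runtimeClassOf(Id(1))}")
  }
}
```

This mirrors the distinction drawn in the comment: a nested `Id` needs only the deserializer for its underlying type, whereas an `Id` that reaches generic code such as `encodeDecodeTest` exists as a boxed instance and therefore has to be constructed.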