Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228716352

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection {
             "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
         }

+        // as a field, value class is represented by its underlying type
+        val trueFieldType =
+          if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
+
         val fieldValue = Invoke(
-          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType),
-          returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
-        val clsName = getClassNameFromType(fieldType)
+          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType),
+          returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive)
+        val clsName = getClassNameFromType(trueFieldType)
--- End diff --

I tried moving the special logic into the value-class case, but I have a concern I don't know how to resolve yet. I need to change `dataTypeFor` to return `ObjectType` for a top-level value class and `dataTypeFor(underlyingType)` otherwise (see my [comment](https://github.com/apache/spark/pull/22309#discussion_r226142827)). I'm considering something like this:

```scala
private def dataTypeFor(tpe: `Type`, isTopLevelValueClass: Boolean = true)
```

but this isn't right, because:
- the default value `true` doesn't make sense for other types
- if the default is `false`, or if there is no default value, many call sites of this method need to be changed
- it also feels clunky, because `dataTypeFor` now has to be aware of the context of its parameter

Do you have any suggestions on this?
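For context on the `trueFieldType` change in the diff: this is a standalone sketch (hypothetical class names, not Spark code) of the Scala behavior it relies on. When a value class appears as a field of another class, the compiler erases it to its underlying type, so runtime reflection on the enclosing class sees the underlying type rather than the wrapper.

```scala
// A value class: wraps a Double with no runtime allocation in most positions.
case class Meters(value: Double) extends AnyVal

// A regular class with a value-class field.
case class Measurement(distance: Meters)

object ValueClassDemo extends App {
  // Java reflection sees the erased field type, not Meters.
  val fieldType = classOf[Measurement].getDeclaredField("distance").getType
  println(fieldType) // the underlying primitive, e.g. "double"
}
```

This is why a serializer that reads the field via `Invoke` has to use the underlying type (`trueFieldType` in the diff) rather than the declared value-class type.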