[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19664

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149857257

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala

@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
   }
+
+  test("SPARK-22442: Generate correct field names for special characters") {
+    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+    val deserializer = deserializerFor[SpecialCharAsFieldData]
+    assert(serializer.dataType(0).name == "field.1")
+    assert(serializer.dataType(1).name == "field 2")
+
+    val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+      case UpCast(u: UnresolvedAttribute, _, _) => u.name

End diff

Ok. Reasonable.
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149633100

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala

@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
   }
+
+  test("SPARK-22442: Generate correct field names for special characters") {
+    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+    val deserializer = deserializerFor[SpecialCharAsFieldData]
+    assert(serializer.dataType(0).name == "field.1")
+    assert(serializer.dataType(1).name == "field 2")
+
+    val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+      case UpCast(u: UnresolvedAttribute, _, _) => u.name

End diff

Actually we should collect `u.nameParts` here. `u.name` adds backticks if the `nameParts` contain a dot, which makes the test result a little confusing.
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149565144

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala

@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
     throw new UnsupportedOperationException("Only code-generated evaluation is supported.")

+  private lazy val encodedFunctionName = TermName(functionName).encodedName.toString

End diff

Since we use `Invoke` to access fields in an object, this can be an issue. I didn't find `StaticInvoke` used similarly, so it should be fine.
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149564523

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala

@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
   }
+
+  test("SPARK-22442: Generate correct field names for special characters") {
+    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+    val deserializer = deserializerFor[SpecialCharAsFieldData]
+    assert(serializer.dataType(0).name == "field.1")
+    assert(serializer.dataType(1).name == "field 2")
+
+    val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+      case UpCast(u: UnresolvedAttribute, _, _) => u.name
+    }}
+    assert(argumentsFields(0) == "`field.1`")

End diff

We need to deliberately wrap backticks around a field name such as `field.1` because of the dot character. Otherwise `UnresolvedAttribute` will parse it as two name parts `Seq("field", "1")` and then fail resolving later.
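The splitting behaviour described above can be sketched with a tiny helper: an unquoted dot is treated as a nesting separator, while a backquoted name stays a single part. This is a simplified, hypothetical model for illustration, not Spark's actual `UnresolvedAttribute.parseAttributeName` implementation:

```scala
// Hypothetical sketch of dotted attribute-name splitting.
// An unquoted dot separates name parts; backticks protect the whole name.
object AttributeNameParts {
  def nameParts(name: String): List[String] =
    if (name.length > 1 && name.startsWith("`") && name.endsWith("`"))
      // Backquoted: the whole string is a single name part.
      List(name.substring(1, name.length - 1))
    else
      // Unquoted: each dot separates a name part.
      name.split('.').toList

  def main(args: Array[String]): Unit = {
    println(nameParts("field.1"))   // List(field, 1) -- two parts, resolution would fail
    println(nameParts("`field.1`")) // List(field.1)  -- one part, as intended
  }
}
```

This is why the test above expects the deserializer's `UnresolvedAttribute` arguments to carry backticks around `field.1`.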
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149564330

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala

@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
     throw new UnsupportedOperationException("Only code-generated evaluation is supported.")

+  private lazy val encodedFunctionName = TermName(functionName).encodedName.toString

End diff

Maybe, although I don't have a concrete case that triggers the issue for now.
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149316667

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala

@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
   }
+
+  test("SPARK-22442: Generate correct field names for special characters") {
+    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+    val deserializer = deserializerFor[SpecialCharAsFieldData]
+    assert(serializer.dataType(0).name == "field.1")
+    assert(serializer.dataType(1).name == "field 2")
+
+    val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+      case UpCast(u: UnresolvedAttribute, _, _) => u.name
+    }}
+    assert(argumentsFields(0) == "`field.1`")

End diff

Why does it have backticks?
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149316254

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala

@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
     throw new UnsupportedOperationException("Only code-generated evaluation is supported.")

+  private lazy val encodedFunctionName = TermName(functionName).encodedName.toString

End diff

Does `StaticInvoke` have a similar issue?
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/19664

[SPARK-22442][SQL] ScalaReflection should produce correct field names for special characters

## What changes were proposed in this pull request?

For a class whose field names contain special characters, e.g.:

```scala
case class MyType(`field.1`: String, `field 2`: String)
```

we can manipulate the DataFrame/Dataset, but the field names are encoded:

```scala
scala> val df = Seq(MyType("a", "b"), MyType("c", "d")).toDF
df: org.apache.spark.sql.DataFrame = [field$u002E1: string, field$u00202: string]

scala> df.as[MyType].collect
res7: Array[MyType] = Array(MyType(a,b), MyType(c,d))
```

This causes a resolution problem when we try to convert data whose field names are not encoded:

```scala
spark.read.json(path).as[MyType]
```

We should use the decoded field names instead.

## How was this patch tested?

Added a unit test. May add another end-to-end test later.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-22442

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19664.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19664

commit 319c80447c4fc1baa3167c889d1d8c072ee5b31c
Author: Liang-Chi Hsieh
Date: 2017-11-06T09:04:52Z

    ScalaReflection should produce correct field names for special characters.
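The `field$u002E1` / `field$u00202` column names shown in the PR description come from the Scala compiler's name mangling for characters that are not valid in Java identifiers. The same mapping is exposed by `scala.reflect.NameTransformer` in the standard library, so the round trip can be demonstrated directly. This is a small illustrative sketch, not code from the patch itself (the patch uses `TermName(...).encodedName` from scala-reflect, which applies the same transformation):

```scala
import scala.reflect.NameTransformer

object EncodedFieldNames {
  def main(args: Array[String]): Unit = {
    // '.' (0x2E) and ' ' (0x20) are not Java identifier characters,
    // so they are encoded as $uXXXX escapes in compiled member names.
    println(NameTransformer.encode("field.1")) // field$u002E1
    println(NameTransformer.encode("field 2")) // field$u00202

    // The fix is to use the decoded name when building schema fields,
    // so that "field.1" round-trips instead of leaking the mangled form.
    println(NameTransformer.decode("field$u002E1")) // field.1
  }
}
```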