[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19664


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149857257
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
   }
 
+  test("SPARK-22442: Generate correct field names for special characters") {
+    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+    val deserializer = deserializerFor[SpecialCharAsFieldData]
+    assert(serializer.dataType(0).name == "field.1")
+    assert(serializer.dataType(1).name == "field 2")
+
+    val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+      case UpCast(u: UnresolvedAttribute, _, _) => u.name
--- End diff --

Ok. Reasonable.


---




[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149633100
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
   }
 
+  test("SPARK-22442: Generate correct field names for special characters") {
+    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+    val deserializer = deserializerFor[SpecialCharAsFieldData]
+    assert(serializer.dataType(0).name == "field.1")
+    assert(serializer.dataType(1).name == "field 2")
+
+    val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+      case UpCast(u: UnresolvedAttribute, _, _) => u.name
--- End diff --

Actually, we should collect `u.nameParts` here. `u.name` adds backticks if the 
`nameParts` contain a dot, which makes the test result a little confusing.
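The quoting behaviour described above can be sketched with a minimal self-contained helper (`NameSketch` is hypothetical, not Spark's actual `UnresolvedAttribute` implementation):

```scala
// Hypothetical sketch of why `u.name` re-adds backticks: a single name part that
// contains a dot must be quoted so the dot-separated string form stays
// unambiguous. Not Spark's actual implementation.
object NameSketch {
  // Quote one name part with backticks if it contains a dot.
  def quoteIfNeeded(part: String): String =
    if (part.contains(".")) s"`$part`" else part

  // Render name parts the way an attribute's string form would.
  def name(nameParts: Seq[String]): String =
    nameParts.map(quoteIfNeeded).mkString(".")
}
```

Here `NameSketch.name(Seq("field.1"))` yields the backtick-wrapped form, while the raw `nameParts` head is the plain `field.1`, which is why asserting on `nameParts` reads more clearly.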


---




[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149565144
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
     throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
 
+  private lazy val encodedFunctionName = TermName(functionName).encodedName.toString
--- End diff --

Since we use `Invoke` to access fields in an object, this can be an issue. I 
didn't find `StaticInvoke` used similarly, so it should be fine. 
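For illustration, the JVM-level reason an encoded name is needed can be sketched with plain reflection (`SpecialChar` below is a made-up example class, not the one from the test suite):

```scala
// A Scala member named with special characters compiles to an encoded JVM name
// (dots are illegal in JVM method names), so any reflective or code-generated
// invocation must use the encoded form. `SpecialChar` is a made-up example.
case class SpecialChar(`field.1`: String)

val jvmNames = classOf[SpecialChar].getMethods.map(_.getName).toSet
// The accessor does not appear under the literal name "field.1"; it appears
// under an encoded name (containing "$u002E" for the dot).
```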


---




[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149564523
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
   }
 
+  test("SPARK-22442: Generate correct field names for special characters") {
+    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+    val deserializer = deserializerFor[SpecialCharAsFieldData]
+    assert(serializer.dataType(0).name == "field.1")
+    assert(serializer.dataType(1).name == "field 2")
+
+    val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+      case UpCast(u: UnresolvedAttribute, _, _) => u.name
+    }}
+    assert(argumentsFields(0) == "`field.1`")
--- End diff --

We need to deliberately wrap backticks around a field name such as 
`field.1` because of the dot character. Otherwise `UnresolvedAttribute` will 
parse it as the two name parts `Seq("field", "1")` and then fail to resolve later.
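This parsing behaviour can be sketched with a simplified stand-in (not Spark's real parser, which also handles escaped backticks inside quoted names):

```scala
// Simplified sketch: an unquoted dotted string splits into multiple name parts,
// while a backtick-quoted string stays one literal part. Spark's real parser
// also handles escaped backticks; this is illustration only.
object AttrSketch {
  def parseNameParts(s: String): Seq[String] =
    if (s.length > 1 && s.startsWith("`") && s.endsWith("`"))
      Seq(s.substring(1, s.length - 1)) // quoted: one literal part, dot kept
    else
      s.split('.').toSeq                // unquoted: dot separates parts
}
```

So an unquoted `field.1` would resolve as the two-part name `Seq("field", "1")`, while the backtick-quoted form stays the single field name `field.1`.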


---




[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149564330
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
     throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
 
+  private lazy val encodedFunctionName = TermName(functionName).encodedName.toString
--- End diff --

Maybe, although I don't have a concrete case that causes the issue for now. 


---




[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149316667
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
@@ -335,4 +338,17 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
   }
 
+  test("SPARK-22442: Generate correct field names for special characters") {
+    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
+      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
+    val deserializer = deserializerFor[SpecialCharAsFieldData]
+    assert(serializer.dataType(0).name == "field.1")
+    assert(serializer.dataType(1).name == "field 2")
+
+    val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect {
+      case UpCast(u: UnresolvedAttribute, _, _) => u.name
+    }}
+    assert(argumentsFields(0) == "`field.1`")
--- End diff --

Why does it have backticks?


---




[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19664#discussion_r149316254
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -214,11 +215,13 @@ case class Invoke(
   override def eval(input: InternalRow): Any =
     throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
 
+  private lazy val encodedFunctionName = TermName(functionName).encodedName.toString
--- End diff --

Does `StaticInvoke` have a similar issue?


---




[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...

2017-11-06 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/19664

[SPARK-22442][SQL] ScalaReflection should produce correct field names for 
special characters

## What changes were proposed in this pull request?

For a class with field names containing special characters, e.g.:
```scala
case class MyType(`field.1`: String, `field 2`: String)
```

Although we can manipulate the DataFrame/Dataset, the field names are encoded:
```scala
scala> val df = Seq(MyType("a", "b"), MyType("c", "d")).toDF
df: org.apache.spark.sql.DataFrame = [field$u002E1: string, field$u00202: string]
scala> df.as[MyType].collect
res7: Array[MyType] = Array(MyType(a,b), MyType(c,d))
```

This causes a resolution problem when we try to convert data with 
non-encoded field names:
```scala
spark.read.json(path).as[MyType]
```

We should use the decoded field names.
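The decode direction can be sketched with a hand-rolled helper, based only on the `$uXXXX` escapes visible in the encoded schema above (illustration only; the actual fix uses the Scala compiler's own name decoding, not this regex):

```scala
// Hand-rolled sketch of the decode step, inferred from the "$uXXXX" escapes in
// the encoded schema above ("field$u002E1", "field$u00202"). Illustration only.
object DecodeSketch {
  private val Escape = """\$u([0-9a-fA-F]{4})""".r
  def decode(encoded: String): String =
    Escape.replaceAllIn(encoded, m => Integer.parseInt(m.group(1), 16).toChar.toString)
}
```

With this, `DecodeSketch.decode("field$u002E1")` gives back `field.1`, which is the name the analyzer needs in order to match columns read from JSON.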

## How was this patch tested?

Added a unit test. May add another end-to-end test later.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-22442

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19664.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19664


commit 319c80447c4fc1baa3167c889d1d8c072ee5b31c
Author: Liang-Chi Hsieh 
Date:   2017-11-06T09:04:52Z

ScalaReflection should produce correct field names for special characters.




---
