[GitHub] spark pull request #22187: [SPARK-25178][SQL] change the generated code of t...

2018-08-22 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22187#discussion_r212075572
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---
@@ -44,31 +44,19 @@ class RowBasedHashMapGenerator(
     groupingKeySchema, bufferSchema) {
 
   override protected def initializeAggregateHashMap(): String = {
-    val generatedKeySchema: String =
-      s"new org.apache.spark.sql.types.StructType()" +
-        groupingKeySchema.map { key =>
-          val keyName = ctx.addReferenceObj("keyName", key.name)
-          key.dataType match {
-            case d: DecimalType =>
-              s""".add($keyName, org.apache.spark.sql.types.DataTypes.createDecimalType(
-                  |${d.precision}, ${d.scale}))""".stripMargin
-            case _ =>
-              s""".add($keyName, org.apache.spark.sql.types.DataTypes.${key.dataType})"""
-          }
-        }.mkString("\n").concat(";")
+    val generatedKeyColTypes = groupingKeySchema
+      .zipWithIndex.map { case (t, i) => (s"_col$i", t.dataType) }
+    val generatedKeySchemaTypes = generatedKeyColTypes
+      .foldLeft(new StructType())((schema, colType) => schema.add(colType._1, colType._2))
--- End diff --

Ah, right, we do not need to recreate the struct type. I think we do not need `.asNullable` either.

Furthermore, I think that we do not need `generatedAggBufferColTypes`. We only need the number of its fields.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22187: [SPARK-25178][SQL] change the generated code of t...

2018-08-22 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22187#discussion_r212062744
  
(quoting the same diff hunk in RowBasedHashMapGenerator.scala as above)

Btw, do we need to recreate the struct type with dummy names if we use 
`ctx.addReferenceObj()`? We might need to do `.asNullable`, though.


---




[GitHub] spark pull request #22187: [SPARK-25178][SQL] change the generated code of t...

2018-08-22 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22187#discussion_r212062022
  
(quoting the same diff hunk in RowBasedHashMapGenerator.scala as above)

How about:

```scala
val generatedKeyFields = groupingKeySchema
  .zipWithIndex.map { case (t, i) => StructField(s"_col$i", t.dataType) }
val generatedKeySchemaTypes = new StructType(generatedKeyFields.toArray)
```
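For context, the index-based dummy naming in the snippet above can be sketched in plain, self-contained Scala. The `(name, dataType)` tuples stand in for Spark's `StructField` (an assumption for illustration only, so the snippet does not depend on Spark):

```scala
object DummyNameSketch {
  // Stand-in for a schema: (fieldName, dataType) pairs instead of StructField/StructType.
  def withDummyNames(schema: Seq[(String, String)]): Seq[(String, String)] =
    schema.zipWithIndex.map { case ((_, dataType), i) => (s"_col$i", dataType) }

  def main(args: Array[String]): Unit = {
    // A field name like "user id" is not a valid Java identifier, which is why
    // generated code should not embed real field names; dummy names sidestep this.
    val schema = Seq("user id" -> "int", "price" -> "decimal(10,2)")
    println(withDummyNames(schema).mkString(", "))
  }
}
```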



---




[GitHub] spark pull request #22187: [SPARK-25178][SQL] change the generated code of t...

2018-08-22 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/22187

[SPARK-25178][SQL] change the generated code of the keySchema / valueSchema for xxxHashMapGenerator

## What changes were proposed in this pull request?

This PR generates code that refers to a `StructType` created in the Scala code, instead of constructing the `StructType` in the generated Java code. This solves two issues:
1. Avoids relying on field names such as `key.name` in the generated code
1. Supports complicated schemas (e.g. nested data types)

Originally, [the JIRA entry](https://issues.apache.org/jira/browse/SPARK-25178) proposed changing the generated field names of the keySchema / valueSchema to dummy names in `RowBasedHashMapGenerator` and `VectorizedHashMapGenerator.scala`. @ueshin suggested referring to a `StructType` created in the Scala code using `ctx.addReferenceObj()`.
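The suggested approach can be sketched roughly as follows. This is not the actual Spark codegen; `Context`, `Field`, and `Schema` below are simplified stand-ins (assumptions for illustration) for `CodegenContext`, `StructField`, and `StructType`. The point is that the generated code only mentions a reference term, while the schema object itself lives on the Scala side:

```scala
// Simplified model of ctx.addReferenceObj(): register a Scala-side object and
// return a term name; the generated code uses the term instead of rebuilding
// the object field by field. All names here are illustrative stand-ins.
object ReferenceObjSketch {
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: Vector[Field])

  final class Context {
    private var refs = Vector.empty[(String, Any)]
    def addReferenceObj(name: String, obj: Any): String = {
      val term = s"$name${refs.size}" // unique term for the generated code
      refs = refs :+ (term -> obj)
      term
    }
  }

  // Emit a line of "generated code" that refers to the schema by term only.
  def emitKeySchemaRef(ctx: Context, keySchema: Schema): String = {
    val term = ctx.addReferenceObj("keySchemaTerm", keySchema)
    s"UnsafeProjection.create($term)" // no field names or types inlined
  }

  def main(args: Array[String]): Unit = {
    val keySchema = Schema(Vector(Field("_col0", "int"), Field("_col1", "decimal(10,2)")))
    println(emitKeySchemaRef(new Context, keySchema))
  }
}
```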

## How was this patch tested?

Existing UTs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-25178

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22187


commit 0626de74f726622ac3eb251fc9f66aaa3de002d3
Author: Kazuaki Ishizaki 
Date:   2018-08-22T18:10:24Z

initial commit




---
