[GitHub] spark pull request #22187: [SPARK-25178][SQL] change the generated code of t...
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22187#discussion_r212075572

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---

```diff
@@ -44,31 +44,19 @@ class RowBasedHashMapGenerator(
     groupingKeySchema, bufferSchema) {

   override protected def initializeAggregateHashMap(): String = {
-    val generatedKeySchema: String =
-      s"new org.apache.spark.sql.types.StructType()" +
-        groupingKeySchema.map { key =>
-          val keyName = ctx.addReferenceObj("keyName", key.name)
-          key.dataType match {
-            case d: DecimalType =>
-              s""".add($keyName, org.apache.spark.sql.types.DataTypes.createDecimalType(
-                 |${d.precision}, ${d.scale}))""".stripMargin
-            case _ =>
-              s""".add($keyName, org.apache.spark.sql.types.DataTypes.${key.dataType})"""
-          }
-        }.mkString("\n").concat(";")
+    val generatedKeyColTypes = groupingKeySchema
+      .zipWithIndex.map { case (t, i) => (s"_col$i", t.dataType) }
+    val generatedKeySchemaTypes = generatedKeyColTypes
+      .foldLeft(new StructType())((schema, colType) => schema.add(colType._1, colType._2))
```

--- End diff --

Ah, we do not need to recreate the struct type. I think we do not need `.asNullable`. Furthermore, I think that we do not need `generatedAggBufferColTypes`. We only need the number of its fields.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22187#discussion_r212062744

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---

```diff
@@ -44,31 +44,19 @@ class RowBasedHashMapGenerator(
     groupingKeySchema, bufferSchema) {

   override protected def initializeAggregateHashMap(): String = {
-    val generatedKeySchema: String =
-      s"new org.apache.spark.sql.types.StructType()" +
-        groupingKeySchema.map { key =>
-          val keyName = ctx.addReferenceObj("keyName", key.name)
-          key.dataType match {
-            case d: DecimalType =>
-              s""".add($keyName, org.apache.spark.sql.types.DataTypes.createDecimalType(
-                 |${d.precision}, ${d.scale}))""".stripMargin
-            case _ =>
-              s""".add($keyName, org.apache.spark.sql.types.DataTypes.${key.dataType})"""
-          }
-        }.mkString("\n").concat(";")
+    val generatedKeyColTypes = groupingKeySchema
+      .zipWithIndex.map { case (t, i) => (s"_col$i", t.dataType) }
+    val generatedKeySchemaTypes = generatedKeyColTypes
+      .foldLeft(new StructType())((schema, colType) => schema.add(colType._1, colType._2))
```

--- End diff --

Btw, do we need to recreate the struct type with dummy names if we use `ctx.addReferenceObj()`? We might need to do `.asNullable`, though.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22187#discussion_r212062022

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---

```diff
@@ -44,31 +44,19 @@ class RowBasedHashMapGenerator(
     groupingKeySchema, bufferSchema) {

   override protected def initializeAggregateHashMap(): String = {
-    val generatedKeySchema: String =
-      s"new org.apache.spark.sql.types.StructType()" +
-        groupingKeySchema.map { key =>
-          val keyName = ctx.addReferenceObj("keyName", key.name)
-          key.dataType match {
-            case d: DecimalType =>
-              s""".add($keyName, org.apache.spark.sql.types.DataTypes.createDecimalType(
-                 |${d.precision}, ${d.scale}))""".stripMargin
-            case _ =>
-              s""".add($keyName, org.apache.spark.sql.types.DataTypes.${key.dataType})"""
-          }
-        }.mkString("\n").concat(";")
+    val generatedKeyColTypes = groupingKeySchema
+      .zipWithIndex.map { case (t, i) => (s"_col$i", t.dataType) }
+    val generatedKeySchemaTypes = generatedKeyColTypes
+      .foldLeft(new StructType())((schema, colType) => schema.add(colType._1, colType._2))
```

--- End diff --

How about:

```scala
val generatedKeyFields = groupingKeySchema
  .zipWithIndex.map { case (t, i) => StructField(s"_col$i", t.dataType) }
val generatedKeySchemaTypes = new StructType(generatedKeyFields)
```
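The suggestion above replaces the incremental `foldLeft`/`add` construction with building the fields first and constructing the `StructType` once. The two styles produce the same schema; here is a minimal self-contained sketch using simplified stand-ins for Spark's `StructField`/`StructType` (so it runs without a Spark dependency; the real classes live in `org.apache.spark.sql.types` and hold a `DataType`, not a `String`):

```scala
// Simplified stand-ins for Spark's StructField / StructType.
case class StructField(name: String, dataType: String)
case class StructType(fields: Seq[StructField] = Seq.empty) {
  def add(name: String, dataType: String): StructType =
    StructType(fields :+ StructField(name, dataType))
}

// A hypothetical grouping-key schema: just the column types.
val groupingKeyTypes = Seq("IntegerType", "StringType")

// Style used in the diff: fold each (name, type) pair into an empty schema.
val byFold = groupingKeyTypes.zipWithIndex
  .map { case (t, i) => (s"_col$i", t) }
  .foldLeft(StructType())((schema, col) => schema.add(col._1, col._2))

// Style suggested in the review: build the fields, then construct once.
val byCtor = StructType(groupingKeyTypes.zipWithIndex
  .map { case (t, i) => StructField(s"_col$i", t) })

assert(byFold == byCtor)
println(byFold.fields.map(_.name).mkString(", "))  // _col0, _col1
```

Constructing once avoids allocating an intermediate `StructType` per column, though for a handful of grouping keys the difference is stylistic rather than a performance concern.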
GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/22187

    [SPARK-25178][SQL] change the generated code of the keySchema / valueSchema for xxxHashMapGenerator

## What changes were proposed in this pull request?

This PR changes the generated code to refer to a `StructType` created in Scala code, instead of generating Java code that builds the `StructType`. This solves two issues:

1. Avoids embedding actual field names such as `key.name` in the generated code
1. Supports complicated schemas (e.g. nested data types)

Originally, [the JIRA entry](https://issues.apache.org/jira/browse/SPARK-25178) proposed changing the generated field names of the keySchema / valueSchema to dummy names in `RowBasedHashMapGenerator` and `VectorizedHashMapGenerator.scala`. @ueshin suggested referring to a `StructType` created in Scala code using `ctx.addReferenceObj()`.

## How was this patch tested?

Existing UTs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-25178

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22187.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22187

----

commit 0626de74f726622ac3eb251fc9f66aaa3de002d3
Author: Kazuaki Ishizaki
Date:   2018-08-22T18:10:24Z

    initial commit
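The core idea of the PR, passing a Scala-side object into generated code rather than regenerating it as Java source, can be sketched with a simplified model of `ctx.addReferenceObj()`. This is a hypothetical stand-in, not Spark's actual `CodegenContext`: the generator registers a Scala object in a references array and emits Java text that merely casts `references[idx]`, instead of emitting Java text that rebuilds the object field by field.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified model of a codegen context (assumed API, for illustration only).
class CodegenContext {
  val references = ArrayBuffer.empty[AnyRef]

  // Register `obj` and return Java source text that refers to it by index.
  def addReferenceObj(name: String, obj: AnyRef, className: String): String = {
    val idx = references.length
    references += obj
    s"(($className) references[$idx]) /* $name */"
  }
}

val ctx = new CodegenContext

// Stand-in for the key schema built in Scala; in the PR this would be a real
// org.apache.spark.sql.types.StructType with dummy column names (_col0, _col1, ...).
val keySchema: AnyRef = "StructType(_col0 INT, _col1 STRING)"

val generatedJava =
  ctx.addReferenceObj("keySchemaTerm", keySchema, "org.apache.spark.sql.types.StructType")

// The generated Java now contains a single reference, not schema-building calls,
// so actual column names never leak into generated source and nested types
// need no Java-side construction logic.
println(generatedJava)
assert(ctx.references.head eq keySchema)
```

Because the schema object travels through the references array at runtime, the generated source stays the same regardless of how complex (or oddly named) the schema is, which is exactly what the two issues in the PR description require.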