[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19878


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154845901
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -730,23 +776,29 @@ case class HiveHash(children: Seq[Expression]) 
extends HashExpression[Int] {
   input: String,
   result: String,
   fields: Array[StructField]): String = {
-val localResult = ctx.freshName("localResult")
 val childResult = ctx.freshName("childResult")
-fields.zipWithIndex.map { case (field, index) =>
+val fieldsHash = fields.zipWithIndex.map { case (field, index) =>
+  val computeFieldHash = nullSafeElementHash(
+input, index.toString, field.nullable, field.dataType, 
childResult, ctx)
   s"""
- $childResult = 0;
- ${nullSafeElementHash(input, index.toString, field.nullable, 
field.dataType,
-   childResult, ctx)}
- $localResult = (31 * $localResult) + $childResult;
-   """
-}.mkString(
-  s"""
- int $localResult = 0;
- int $childResult = 0;
-   """,
-  "",
-  s"$result = (31 * $result) + $localResult;"
-)
+ |$childResult = 0;
+ |$computeFieldHash
+ |$result = (31 * $result) + $childResult;
+   """.stripMargin
+}
+
+s"${ctx.JAVA_INT} $childResult = 0;\n" + ctx.splitExpressions(
--- End diff --

Yea, the input here is a row that may be produced by `row.getStruct` 
instead of `ctx.INPUT_ROW`, so we don't need this check as the input won't be 
`ctx.currentVars`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154819410
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -730,23 +776,29 @@ case class HiveHash(children: Seq[Expression]) 
extends HashExpression[Int] {
   input: String,
   result: String,
   fields: Array[StructField]): String = {
-val localResult = ctx.freshName("localResult")
 val childResult = ctx.freshName("childResult")
-fields.zipWithIndex.map { case (field, index) =>
+val fieldsHash = fields.zipWithIndex.map { case (field, index) =>
+  val computeFieldHash = nullSafeElementHash(
+input, index.toString, field.nullable, field.dataType, 
childResult, ctx)
   s"""
- $childResult = 0;
- ${nullSafeElementHash(input, index.toString, field.nullable, 
field.dataType,
-   childResult, ctx)}
- $localResult = (31 * $localResult) + $childResult;
-   """
-}.mkString(
-  s"""
- int $localResult = 0;
- int $childResult = 0;
-   """,
-  "",
-  s"$result = (31 * $result) + $localResult;"
-)
+ |$childResult = 0;
+ |$computeFieldHash
+ |$result = (31 * $result) + $childResult;
+   """.stripMargin
+}
+
+s"${ctx.JAVA_INT} $childResult = 0;\n" + ctx.splitExpressions(
--- End diff --

No need to check `ctx.INPUT_ROW == null || ctx.currentVars != null` here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154819030
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -610,25 +637,44 @@ case class HiveHash(children: Seq[Expression]) 
extends HashExpression[Int] {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 ev.isNull = "false"
+
 val childHash = ctx.freshName("childHash")
-val childrenHash = ctx.splitExpressions(children.map { child =>
+val childrenHash = children.map { child =>
   val childGen = child.genCode(ctx)
   val codeToComputeHash = ctx.nullSafeExec(child.nullable, 
childGen.isNull) {
 computeHash(childGen.value, child.dataType, childHash, ctx)
   }
   s"""
  |${childGen.code}
+ |$childHash = 0;
  |$codeToComputeHash
  |${ev.value} = (31 * ${ev.value}) + $childHash;
- |$childHash = 0;
""".stripMargin
-})
+}
 
-ctx.addMutableState(ctx.javaType(dataType), ev.value)
-ctx.addMutableState(ctx.JAVA_INT, childHash, s"$childHash = 0;")
-ev.copy(code = s"""
-  ${ev.value} = $seed;
-  $childrenHash""")
+val codes = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
+  childrenHash.mkString("\n")
+} else {
+  ctx.splitExpressions(
+expressions = childrenHash,
+funcName = "computeHash",
+arguments = Seq("InternalRow" -> ctx.INPUT_ROW, ctx.JAVA_INT -> 
ev.value),
+returnType = ctx.JAVA_INT,
+makeSplitFunction = body =>
+  s"""
+ |${ctx.JAVA_INT} $childHash = 0;
+ |$body
+ |return ${ev.value};
+   """.stripMargin,
+foldFunctions = _.map(funcCall => s"${ev.value} = 
$funcCall;").mkString("\n"))
+}
+
+ev.copy(code =
+  s"""
+ |${ctx.JAVA_INT} ${ev.value} = $seed;
+ |${ctx.JAVA_INT} $childHash = 0;
--- End diff --

nvm, `splitExpressions` could possibly not split expressions if only one 
block.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154818350
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -610,25 +637,44 @@ case class HiveHash(children: Seq[Expression]) 
extends HashExpression[Int] {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 ev.isNull = "false"
+
 val childHash = ctx.freshName("childHash")
-val childrenHash = ctx.splitExpressions(children.map { child =>
+val childrenHash = children.map { child =>
   val childGen = child.genCode(ctx)
   val codeToComputeHash = ctx.nullSafeExec(child.nullable, 
childGen.isNull) {
 computeHash(childGen.value, child.dataType, childHash, ctx)
   }
   s"""
  |${childGen.code}
+ |$childHash = 0;
  |$codeToComputeHash
  |${ev.value} = (31 * ${ev.value}) + $childHash;
- |$childHash = 0;
""".stripMargin
-})
+}
 
-ctx.addMutableState(ctx.javaType(dataType), ev.value)
-ctx.addMutableState(ctx.JAVA_INT, childHash, s"$childHash = 0;")
-ev.copy(code = s"""
-  ${ev.value} = $seed;
-  $childrenHash""")
+val codes = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
+  childrenHash.mkString("\n")
+} else {
+  ctx.splitExpressions(
+expressions = childrenHash,
+funcName = "computeHash",
+arguments = Seq("InternalRow" -> ctx.INPUT_ROW, ctx.JAVA_INT -> 
ev.value),
+returnType = ctx.JAVA_INT,
+makeSplitFunction = body =>
+  s"""
+ |${ctx.JAVA_INT} $childHash = 0;
+ |$body
+ |return ${ev.value};
+   """.stripMargin,
+foldFunctions = _.map(funcCall => s"${ev.value} = 
$funcCall;").mkString("\n"))
+}
+
+ev.copy(code =
+  s"""
+ |${ctx.JAVA_INT} ${ev.value} = $seed;
+ |${ctx.JAVA_INT} $childHash = 0;
--- End diff --

nit: `childHash` is only needed to declare here when we don't split 
functions.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154805001
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -270,17 +270,36 @@ abstract class HashExpression[E] extends Expression {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 ev.isNull = "false"
-val childrenHash = ctx.splitExpressions(children.map { child =>
+
+val childrenHash = children.map { child =>
   val childGen = child.genCode(ctx)
   childGen.code + ctx.nullSafeExec(child.nullable, childGen.isNull) {
 computeHash(childGen.value, child.dataType, ev.value, ctx)
   }
-})
+}
+
+val hashResultType = ctx.javaType(dataType)
+val codes = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
--- End diff --

That one has been merged, but this one is still different. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154691937
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -389,13 +408,21 @@ abstract class HashExpression[E] extends Expression {
   input: String,
   result: String,
   fields: Array[StructField]): String = {
-val hashes = fields.zipWithIndex.map { case (field, index) =>
+val fieldsHash = fields.zipWithIndex.map { case (field, index) =>
   nullSafeElementHash(input, index.toString, field.nullable, 
field.dataType, result, ctx)
 }
+val hashResultType = ctx.javaType(dataType)
--- End diff --

`ctx` is only available inside `doGenCode`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154687417
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -389,13 +408,21 @@ abstract class HashExpression[E] extends Expression {
   input: String,
   result: String,
   fields: Array[StructField]): String = {
-val hashes = fields.zipWithIndex.map { case (field, index) =>
+val fieldsHash = fields.zipWithIndex.map { case (field, index) =>
   nullSafeElementHash(input, index.toString, field.nullable, 
field.dataType, result, ctx)
 }
+val hashResultType = ctx.javaType(dataType)
--- End diff --

nit: this is done also in line 281. Can we do this only once? maybe with a 
`lazy val`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154683889
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -730,23 +776,29 @@ case class HiveHash(children: Seq[Expression]) 
extends HashExpression[Int] {
   input: String,
   result: String,
   fields: Array[StructField]): String = {
-val localResult = ctx.freshName("localResult")
 val childResult = ctx.freshName("childResult")
-fields.zipWithIndex.map { case (field, index) =>
+val fieldsHash = fields.zipWithIndex.map { case (field, index) =>
+  val computeFieldHash = nullSafeElementHash(
+input, index.toString, field.nullable, field.dataType, 
childResult, ctx)
   s"""
- $childResult = 0;
- ${nullSafeElementHash(input, index.toString, field.nullable, 
field.dataType,
-   childResult, ctx)}
- $localResult = (31 * $localResult) + $childResult;
-   """
-}.mkString(
--- End diff --

We forgot to split the code for computing hive hash of struct, it's fixed 
now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154683716
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -270,17 +270,36 @@ abstract class HashExpression[E] extends Expression {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 ev.isNull = "false"
-val childrenHash = ctx.splitExpressions(children.map { child =>
+
+val childrenHash = children.map { child =>
   val childGen = child.genCode(ctx)
   childGen.code + ctx.nullSafeExec(child.nullable, childGen.isNull) {
 computeHash(childGen.value, child.dataType, ev.value, ctx)
   }
-})
+}
+
+val hashResultType = ctx.javaType(dataType)
+val codes = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
--- End diff --

I think @kiszk is doing this


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19878#discussion_r154682872
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -270,17 +270,36 @@ abstract class HashExpression[E] extends Expression {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 ev.isNull = "false"
-val childrenHash = ctx.splitExpressions(children.map { child =>
+
+val childrenHash = children.map { child =>
   val childGen = child.genCode(ctx)
   childGen.code + ctx.nullSafeExec(child.nullable, childGen.isNull) {
 computeHash(childGen.value, child.dataType, ev.value, ctx)
   }
-})
+}
+
+val hashResultType = ctx.javaType(dataType)
+val codes = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
--- End diff --

This pattern appears many times in the code base, we may need to create a 
`ctx.splitExpressionsWithCurrentInput` for it later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19878: [SPARK-22682][SQL] HashExpression does not need t...

2017-12-04 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/19878

[SPARK-22682][SQL] HashExpression does not need to create global variables

## What changes were proposed in this pull request?

It turns out that `HashExpression` can pass around some values via 
parameter when splitting codes into methods, to save some global variable slots.

This can also prevent a weird case that global variable appears in 
parameter list, which is discovered by 
https://github.com/apache/spark/pull/19865

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19878.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19878


commit 0e9998e0704b54d8f1352a1936c9b6367ebee15e
Author: Wenchen Fan 
Date:   2017-12-04T15:24:46Z

HashExpression does not need to create global variables




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org