[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19937

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156010028

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
             |$isNull = $leftRow.isNullAt($i);
             |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
           """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
```

Yea, I see, I just ran the test and found it too. It looks like putting the declaration at the top is simplest.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156007853

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
             |$isNull = $leftRow.isNullAt($i);
             |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
           """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
```

Good question. If we declared the variable types only in `code`, it could lead to a compilation error in `janino`. The `code` you pointed out may end up in `$leftAfter` in `$condCheck`, which sits in an if-then block at the innermost level of the loop in the generated code, via `leftVars`. The variables in `leftVars` are also referred to in `${consume(ctx, leftVars ++ rightVars)}`. In the following example, if we declared the variable types only in `code`, we would drop line 145 and declare `int` for `smj_value8` at line 162. Since `smj_value8` is referred to at line 169, the generated code would cause a compilation error. WDYT?
```java
/* 143 */   protected void processNext() throws java.io.IOException {
/* 144 */     while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) {
/* 145 */       int smj_value8 = -1;
/* 146 */       boolean smj_isNull6 = false;
/* 147 */       int smj_value9 = -1;
/* 148 */       boolean smj_loaded = false;
/* 149 */       smj_isNull6 = smj_leftRow.isNullAt(1);
/* 150 */       smj_value9 = smj_isNull6 ? -1 : (smj_leftRow.getInt(1));
/* 151 */       scala.collection.Iterator smj_iterator = smj_matches.generateIterator();
/* 152 */       while (smj_iterator.hasNext()) {
                  ...
/* 160 */         if (!smj_loaded) {
/* 161 */           smj_loaded = true;
/* 162 */           smj_value8 = smj_leftRow.getInt(0);
/* 163 */         }
/* 164 */         int smj_value10 = smj_rightRow1.getInt(0);
/* 165 */         smj_numOutputRows.add(1);
/* 166 */
/* 167 */         smj_rowWriter.zeroOutNullBytes();
/* 168 */
/* 169 */         smj_rowWriter.write(0, smj_value8);
                  ...
/* 185 */
/* 186 */       }
/* 187 */       if (shouldStop()) return;
/* 188 */     }
/* 189 */   }
```
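The scoping constraint described above can be illustrated in plain Java (a minimal hypothetical sketch, not Spark's actual generated code): the variable has to be declared at the top of the loop body so that both the inner if-block that assigns it and the later read are within its scope.

```java
public class ScopeSketch {
    // Hypothetical stand-in for the generated join loop body: smj_value is
    // declared up front so it is visible both to the if-block that assigns it
    // and to the read at the end (analogous to smj_rowWriter.write(0, smj_value)).
    static int use(int[] leftRow, boolean loaded) {
        int smj_value = -1;
        if (!loaded) {
            // Fusing the declaration into this block ("int smj_value = leftRow[0];")
            // would put the read below out of scope and fail to compile.
            smj_value = leftRow[0];
        }
        return smj_value;
    }

    public static void main(String[] args) {
        System.out.println(use(new int[]{7}, false)); // prints 7
    }
}
```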
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156005276

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
             |$isNull = $leftRow.isNullAt($i);
             |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
           """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
```

Good catch! Makes sense to me.
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156005086

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

@kiszk thanks for the test. Then I agree this is the best option, thanks.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156004320

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
             |$isNull = $leftRow.isNullAt($i);
             |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
           """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
```

Why separate the variable declaration and the assignment code into two parts? Previously that was needed because it used global variables. Now that we use local variables, I think we can simply do:

```scala
val code =
  s"""
    |boolean $isNull = $leftRow.isNullAt($i);
    |${ctx.javaType(a.dataType)} $value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
  """.stripMargin
```

Don't we?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156000717

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

Since these local variables are only needed inside the loop, I feel the current code is more readable. Performance is not a concern here, as the overhead is very low or none.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r15631

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
             |$isNull = $leftRow.isNullAt($i);
             |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
           """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
+          s"""
+            |boolean $isNull = false;
+            |${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};
+          """.stripMargin)
```

+1
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155984429

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
             |$isNull = $leftRow.isNullAt($i);
             |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
           """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
+          s"""
+            |boolean $isNull = false;
+            |${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};
+          """.stripMargin)
```

Could you assign it first?

```scala
val leftVarsDecl = ...
(ExprCode(code, isNull, value), leftVarsDecl)
```
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155984439

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
             |$isNull = $leftRow.isNullAt($i);
             |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
           """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
+          s"""
+            |boolean $isNull = false;
+            |${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};
+          """.stripMargin)
       } else {
-        ExprCode(s"$value = $valueCode;", "false", value)
+        (ExprCode(s"$value = $valueCode;", "false", value),
+          s"""${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};""")
```

The same here.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155984232

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

If there is no performance difference, we prefer the simpler code.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155984096

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
```

We need to update the function description.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155949301

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

Here is my benchmark result. Do you have any results?

```
$ cat Loop.java
public class Loop {
  private static int _i;
  private static boolean _b;

  public static void main(String[] args) {
    Loop l = new Loop();
    long s, e;
    long c1 = 100L;
    long c2 = 10L;
    int N = 10;
    // warmup
    for (int i = 0; i < 100; i++) {
      l.inner(1, 100);
      l.outer(1, 100);
    }
    for (int n = 0; n < N; n++) {
      s = System.nanoTime();
      l.inner(c1, c2);
      e = System.nanoTime();
      System.out.println("inner(ms): " + (e-s) / 100);
    }
    for (int n = 0; n < N; n++) {
      s = System.nanoTime();
      l.outer(c1, c2);
      e = System.nanoTime();
      System.out.println("outer(ms): " + (e-s) / 100);
    }
  }

  public void inner(long c1, long c2) {
    for (long i = 0; i < c1; i++) {
      int v1 = 1;
      boolean v2 = true;
      for (long j = 0; i < c2; i++) {
        _i = v1;
        _b = v2;
      }
    }
  }

  public void outer(long c1, long c2) {
    int v1 = 1;
    boolean v2 = true;
    for (long i = 0; i < c1; i++) {
      for (long j = 0; i < c2; i++) {
        _i = v1;
        _b = v2;
      }
    }
  }
}

$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

$ java Loop
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
```
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155944247

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

We would appreciate it if you would paste the kernel code you are thinking about.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155944205

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

I think it is not easy for me to cast this into a standalone benchmark, since the generated code depends on Spark internal structures. I would appreciate it if you would prepare a benchmark that is easy for you.
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155942157

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

I was thinking about an easy benchmark like what we did for casting. WDYT?
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155934663

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

Which microbenchmark do you think is relevant for this measurement? I would appreciate your thoughts.
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155927167

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

Honestly, I am not sure about the amount of overhead introduced. But I think there might be some, even if low. Therefore my suggestion was to avoid the overhead, since IMHO it is feasible and it reflects the previous situation. Honestly, I don't think the readability of the generated code is a big point, because the generated code is already nearly impossible to read as of now (without reading and knowing the code which generates it). But maybe we can create a test and evaluate the overhead, if you think this is the best option. WDYT?
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155927018

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

Since they are local variables, they cost almost nothing. If they are in CPU registers, there is no cost. If they are in the stack frame, it is at most one instruction to increase or decrease the stack frame size. WDYT? Did you see a large overhead from creating and destroying local variables?
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155926973

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

I think the advantage would be to reuse them, avoiding creating and destroying them at every loop iteration.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155926812

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

Since I have no strong preference, I would like to hear opinions from others.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155926316

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

It could be. Would it be possible to let us know the advantages compared to the current method? I think that shortening the lifetime of variables (i.e. the current approach) makes the generated code more readable.
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r155925941

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

```diff
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
       s"""
         |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+        |  ${leftVarDecl.mkString("\n")}
```

Can't we define them before the loop and reuse them?
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/19937

[SPARK-22746][SQL] Avoid the generation of useless mutable states by SortMergeJoin

## What changes were proposed in this pull request?

This PR reduces the number of global mutable variables in the code generated by `SortMergeJoin`.

## How was this patch tested?

Existing test cases

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-22746

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19937.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19937

commit 6d489248f4b2c2e5b40f51524e21992a59226ae8
Author: Kazuaki Ishizaki
Date: 2017-12-09T15:27:24Z

    initial commit
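The shape of the change can be sketched in plain Java (a simplified, hypothetical fragment, not Spark's actual generated code): per-row join variables that were previously emitted as class fields via `addMutableState` become locals declared at the top of the join loop, with the declaration kept separate from the assignment because the assignment may happen inside a nested block.

```java
public class LocalVarSketch {
    // Sums the first column over rows, mimicking the generated loop's shape:
    // the per-row variables are locals declared once per iteration (formerly
    // class fields, i.e. global mutable state), and the declarations are kept
    // separate from the assignments, mirroring the decl/ExprCode split.
    static int compute(int[][] leftRows) {
        int sum = 0;
        for (int[] leftRow : leftRows) {
            boolean smj_isNull = false;  // formerly ctx.addMutableState(JAVA_BOOLEAN, ...)
            int smj_value = -1;          // formerly ctx.addMutableState(javaType, ...)
            smj_isNull = (leftRow == null);
            smj_value = smj_isNull ? -1 : leftRow[0];
            if (!smj_isNull) {
                sum += smj_value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(compute(new int[][]{{1, 2}, {3, 4}, null})); // prints 4
    }
}
```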