[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19937


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r156010028
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
              |$isNull = $leftRow.isNullAt($i);
              |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
            """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
--- End diff --

Yea, I see, I just ran the test and found it too. Looks like putting the 
declaration at the top is simplest.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r156007853
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
              |$isNull = $leftRow.isNullAt($i);
              |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
            """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
--- End diff --

Good question. If we declare the variable types only in `code`, it may lead to a 
compilation error in `janino`.

The `code` that you pointed out may end up in `$leftAfter` within `$condCheck`, which 
sits in an `if-then` block at the innermost level of the loop in the generated code, 
via `leftVars`. The variables in `leftVars` are also referred to at 
`${consume(ctx, leftVars ++ rightVars)}`.

In the following example, if we declared the variable types only in `code`, we 
would drop line 145 and declare `int` for `smj_value8` at line 162. Since 
`smj_value8` is referred to at line 169, the generated code would fail to compile.

WDYT?

```
/* 143 */   protected void processNext() throws java.io.IOException {
/* 144 */ while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) {
/* 145 */   int smj_value8 = -1;
/* 146 */   boolean smj_isNull6 = false;
/* 147 */   int smj_value9 = -1;
/* 148 */   boolean smj_loaded = false;
/* 149 */   smj_isNull6 = smj_leftRow.isNullAt(1);
/* 150 */   smj_value9 = smj_isNull6 ? -1 : (smj_leftRow.getInt(1));
/* 151 */   scala.collection.Iterator smj_iterator = smj_matches.generateIterator();
/* 152 */   while (smj_iterator.hasNext()) {
...
/* 160 */ if (!smj_loaded) {
/* 161 */   smj_loaded = true;
/* 162 */   smj_value8 = smj_leftRow.getInt(0);
/* 163 */ }
/* 164 */ int smj_value10 = smj_rightRow1.getInt(0);
/* 165 */ smj_numOutputRows.add(1);
/* 166 */
/* 167 */ smj_rowWriter.zeroOutNullBytes();
/* 168 */
/* 169 */ smj_rowWriter.write(0, smj_value8);
...
/* 185 */
/* 186 */   }
/* 187 */   if (shouldStop()) return;
/* 188 */ }
/* 189 */   }
```
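
For readers following along, here is a small, self-contained sketch (not Spark code, 
just an illustration of the mechanism, with hypothetical helper and variable names) of 
why the fix returns the declarations separately and splices them at the top of the 
`while` loop, so that every later use, such as line 169 above, is dominated by a 
declaration:

```scala
// Hypothetical sketch: mimic the new createLeftVars contract, which returns one
// (assignment, declaration) pair per column instead of registering mutable state.
object DeclAtTopSketch {
  def createVars(row: String, types: Seq[String]): (Seq[String], Seq[String]) =
    types.zipWithIndex.map { case (t, i) =>
      val value = s"smj_value$i"
      // The assignment goes wherever the column is first needed;
      // the declaration goes to the top of the loop body.
      val assign = s"$value = $row.get${t.capitalize}($i);"
      val decl = s"$t $value = ${if (t == "boolean") "false" else "-1"};"
      (assign, decl)
    }.unzip

  def main(args: Array[String]): Unit = {
    val (assigns, decls) = createVars("smj_leftRow", Seq("int", "int"))
    val loop =
      s"""
         |while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) {
         |  ${decls.mkString("\n  ")}
         |  ${assigns.mkString("\n  ")}
         |  // ... the inner loop over matches can safely read smj_value0 / smj_value1 ...
         |}
       """.stripMargin
    println(loop)
  }
}
```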



---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r156005276
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
        val code =
           s"""
              |$isNull = $leftRow.isNullAt($i);
              |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
            """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
--- End diff --

good catch! Makes sense to me.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r156005086
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

@kiszk thanks for the test. Then I agree this is the best option, thanks.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r156004320
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
              |$isNull = $leftRow.isNullAt($i);
              |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
            """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
--- End diff --

Why separate the variable declarations and the code into two parts? Previously that 
was needed because we used global variables. Now that we use local variables, I think 
we can simply do:

```scala
val code =
  s"""
     |boolean $isNull = $leftRow.isNullAt($i);
     |${ctx.javaType(a.dataType)} $value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
   """.stripMargin
```

Can't we?


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r156000717
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

Since these local variables are only needed inside the loop, I feel the 
current code is more readable. Performance is not a concern here as the 
overhead is very low or none.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r15631
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
              |$isNull = $leftRow.isNullAt($i);
              |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
            """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
+          s"""
+             |boolean $isNull = false;
+             |${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};
+           """.stripMargin)
--- End diff --

+1


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155984429
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
              |$isNull = $leftRow.isNullAt($i);
              |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
            """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
+          s"""
+             |boolean $isNull = false;
+             |${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};
+           """.stripMargin)
--- End diff --

Could you assign it to a val first?
```Scala
val leftVarsDecl = 
...
(ExprCode(code, isNull, value), leftVarsDecl)

```
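
For illustration, the two branches might then read roughly as follows (a sketch of 
this suggestion applied to the diff above, not necessarily the exact final code; it 
assumes the surrounding `createLeftVars` scope, i.e. `ctx`, `a`, `i`, `leftRow`, 
`value` and `valueCode`):

```scala
// Sketch only: assign the declaration string to a val first, then return both parts.
if (a.nullable) {
  val isNull = ctx.freshName("isNull")
  val code =
    s"""
       |$isNull = $leftRow.isNullAt($i);
       |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
     """.stripMargin
  val leftVarsDecl =
    s"""
       |boolean $isNull = false;
       |${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};
     """.stripMargin
  (ExprCode(code, isNull, value), leftVarsDecl)
} else {
  val leftVarsDecl = s"${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};"
  (ExprCode(s"$value = $valueCode;", "false", value), leftVarsDecl)
}
```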


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155984439
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = leftRow
     left.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
-      // declare it as class member, so we can access the column before or in the loop.
-      ctx.addMutableState(ctx.javaType(a.dataType), value)
       if (a.nullable) {
         val isNull = ctx.freshName("isNull")
-        ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
         val code =
           s"""
              |$isNull = $leftRow.isNullAt($i);
              |$value = $isNull ? ${ctx.defaultValue(a.dataType)} : ($valueCode);
            """.stripMargin
-        ExprCode(code, isNull, value)
+        (ExprCode(code, isNull, value),
+          s"""
+             |boolean $isNull = false;
+             |${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};
+           """.stripMargin)
       } else {
-        ExprCode(s"$value = $valueCode;", "false", value)
+        (ExprCode(s"$value = $valueCode;", "false", value),
+          s"""${ctx.javaType(a.dataType)} $value = ${ctx.defaultValue(a.dataType)};""")
--- End diff --

The same here.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155984232
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

If there is no performance difference, we prefer the simpler code.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155984096
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
    * the variables should be declared separately from accessing the columns, we can't use the
    * codegen of BoundReference here.
    */
-  private def createLeftVars(ctx: CodegenContext, leftRow: String): Seq[ExprCode] = {
+  private def createLeftVars(ctx: CodegenContext, leftRow: String): (Seq[ExprCode], Seq[String]) = {
--- End diff --

We need to update the function description. 


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155949301
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

Here is my benchmark result. Do you have any results on your side?

```
$ cat Loop.java 
public class Loop {
  private static int _i;
  private static boolean _b;
 
  public static void main(String[] args){
Loop l = new Loop();
long s, e;
long c1 = 100L;
long c2 = 10L;
int N = 10;

// warmup
for (int i = 0; i < 100; i++) {
  l.inner(1, 100);
  l.outer(1, 100);
}

for (int n = 0; n < N; n++) {
  s = System.nanoTime();
  l.inner(c1, c2);
  e = System.nanoTime();
  System.out.println("inner(ms): " + (e-s) / 100);
}

for (int n = 0; n < N; n++) {
  s = System.nanoTime();
  l.outer(c1, c2);
  e = System.nanoTime();   
  System.out.println("outer(ms): " + (e-s) / 100);
}
  }
 
  public void inner(long c1, long c2) {
for (long i = 0; i < c1; i++) {
  int v1 = 1;
  boolean v2 = true;
      for (long j = 0; j < c2; j++) {
_i = v1;
_b = v2;
  }
}
  }
 
  public void outer(long c1, long c2) {
int v1 = 1;
boolean v2 = true;
for (long i = 0; i < c1; i++) {
      for (long j = 0; j < c2; j++) {
_i = v1;
_b = v2;
  }
}
  }
}
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
$ java Loop
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
inner(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
outer(ms): 2779
```


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155944247
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

We would appreciate it if you could paste the kernel code that you have in mind.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155944205
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

I think it is not easy for me to extract this into a standalone benchmark, since the 
generated code depends on Spark internal structures.
I would appreciate it if you could prepare an easy benchmark on your side.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155942157
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

I was thinking about an easy benchmark like what we did for casting. WDYT?
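
Something along these lines, for example (a hypothetical sketch, not part of the PR; 
the table sizes, join key, and config value are arbitrary), run once against master 
and once against this patch:

```scala
import org.apache.spark.sql.SparkSession

// Rough end-to-end micro-benchmark: force a sort-merge join and time it.
object SmjBenchmarkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("smj-bench").getOrCreate()
    // Disable broadcast joins so the planner picks SortMergeJoin.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)

    val left = spark.range(0, 5000000L).selectExpr("id % 100000 as k", "id as v1")
    val right = spark.range(0, 5000000L).selectExpr("id % 100000 as k", "id as v2")

    // Warm up codegen and the JIT, then measure.
    left.join(right, "k").count()
    val start = System.nanoTime()
    val rows = left.join(right, "k").count()
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"rows=$rows, time=$elapsedMs ms")

    spark.stop()
  }
}
```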


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155934663
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

Which microbenchmark do you think is relevant for this measurement? I would 
appreciate your thoughts.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155927167
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

Honestly, I am not sure about the amount of overhead introduced. But I think there 
might be some, even if low. Therefore my suggestion was to avoid the overhead, since 
IMHO it is feasible and it reflects the previous situation. Honestly, I don't think 
that the readability of the generated code is a big point, because the generated code 
is already nearly impossible to read as of now (without reading and knowing the code 
which generates it).

But maybe we can create a test and evaluate the overhead, if you think this is the 
best option. WDYT?


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155927018
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

Since they are local variables, they cost almost nothing. If they live in CPU 
registers, there is no cost at all. If they live in the stack frame, it takes at most 
one instruction to grow or shrink the frame.

WDYT? Did you see a large overhead from creating and destroying local variables?


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155926973
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

I think the advantage would be to reuse them, avoiding creating and destroying them 
on every loop iteration.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155926812
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

Since I have no strong preference, I would like to hear opinions from 
others.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155926316
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

It could be. Would it be possible to let us know the advantages compared to the 
current method?
I think that shortening the lifetime of the variables (i.e. the current approach) 
makes the generated code more readable.


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19937#discussion_r155925941
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -617,6 +619,7 @@ case class SortMergeJoinExec(
 
     s"""
        |while (findNextInnerJoinRows($leftInput, $rightInput)) {
+       |  ${leftVarDecl.mkString("\n")}
--- End diff --

Can't we define them before the loop and reuse them?


---




[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-09 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/19937

[SPARK-22746][SQL] Avoid the generation of useless mutable states by SortMergeJoin

## What changes were proposed in this pull request?

This PR reduces the number of global mutable variables in the generated code of 
`SortMergeJoin`.
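
To illustrate the idea with a simplified sketch (hypothetical shapes, not actual 
generated output): previously every left-row column became a mutable field of the 
generated class; with this change the same names become local variables declared 
inside the join loop.

```scala
// Simplified, hypothetical before/after shapes of the generated code (printable sketch).
object MutableStateSketch {
  val before: String =
    """class GeneratedIterator {
      |  private boolean smj_isNull6;  // global mutable state, one field per column
      |  private int smj_value9;
      |  protected void processNext() {
      |    while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) {
      |      smj_isNull6 = smj_leftRow.isNullAt(1);
      |      smj_value9 = smj_isNull6 ? -1 : smj_leftRow.getInt(1);
      |      // ...
      |    }
      |  }
      |}""".stripMargin

  val after: String =
    """class GeneratedIterator {
      |  protected void processNext() {
      |    while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) {
      |      boolean smj_isNull6 = false;  // now plain local variables
      |      int smj_value9 = -1;
      |      smj_isNull6 = smj_leftRow.isNullAt(1);
      |      smj_value9 = smj_isNull6 ? -1 : smj_leftRow.getInt(1);
      |      // ...
      |    }
      |  }
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    println("--- before ---\n" + before)
    println("--- after ---\n" + after)
  }
}
```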

## How was this patch tested?

Existing test cases

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-22746

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19937.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19937


commit 6d489248f4b2c2e5b40f51524e21992a59226ae8
Author: Kazuaki Ishizaki 
Date:   2017-12-09T15:27:24Z

initial commit




---
