[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19728


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r151741098
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -125,19 +138,43 @@ case class ConcatWs(children: Seq[Expression])
 if (children.forall(_.dataType == StringType)) {
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
-
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val separator = evals.head
+  val strings = evals.tail
+  val argNums = strings.length
+  val args = ctx.freshName("args")
+
+  val inputs = strings.zipWithIndex.map { case (eval, index) =>
+if (eval.isNull != "true") {
+  s"""
+ ${eval.code}
+ if (!${eval.isNull}) {
+   $args[$index] = ${eval.value};
+ }
+   """
+} else {
+  ""
+}
+  }
+  val codes = s"${separator.code}\n" +
+(if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+   ctx.splitExpressions(inputs, "valueConcatWs",
+ ("InternalRow", ctx.INPUT_ROW) :: ("UTF8String[]", args) :: 
Nil)
+ } else {
+   inputs.mkString("\n")
+ })
+  ev.copy(s"""
+UTF8String[] $args = new UTF8String[$argNums];
+$codes
+UTF8String ${ev.value} = UTF8String.concatWs(${separator.value}, 
$args);
--- End diff --

nit:
```
val code = if (ctx.INPUT_ROW != null && ctx.currentVars == null)
...
ev.copy(code = s"""
  UTF8String[] $args = new UTF8String[$argNums];
  ${separator.code}
  ...
""")
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r151740628
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -125,19 +138,43 @@ case class ConcatWs(children: Seq[Expression])
 if (children.forall(_.dataType == StringType)) {
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
-
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val separator = evals.head
+  val strings = evals.tail
+  val argNums = strings.length
--- End diff --

nit: `numArgs`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r151738992
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
-  }
+val numArgs = evals.length
--- End diff --

nit: we can inline it


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r151482932
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,32 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("args")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
--- End diff --

I see. I will remove this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r151478604
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,32 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("args")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
--- End diff --

please do not mix in optimizations with bug fix


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r151478304
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,32 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
--- End diff --

nit: `numArgs`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150590456
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression])
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
 
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val argNums = evals.length
--- End diff --

Thank you for your kindly explanation. I totally agree with you.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150579715
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression])
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
 
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val argNums = evals.length
--- End diff --

no, I am saying the opposite. Let's consider an example, maybe I better can 
explain what I mean in this way. Let's assume that we are running 
`concat_ws(',', 'a', 'b')`. Then, `evals` would contain 3 elements. So here 
your `argNums` would be 3. But 
[here](https://github.com/apache/spark/pull/19728/files/e6b73abf33216fe564ee59213877403cf137c3b4#diff-2eb730f0723800778689468764627032R149)
 you would be using only `$args[0]` and `$args[1]`, because the first element 
(`','`, the separator) is handled differently.
Thus, I would suggest to have something like:
```
val sep = evals.head
val strings = evals.tail
val argNums = strings.length // note that this is evals.length -1
...
```
I think that in this way the code would be much clearer (other than fixing 
this little bug).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150577735
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression])
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
 
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val argNums = evals.length
+  val args = ctx.freshName("args")
+
+  val inputs = evals.tail.zipWithIndex.map { case (eval, index) =>
+if (eval.isNull != "true") {
+  s"""
+   ${eval.code}
--- End diff --

good catch


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150573152
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression])
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
 
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val argNums = evals.length
--- End diff --

I am sorry that I cannot understand your suggestion.
`argNums` is referred as `$argNums` to allocate an array.  Are you 
suggesting to allocate an array as `new UTF8String[$argNums + 1]`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150567337
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression])
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
 
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val argNums = evals.length
--- End diff --

this should be `evals.length -1`, or even better, I'd suggest to declare 
two variables: `sep` and `strings` (or the name you prefer) to hold `head` and 
`tail`. This would help readability too IMHO.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150564483
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression])
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
 
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val argNums = evals.length
+  val args = ctx.freshName("args")
+
+  val inputs = evals.tail.zipWithIndex.map { case (eval, index) =>
+if (eval.isNull != "true") {
+  s"""
+   ${eval.code}
--- End diff --

nit: in this and the next 4 lines an indentation space is missing if I am 
not wrong.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150516641
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -126,18 +139,36 @@ case class ConcatWs(children: Seq[Expression])
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
 
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val argNums = evals.length
+  val args = ctx.freshName("argLen")
+  ctx.addMutableState("UTF8String[]", args, "")
--- End diff --

this also can be kept method local, IMHO


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150516546
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("args")
+ctx.addMutableState("UTF8String[]", args, "")
--- End diff --

I think we can avoid defining this as a global variable, what do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150470069
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
+s"""
+  ${eval.code}
+  if (!${eval.isNull}) {
+$args[$index] = ${eval.value};
+  }
--- End diff --

Thank you for taking your time for my code.  
It is not easy to imagine the generated code. I may have to put the 
generated code.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150469779
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
+s"""
+  ${eval.code}
+  if (!${eval.isNull}) {
+$args[$index] = ${eval.value};
+  }
--- End diff --

Oh, I see. I didn't notice that `args` is not globally initialized.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150469019
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
+s"""
+  ${eval.code}
+  if (!${eval.isNull}) {
+$args[$index] = ${eval.value};
+  }
--- End diff --

When the next row is processed, `$argsnew UTF8String[$argNums]` is executed 
again in `apply()` method. In other words, current implementation does not 
reuse `UTF8String[]` between different rows.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150468020
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
+s"""
+  ${eval.code}
+  if (!${eval.isNull}) {
+$args[$index] = ${eval.value};
+  }
--- End diff --

That is guaranteed before the first row processed. After we process rows, 
`args` are updated for each row. E.g., `args[0]` can be updated and assigned 
with a string for row 0. In next row, if `eval.isNull` is evaluated to true, we 
don't override back to null, so `arg[0]` is still the string value for row 0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150467377
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
+s"""
+  ${eval.code}
+  if (!${eval.isNull}) {
+$args[$index] = ${eval.value};
+  }
--- End diff --

Thanks, while `args` is global, `UTF8String[]` is allocated before 
executing these assignments. Thus, we can ensure all of elements in `args` are 
`null`.
What do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150461348
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
+s"""
+  ${eval.code}
+  if (!${eval.isNull}) {
+$args[$index] = ${eval.value};
+  }
--- End diff --

I think we are better to assign null too if `eval.isNull == null`, because 
`args` is global variable and we need to override all values for previous row.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150460595
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
--- End diff --

argLen? I think it's not length of args?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150454183
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -163,13 +190,18 @@ case class ConcatWs(children: Seq[Expression])
 }
   }.unzip
 
-  ev.copy(evals.map(_.code).mkString("\n") +
+  // ev.copy(evals.map(_.code).mkString("\n") +
--- End diff --

thanks :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150454166
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,26 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
+s"""
+  ${eval.code}
+  $args[$index] = ${eval.value};
--- End diff --

good catch


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150444966
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -126,18 +137,34 @@ case class ConcatWs(children: Seq[Expression])
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
 
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
+  val argNums = evals.length
+  val args = ctx.freshName("argLen")
+  ctx.addMutableState("UTF8String[]", args, "")
 
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val inputs = evals.tail.zipWithIndex.map { case (eval, index) =>
+if (eval.isNull != "true") {
+  s"""
+   ${eval.code}
+   $args[$index] = ${eval.value};
--- End diff --

Same as https://github.com/apache/spark/pull/19728/files#r150444902.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150444902
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,26 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
+s"""
+  ${eval.code}
+  $args[$index] = ${eval.value};
--- End diff --

If `eval.isNull` is evaluated to `null` at runtime, `eval.value` is 
useless. We should assign `null` in that case.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150444269
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,26 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
--- End diff --

If `eval.isNull` is not a pre-evaluated constant, I expect that the 
following code at lines 73-74 will be generated. Only when we ensure it is 
`true`, we can avoid assigning a value (use default `null` value).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150441354
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -163,13 +190,18 @@ case class ConcatWs(children: Seq[Expression])
 }
   }.unzip
 
-  ev.copy(evals.map(_.code).mkString("\n") +
+  // ev.copy(evals.map(_.code).mkString("\n") +
--- End diff --

Forget to remove commented code?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150441287
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -63,15 +63,26 @@ case class Concat(children: Seq[Expression]) extends 
Expression with ImplicitCas
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
 val evals = children.map(_.genCode(ctx))
-val inputs = evals.map { eval =>
-  s"${eval.isNull} ? null : ${eval.value}"
-}.mkString(", ")
-ev.copy(evals.map(_.code).mkString("\n") + s"""
-  boolean ${ev.isNull} = false;
-  UTF8String ${ev.value} = UTF8String.concat($inputs);
-  if (${ev.value} == null) {
-${ev.isNull} = true;
+val argNums = evals.length
+val args = ctx.freshName("argLen")
+ctx.addMutableState("UTF8String[]", args, "")
+
+val inputs = evals.zipWithIndex.map { case (eval, index) =>
+  if (eval.isNull != "true") {
--- End diff --

If `eval.isNull` is not a pre-evaluated constant?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19728#discussion_r150441323
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -126,18 +137,34 @@ case class ConcatWs(children: Seq[Expression])
   // All children are strings. In that case we can construct a fixed 
size array.
   val evals = children.map(_.genCode(ctx))
 
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
+  val argNums = evals.length
+  val args = ctx.freshName("argLen")
+  ctx.addMutableState("UTF8String[]", args, "")
 
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val inputs = evals.tail.zipWithIndex.map { case (eval, index) =>
+if (eval.isNull != "true") {
--- End diff --

Same issue as https://github.com/apache/spark/pull/19728/files#r150441287.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-12 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/19728

[SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem with concat and 
concat_ws

## What changes were proposed in this pull request?

This PR changes `concat` and `concat_ws` code generation to place generated 
code for expression for arguments into separated methods if these size could be 
large.
This PR resolved two cases:

* `concat` with a lot of argument
* `concat_ws` with a lot of argument

## How was this patch tested?

Added new test cases into `StringExpressionsSuite`


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-22498

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19728.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19728


commit 16ddb2f0a2bd85917bfc8bbb5dbcc7d92d457977
Author: Kazuaki Ishizaki 
Date:   2017-11-12T16:58:04Z

Initial commit




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org