[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19728 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r151741098 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -125,19 +138,43 @@ case class ConcatWs(children: Seq[Expression]) if (children.forall(_.dataType == StringType)) { // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") - - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val separator = evals.head + val strings = evals.tail + val argNums = strings.length + val args = ctx.freshName("args") + + val inputs = strings.zipWithIndex.map { case (eval, index) => +if (eval.isNull != "true") { + s""" + ${eval.code} + if (!${eval.isNull}) { + $args[$index] = ${eval.value}; + } + """ +} else { + "" +} + } + val codes = s"${separator.code}\n" + +(if (ctx.INPUT_ROW != null && ctx.currentVars == null) { + ctx.splitExpressions(inputs, "valueConcatWs", + ("InternalRow", ctx.INPUT_ROW) :: ("UTF8String[]", args) :: Nil) + } else { + inputs.mkString("\n") + }) + ev.copy(s""" +UTF8String[] $args = new UTF8String[$argNums]; +$codes +UTF8String ${ev.value} = UTF8String.concatWs(${separator.value}, $args); --- End diff -- nit: ``` val code = if (ctx.INPUT_ROW != null && ctx.currentVars == null) ... ev.copy(code = s""" UTF8String[] $args = new UTF8String[$argNums]; ${separator.code} ... """) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r151740628 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -125,19 +138,43 @@ case class ConcatWs(children: Seq[Expression]) if (children.forall(_.dataType == StringType)) { // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") - - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val separator = evals.head + val strings = evals.tail + val argNums = strings.length --- End diff -- nit: `numArgs` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r151738992 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; - } +val numArgs = evals.length --- End diff -- nit: we can inline it --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r151482932 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,32 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("args") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { --- End diff -- I see. I will remove this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r151478604 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,32 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("args") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { --- End diff -- please do not mix in optimizations with bug fix --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r151478304 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,32 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length --- End diff -- nit: `numArgs`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150590456 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression]) // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") - - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val argNums = evals.length --- End diff -- Thank you for your kindly explanation. I totally agree with you. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150579715 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression]) // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") - - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val argNums = evals.length --- End diff -- no, I am saying the opposite. Let's consider an example, maybe I better can explain what I mean in this way. Let's assume that we are running `concat_ws(',', 'a', 'b')`. Then, `evals` would contain 3 elements. So here your `argNums` would be 3. But [here](https://github.com/apache/spark/pull/19728/files/e6b73abf33216fe564ee59213877403cf137c3b4#diff-2eb730f0723800778689468764627032R149) you would be using only `$args[0]` and `$args[1]`, because the first element (`','`, the separator) is handled differently. Thus, I would suggest to have something like: ``` val sep = evals.head val strings = evals.tail val argNums = strings.length // note that this is evals.length -1 ... ``` I think that in this way the code would be much clearer (other than fixing this little bug). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150577735 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression]) // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") - - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val argNums = evals.length + val args = ctx.freshName("args") + + val inputs = evals.tail.zipWithIndex.map { case (eval, index) => +if (eval.isNull != "true") { + s""" + ${eval.code} --- End diff -- good catch --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150573152 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression]) // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") - - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val argNums = evals.length --- End diff -- I am sorry that I cannot understand your suggestion. `argNums` is referred as `$argNums` to allocate an array. Are you suggesting to allocate an array as `new UTF8String[$argNums + 1]`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150567337 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression]) // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") - - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val argNums = evals.length --- End diff -- this should be `evals.length -1`, or even better, I'd suggest to declare two variables: `sep` and `strings` (or the name you prefer) to hold `head` and `tail`. This would help readability too IMHO. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150564483 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -126,18 +143,41 @@ case class ConcatWs(children: Seq[Expression]) // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") - - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val argNums = evals.length + val args = ctx.freshName("args") + + val inputs = evals.tail.zipWithIndex.map { case (eval, index) => +if (eval.isNull != "true") { + s""" + ${eval.code} --- End diff -- nit: in this and the next 4 lines an indentation space is missing if I am not wrong. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150516641 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -126,18 +139,36 @@ case class ConcatWs(children: Seq[Expression]) // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") - - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val argNums = evals.length + val args = ctx.freshName("argLen") + ctx.addMutableState("UTF8String[]", args, "") --- End diff -- this also can be kept method local, IMHO --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150516546 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("args") +ctx.addMutableState("UTF8String[]", args, "") --- End diff -- I think we can avoid defining this as a global variable, what do you think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150470069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { +s""" + ${eval.code} + if (!${eval.isNull}) { +$args[$index] = ${eval.value}; + } --- End diff -- Thank you for taking your time for my code. It is not easy to imagine the generated code. I may have to put the generated code. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150469779 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { +s""" + ${eval.code} + if (!${eval.isNull}) { +$args[$index] = ${eval.value}; + } --- End diff -- Oh, I see. I didn't notice that `args` is not globally initialized. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150469019 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { +s""" + ${eval.code} + if (!${eval.isNull}) { +$args[$index] = ${eval.value}; + } --- End diff -- When the next row is processed, `$argsnew UTF8String[$argNums]` is executed again in `apply()` method. In other words, current implementation does not reuse `UTF8String[]` between different rows. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150468020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { +s""" + ${eval.code} + if (!${eval.isNull}) { +$args[$index] = ${eval.value}; + } --- End diff -- That is guaranteed before the first row processed. After we process rows, `args` are updated for each row. E.g., `args[0]` can be updated and assigned with a string for row 0. In next row, if `eval.isNull` is evaluated to true, we don't override back to null, so `arg[0]` is still the string value for row 0. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150467377 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { +s""" + ${eval.code} + if (!${eval.isNull}) { +$args[$index] = ${eval.value}; + } --- End diff -- Thanks, while `args` is global, `UTF8String[]` is allocated before executing these assignments. Thus, we can ensure all of elements in `args` are `null`. What do you think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150461348 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { +s""" + ${eval.code} + if (!${eval.isNull}) { +$args[$index] = ${eval.value}; + } --- End diff -- I think we are better to assign null too if `eval.isNull == null`, because `args` is global variable and we need to override all values for previous row. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150460595 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,28 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") --- End diff -- argLen? I think it's not length of args? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150454183 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -163,13 +190,18 @@ case class ConcatWs(children: Seq[Expression]) } }.unzip - ev.copy(evals.map(_.code).mkString("\n") + + // ev.copy(evals.map(_.code).mkString("\n") + --- End diff -- thanks :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150454166 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,26 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { +s""" + ${eval.code} + $args[$index] = ${eval.value}; --- End diff -- good catch --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150444966 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -126,18 +137,34 @@ case class ConcatWs(children: Seq[Expression]) // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") + val argNums = evals.length + val args = ctx.freshName("argLen") + ctx.addMutableState("UTF8String[]", args, "") - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val inputs = evals.tail.zipWithIndex.map { case (eval, index) => +if (eval.isNull != "true") { + s""" + ${eval.code} + $args[$index] = ${eval.value}; --- End diff -- Same as https://github.com/apache/spark/pull/19728/files#r150444902. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150444902 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,26 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { +s""" + ${eval.code} + $args[$index] = ${eval.value}; --- End diff -- If `eval.isNull` is evaluated to `null` at runtime, `eval.value` is useless. We should assign `null` in that case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150444269 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,26 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { --- End diff -- If `eval.isNull` is not a pre-evaluated constant, I expect that the following code at lines 73-74 will be generated. Only when we ensure it is `true`, we can avoid assigning a value (use default `null` value). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150441354 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -163,13 +190,18 @@ case class ConcatWs(children: Seq[Expression]) } }.unzip - ev.copy(evals.map(_.code).mkString("\n") + + // ev.copy(evals.map(_.code).mkString("\n") + --- End diff -- Forget to remove commented code? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150441287 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -63,15 +63,26 @@ case class Concat(children: Seq[Expression]) extends Expression with ImplicitCas override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evals = children.map(_.genCode(ctx)) -val inputs = evals.map { eval => - s"${eval.isNull} ? null : ${eval.value}" -}.mkString(", ") -ev.copy(evals.map(_.code).mkString("\n") + s""" - boolean ${ev.isNull} = false; - UTF8String ${ev.value} = UTF8String.concat($inputs); - if (${ev.value} == null) { -${ev.isNull} = true; +val argNums = evals.length +val args = ctx.freshName("argLen") +ctx.addMutableState("UTF8String[]", args, "") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + if (eval.isNull != "true") { --- End diff -- If `eval.isNull` is not a pre-evaluated constant? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19728#discussion_r150441323 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -126,18 +137,34 @@ case class ConcatWs(children: Seq[Expression]) // All children are strings. In that case we can construct a fixed size array. val evals = children.map(_.genCode(ctx)) - val inputs = evals.map { eval => -s"${eval.isNull} ? (UTF8String) null : ${eval.value}" - }.mkString(", ") + val argNums = evals.length + val args = ctx.freshName("argLen") + ctx.addMutableState("UTF8String[]", args, "") - ev.copy(evals.map(_.code).mkString("\n") + s""" -UTF8String ${ev.value} = UTF8String.concatWs($inputs); + val inputs = evals.tail.zipWithIndex.map { case (eval, index) => +if (eval.isNull != "true") { --- End diff -- Same issue as https://github.com/apache/spark/pull/19728/files#r150441287. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/19728 [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem with concat and concat_ws ## What changes were proposed in this pull request? This PR changes `concat` and `concat_ws` code generation to place generated code for expression for arguments into separated methods if these size could be large. This PR resolved two cases: * `concat` with a lot of argument * `concat_ws` with a lot of argument ## How was this patch tested? Added new test cases into `StringExpressionsSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-22498 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19728.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19728 commit 16ddb2f0a2bd85917bfc8bbb5dbcc7d92d457977 Author: Kazuaki IshizakiDate: 2017-11-12T16:58:04Z Initial commit --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org