[GitHub] spark pull request #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit pr...
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19777#discussion_r207077435

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -125,13 +125,34 @@ case class ConcatWs(children: Seq[Expression])
     if (children.forall(_.dataType == StringType)) {
       // All children are strings. In that case we can construct a fixed size array.
       val evals = children.map(_.genCode(ctx))
-
-      val inputs = evals.map { eval =>
-        s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-      }.mkString(", ")
-
-      ev.copy(evals.map(_.code).mkString("\n") + s"""
-        UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+      val separator = evals.head
+      val strings = evals.tail
+      val numArgs = strings.length
+      val args = ctx.freshName("args")
+
+      val inputs = strings.zipWithIndex.map { case (eval, index) =>
+        if (eval.isNull != "true") {
--- End diff --

@kiszk I was looking at build warnings, and one notes that this compares an `ExprValue` and a `String`, which will never be equal. Should it be `eval.isNull.code != "true"` maybe?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
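The warning srowen mentions comes down to comparing a wrapper object against a `String`: once codegen fragments are wrapped in `ExprValue` objects, `eval.isNull != "true"` compares an object to a string and is always true. A minimal Java sketch of the distinction, using a hypothetical stand-in for Catalyst's `ExprValue` (the real class lives in `org.apache.spark.sql.catalyst.expressions.codegen`):

```java
// Hypothetical stand-in for Catalyst's ExprValue wrapper: it carries the
// generated code fragment as text, but is itself a distinct object type.
public class ExprValueSketch {
    static final class ExprValue {
        final String code;
        ExprValue(String code) { this.code = code; }
    }

    public static void main(String[] argv) {
        ExprValue isNull = new ExprValue("true");
        // Comparing the wrapper itself to a String never succeeds
        // (ExprValue does not override equals):
        System.out.println(isNull.equals("true"));      // false
        // Comparing the wrapped code text is what the check intended:
        System.out.println(isNull.code.equals("true")); // true
    }
}
```

This is why the suggested fix goes through `.code`: the comparison must target the generated text, not the wrapper.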
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19777
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19777#discussion_r152045624

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -163,13 +184,36 @@ case class ConcatWs(children: Seq[Expression])
         }
       }.unzip

-      ev.copy(evals.map(_.code).mkString("\n") +
-        s"""
+      val codes = ctx.splitExpressions(ctx.INPUT_ROW, evals.map(_.code))
+      val varargCounts = ctx.splitExpressions(varargCount, "varargCountsConcatWs",
+        ("InternalRow", ctx.INPUT_ROW) :: Nil,
+        "int",
+        { body =>
+          s"""
+            int $varargNum = 0;
+            $body
+            return $varargNum;
+          """
+        },
+        _.mkString(s"$varargNum += ", s";\n$varargNum += ", ";"))
+      val varargBuilds = ctx.splitExpressions(varargBuild, "varargBuildsConcatWs",
--- End diff --

Shall we optimize for `eval.isNull == "true"` in `varargCount` and `varargBuild`, since you already did it for the all-strings case?
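The `splitExpressions` call above takes a method-name prefix, a parameter list, a return type, a wrapper for each split method body, and a combiner that folds the partial results back into `$varargNum`. A hedged Java sketch of that same split-and-accumulate shape, with plain lambdas standing in for the generated helper methods (all names here are hypothetical, not the actual generated code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

public class SplitAccumulateSketch {
    // Split a long run of "count this vararg" steps into small helpers,
    // each under a size budget, then sum their partial results in the
    // caller -- the shape splitExpressions gives varargCountsConcatWs.
    static int countVarargs(List<int[]> argCounts, int chunkSize) {
        List<IntSupplier> helpers = new ArrayList<>();
        for (int i = 0; i < argCounts.size(); i += chunkSize) {
            List<int[]> chunk =
                argCounts.subList(i, Math.min(i + chunkSize, argCounts.size()));
            helpers.add(() -> {            // one "generated" helper method
                int partial = 0;
                for (int[] arg : chunk) partial += arg.length;
                return partial;            // returned to the accumulator
            });
        }
        int varargNum = 0;                 // the accumulator in the caller
        for (IntSupplier h : helpers) varargNum += h.getAsInt();
        return varargNum;
    }
}
```

The point of the wrapper-plus-combiner design is that each split method stays self-contained: it declares its own partial count and returns it, rather than mutating shared state.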
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19777#discussion_r151953946

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -125,19 +125,43 @@ case class ConcatWs(children: Seq[Expression])
     if (children.forall(_.dataType == StringType)) {
       // All children are strings. In that case we can construct a fixed size array.
       val evals = children.map(_.genCode(ctx))
-
-      val inputs = evals.map { eval =>
-        s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-      }.mkString(", ")
-
-      ev.copy(evals.map(_.code).mkString("\n") + s"""
-        UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+      val separator = evals.head
+      val strings = evals.tail
+      val numArgs = strings.length
+      val args = ctx.freshName("args")
+
+      val inputs = strings.zipWithIndex.map { case (eval, index) =>
+        if (eval.isNull != "true") {
+          s"""
+            ${eval.code}
+            if (!${eval.isNull}) {
+              $args[$index] = ${eval.value};
+            }
+          """
+        } else {
+          ""
+        }
+      }
+      val codes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+        ctx.splitExpressions(inputs, "valueConcatWs",
+          ("InternalRow", ctx.INPUT_ROW) :: ("UTF8String[]", args) :: Nil)
+      } else {
+        inputs.mkString("\n")
+      }
+      ev.copy(s"""
+        UTF8String[] $args = new UTF8String[$numArgs];
+        ${separator.code}
+        $codes
+        UTF8String ${ev.value} = UTF8String.concatWs(${separator.value}, $args);
         boolean ${ev.isNull} = ${ev.value} == null;
       """)
     } else {
       val array = ctx.freshName("array")
+      ctx.addMutableState("UTF8String[]", array, "")
       val varargNum = ctx.freshName("varargNum")
+      ctx.addMutableState("int", varargNum, "")
       val idxInVararg = ctx.freshName("idxInVararg")
+      ctx.addMutableState("int", idxInVararg, "")
--- End diff --

Sure, let me do it later.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19777#discussion_r151951926

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -125,19 +125,43 @@ case class ConcatWs(children: Seq[Expression])
     if (children.forall(_.dataType == StringType)) {
       // All children are strings. In that case we can construct a fixed size array.
       val evals = children.map(_.genCode(ctx))
-
-      val inputs = evals.map { eval =>
-        s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-      }.mkString(", ")
-
-      ev.copy(evals.map(_.code).mkString("\n") + s"""
-        UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+      val separator = evals.head
+      val strings = evals.tail
+      val numArgs = strings.length
+      val args = ctx.freshName("args")
+
+      val inputs = strings.zipWithIndex.map { case (eval, index) =>
+        if (eval.isNull != "true") {
+          s"""
+            ${eval.code}
+            if (!${eval.isNull}) {
+              $args[$index] = ${eval.value};
+            }
+          """
+        } else {
+          ""
+        }
+      }
+      val codes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+        ctx.splitExpressions(inputs, "valueConcatWs",
+          ("InternalRow", ctx.INPUT_ROW) :: ("UTF8String[]", args) :: Nil)
+      } else {
+        inputs.mkString("\n")
+      }
+      ev.copy(s"""
+        UTF8String[] $args = new UTF8String[$numArgs];
+        ${separator.code}
+        $codes
+        UTF8String ${ev.value} = UTF8String.concatWs(${separator.value}, $args);
         boolean ${ev.isNull} = ${ev.value} == null;
       """)
     } else {
       val array = ctx.freshName("array")
+      ctx.addMutableState("UTF8String[]", array, "")
       val varargNum = ctx.freshName("varargNum")
+      ctx.addMutableState("int", varargNum, "")
       val idxInVararg = ctx.freshName("idxInVararg")
+      ctx.addMutableState("int", idxInVararg, "")
--- End diff --

Can we do better? I think we can avoid these global variables.
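The concern here is that `addMutableState` turns `varargNum` and `idxInVararg` into fields of the generated class, bloating its state; the alternative is to thread the values through the split methods as parameters and return values. A hedged Java sketch of the two shapes (hypothetical names, not the actual generated code):

```java
public class StateThreadingSketch {
    // Global-state style: split helpers communicate through a mutable
    // field, the effect of registering it via addMutableState.
    static class WithField {
        int varargNum = 0;                 // class-level mutable state
        void helperA() { varargNum += 2; }
        void helperB() { varargNum += 3; }
        int run() { helperA(); helperB(); return varargNum; }
    }

    // Parameter-threading style: each helper takes the running value and
    // returns the updated one, so no class-level field is needed.
    static class WithParams {
        static int helperA(int varargNum) { return varargNum + 2; }
        static int helperB(int varargNum) { return varargNum + 3; }
        static int run() { return helperB(helperA(0)); }
    }
}
```

Both compute the same result; the second keeps the generated class smaller and the data flow explicit, which is presumably what the reviewer is asking for.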
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/19777

[SPARK-22549][SQL] Fix 64KB JVM bytecode limit problem with concat_ws

## What changes were proposed in this pull request?

This PR changes `concat_ws` code generation to place the generated code for the argument expressions into separate methods when its total size could be large. This resolves the case of `concat_ws` with many arguments.

## How was this patch tested?

Added new test cases to `StringExpressionsSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-22549

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19777.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19777

commit 1714c883d193f96e43c19e72a96c15c23d5fc193
Author: Kazuaki Ishizaki
Date: 2017-11-18T02:40:00Z

    initial commit
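For context on the title: the JVM rejects any single method whose bytecode exceeds 64KB, so a `concat_ws` over very many columns can produce an `eval` body that fails to compile. The fix above fills the argument array from several smaller methods instead. A hedged Java sketch of the post-patch shape of the generated code, with `String.join` standing in for `UTF8String.concatWs` and all method names hypothetical:

```java
public class ConcatWsShapeSketch {
    // After the patch, the generated code allocates one fixed-size array
    // and fills slices of it from split helper methods, keeping every
    // method safely under the JVM's 64KB-per-method bytecode limit.
    static void valueConcatWs_0(String[] args) { args[0] = "a"; args[1] = "b"; }
    static void valueConcatWs_1(String[] args) { args[2] = "c"; }

    static String eval() {
        String[] args = new String[3];
        valueConcatWs_0(args);
        valueConcatWs_1(args);
        // stand-in for UTF8String.concatWs(separator, args)
        return String.join(",", args);
    }
}
```

With thousands of arguments the real generated code would have many such `valueConcatWs_N` helpers, but each stays small enough to compile.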