[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22789 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22789#discussion_r228760802 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala --- @@ -319,4 +319,15 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext { assert(df.limit(1).collect() === Array(Row("bat", 8.0))) } } + + test("SPARK-25767: Lazy evaluated stream of expressions handled correctly") { --- End diff -- oh wait, I executed those commands in sparkShell and that falls back to interpreted mode when compilation fails. That is what happened here. Sorry about the fuss. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Github user peter-toth commented on a diff in the pull request: https://github.com/apache/spark/pull/22789#discussion_r228753985 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -146,7 +146,10 @@ trait CodegenSupport extends SparkPlan { if (outputVars != null) { assert(outputVars.length == output.length) // outputVars will be used to generate the code for UnsafeRow, so we should copy them -outputVars.map(_.copy()) +outputVars.map(_.copy()) match { --- End diff -- What I found is that in `prepareRowVar` the `evaluateVariables` call clears the code of `outputVars`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Github user peter-toth commented on a diff in the pull request: https://github.com/apache/spark/pull/22789#discussion_r228753682 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala --- @@ -319,4 +319,15 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext { assert(df.limit(1).collect() === Array(Row("bat", 8.0))) } } + + test("SPARK-25767: Lazy evaluated stream of expressions handled correctly") { --- End diff -- Strange, I've just rerun it on master and got: ``` 08:34:45.754 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 81, Column 17: Expression "bhj_isNull_6" is not an rvalue org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 81, Column 17: Expression "bhj_isNull_6" is not an rvalue ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22789#discussion_r228749168 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -146,7 +146,10 @@ trait CodegenSupport extends SparkPlan { if (outputVars != null) { assert(outputVars.length == output.length) // outputVars will be used to generate the code for UnsafeRow, so we should copy them -outputVars.map(_.copy()) +outputVars.map(_.copy()) match { --- End diff -- What in `consume` is relying on a side effect of traversing the `outputVars`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22789#discussion_r228748979 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala --- @@ -319,4 +319,15 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext { assert(df.limit(1).collect() === Array(Row("bat", 8.0))) } } + + test("SPARK-25767: Lazy evaluated stream of expressions handled correctly") { --- End diff -- Even without this patch this test would pass (I just tried it on master). A stream always evaluates its first element, so you probably need to add another key here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22789#discussion_r227008114 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -146,7 +146,7 @@ trait CodegenSupport extends SparkPlan { if (outputVars != null) { assert(outputVars.length == output.length) // outputVars will be used to generate the code for UnsafeRow, so we should copy them -outputVars.map(_.copy()) +outputVars.map(_.copy()).toBuffer --- End diff -- can we do something like: ``` val res = outputVars.map(_.copy()) res match { case x: Stream => x.force case o => o } ``` in order to avoid doing the `toBuffer` operation when not needed (please use more meaningful names than in my example in case it is feasible :) )? cc @hvanhovell who I remember fixed a similar issue with `Stream`s. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org