[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88033/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Merged build finished. Test PASSed.
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20464 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88039/ Test PASSed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20685

**[Test build #88033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88033/testReport)** for PR 20685 at commit [`4e4f075`](https://github.com/apache/spark/commit/4e4f07544d17ea0493b4c5887d8215550eedc424).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20464 Merged build finished. Test PASSed.
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20464

**[Test build #88039 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88039/testReport)** for PR 20464 at commit [`8c1a8ec`](https://github.com/apache/spark/commit/8c1a8ec46ea28ce17fcaae42aa7b9955cb34bfc8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #20753: [SPARK-23582][SQL] StaticInvoke should support in...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20753#discussion_r172755611

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -133,8 +134,21 @@ case class StaticInvoke(
   override def nullable: Boolean = needNullCheck || returnNullable

   override def children: Seq[Expression] = arguments

-  override def eval(input: InternalRow): Any =
-    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+  override def eval(input: InternalRow): Any = {
+    if (staticObject == null) {
+      throw new RuntimeException("The static class cannot be null.")
+    }
+
+    val parmTypes = arguments.map(e =>
+      CallMethodViaReflection.typeMapping.getOrElse(e.dataType,
+        Seq(e.dataType.asInstanceOf[ObjectType].cls))(0))
+    val parms = arguments.map(e => e.eval(input).asInstanceOf[Object])

--- End diff --

Do we need null checks here for the inputs? Also, can we add a common function in `InvokeLike` to handle input arguments for the other `InvokeLike` exprs? (I mean the interpreted version of `InvokeLike.prepareArguments`.)
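For illustration, the interpreted static invocation being reviewed here boils down to resolving the method once, null-checking the evaluated arguments as the comment suggests, and invoking reflectively. The following is a simplified, Spark-free sketch (class and method names are hypothetical, not Spark's `StaticInvoke` or `prepareArguments`):

```java
import java.lang.reflect.Method;

public class InterpretedStaticInvokeSketch {
    // Hypothetical, simplified stand-in for an interpreted StaticInvoke:
    // resolve the static method, null-check the evaluated arguments as the
    // review suggests, then invoke reflectively (null receiver = static call).
    public static Object invokeStatic(Class<?> cls, String methodName,
                                      Class<?>[] paramTypes, Object[] args,
                                      boolean needNullCheck) {
        try {
            Method method = cls.getDeclaredMethod(methodName, paramTypes);
            if (needNullCheck) {
                for (Object arg : args) {
                    if (arg == null) {
                        return null; // short-circuit instead of invoking
                    }
                }
            }
            return method.invoke(null, args);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

For example, `invokeStatic(Math.class, "abs", new Class<?>[]{int.class}, new Object[]{-3}, true)` resolves and calls `Math.abs(int)` reflectively, while a null argument with `needNullCheck` set yields null without invoking.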
[GitHub] spark pull request #20753: [SPARK-23582][SQL] StaticInvoke should support in...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20753#discussion_r172754548

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -133,8 +134,21 @@ case class StaticInvoke(
   override def nullable: Boolean = needNullCheck || returnNullable

   override def children: Seq[Expression] = arguments

-  override def eval(input: InternalRow): Any =
-    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+  override def eval(input: InternalRow): Any = {
+    if (staticObject == null) {

--- End diff --

Do we need this check?
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20682 Merged build finished. Test FAILed.
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20682 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88036/ Test FAILed.
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20682

**[Test build #88036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88036/testReport)** for PR 20682 at commit [`c1b7413`](https://github.com/apache/spark/commit/c1b7413d356dafdc607683292bfff7b1a57cdf27).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-check-en...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20735 Merged build finished. Test PASSed.
[GitHub] spark issue #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-check-en...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20735 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88038/ Test PASSed.
[GitHub] spark issue #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-check-en...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20735

**[Test build #88038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88038/testReport)** for PR 20735 at commit [`a9d3fa5`](https://github.com/apache/spark/commit/a9d3fa5ead2ebec5f44615dc272056fe59f6130a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Merged build finished. Test PASSed.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88032/ Test PASSed.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88034/ Test PASSed.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Merged build finished. Test PASSed.
[GitHub] spark issue #20754: [SPARK-23287][MESOS] Spark scheduler does not remove ini...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20754 @devaraj-kavali can you add test for this? cc @susanxhuynh
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20464 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1347/ Test PASSed.
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20464 Merged build finished. Test PASSed.
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20687 Merged build finished. Test PASSed.
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88035/ Test PASSed.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20756

**[Test build #88034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88034/testReport)** for PR 20756 at commit [`b8f171e`](https://github.com/apache/spark/commit/b8f171e5492f3156767589ad4c6ed458cb24615c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20756

**[Test build #88032 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88032/testReport)** for PR 20756 at commit [`0c48a9b`](https://github.com/apache/spark/commit/0c48a9ba2551435e3794b4e98002423b9a8d527b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20687

**[Test build #88035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88035/testReport)** for PR 20687 at commit [`63c7098`](https://github.com/apache/spark/commit/63c7098fc4b14af7859580682f17c73abcd7ff08).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20678#discussion_r172751054

--- Diff: docs/sql-programming-guide.md ---
@@ -1689,6 +1689,10 @@ using the call `toPandas()` and when creating a Spark DataFrame from a Pandas Da
 `createDataFrame(pandas_df)`. To use Arrow when executing these calls, users need to first set
 the Spark configuration 'spark.sql.execution.arrow.enabled' to 'true'. This is disabled by default.

+In addition, optimizations enabled by 'spark.sql.execution.arrow.enabled' could fallback automatically
+to non-optimized implementations if an error occurs before the actual computation within Spark.

--- End diff --

very minor nit: `non-optimized implementations` --> `non-Arrow optimization implementation`; this matches the description in the paragraph below
[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20678#discussion_r172751164

--- Diff: docs/sql-programming-guide.md ---
@@ -1800,6 +1800,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
 ## Upgrading From Spark SQL 2.3 to 2.4

 - Since Spark 2.4, Spark maximizes the usage of a vectorized ORC reader for ORC files by default. To do that, `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` change their default values to `native` and `true` respectively.
+ - In PySpark, when Arrow optimization is enabled, previously `toPandas` just failed when Arrow optimization is unabled to be used whereas `createDataFrame` from Pandas DataFrame allowed the fallback to non-optimization. Now, both `toPandas` and `createDataFrame` from Pandas DataFrame allow the fallback by default, which can be switched by `spark.sql.execution.arrow.fallback.enabled`.

--- End diff --

`which can be switched by` -> `which can be switched on by` or `which can be switched on with`
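The fallback semantics described in this doc change can be illustrated with a small, Spark-free sketch: attempt the optimized path and, if it fails and fallback is enabled, retry the non-optimized path. The names (`arrowEnabled`, `fallbackEnabled`, the two suppliers) are illustrative, not Spark's API:

```java
import java.util.function.Supplier;

public class ArrowFallbackSketch {
    // Spark-free illustration of the fallback behavior described above:
    // try the optimized path first; on failure, fall back to the plain path
    // only when fallback is enabled, otherwise surface the original error.
    public static <T> T convert(boolean arrowEnabled, boolean fallbackEnabled,
                                Supplier<T> arrowPath, Supplier<T> plainPath) {
        if (!arrowEnabled) {
            return plainPath.get();
        }
        try {
            return arrowPath.get();
        } catch (RuntimeException e) {
            if (fallbackEnabled) {
                return plainPath.get(); // fall back to the non-Arrow path
            }
            throw e; // fallback disabled: fail as before
        }
    }
}
```

With fallback disabled, a failing optimized path propagates its exception, matching the "previously `toPandas` just failed" behavior the diff describes.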
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20464 **[Test build #88039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88039/testReport)** for PR 20464 at commit [`8c1a8ec`](https://github.com/apache/spark/commit/8c1a8ec46ea28ce17fcaae42aa7b9955cb34bfc8).
[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20464#discussion_r172751184

--- Diff: docs/sparkr.md ---
@@ -663,3 +663,7 @@ You can inspect the search path in R with [`search()`](https://stat.ethz.ch/R-ma
 - The `stringsAsFactors` parameter was previously ignored with `collect`, for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE))`. It has been corrected.
 - For `summary`, option for statistics to compute has been added. Its output is changed from that from `describe`.
 - A warning can be raised if versions of SparkR package and the Spark JVM do not match.
+
+## Upgrading to Spark 2.4.0
+
+ - The `start` parameter of `substr` method was wrongly subtracted by one, previously. In other words, the index specified by `start` parameter was considered as 0-base. This can lead to inconsistent substring results and also does not match with the behaviour with `substr` in R. It has been fixed so the `start` parameter of `substr` method is now 1-base, e.g., `substr(df$a, 2, 5)` should be changed to `substr(df$a, 1, 4)`.

--- End diff --

Yes. Added.
[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20464#discussion_r172750404

--- Diff: docs/sparkr.md ---
@@ -663,3 +663,7 @@ You can inspect the search path in R with [`search()`](https://stat.ethz.ch/R-ma
 - The `stringsAsFactors` parameter was previously ignored with `collect`, for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE))`. It has been corrected.
 - For `summary`, option for statistics to compute has been added. Its output is changed from that from `describe`.
 - A warning can be raised if versions of SparkR package and the Spark JVM do not match.
+
+## Upgrading to Spark 2.4.0
+
+ - The `start` parameter of `substr` method was wrongly subtracted by one, previously. In other words, the index specified by `start` parameter was considered as 0-base. This can lead to inconsistent substring results and also does not match with the behaviour with `substr` in R. It has been fixed so the `start` parameter of `substr` method is now 1-base, e.g., `substr(df$a, 2, 5)` should be changed to `substr(df$a, 1, 4)`.

--- End diff --

could you add `method is now 1-base, e.g., therefore to get the same result as substr(df$a, 2, 5), it should be changed to substr(df$a, 1, 4)`
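The off-by-one discussed in this diff comes down to mapping R's 1-based, inclusive `substr(start, stop)` onto a 0-based, end-exclusive substring. A minimal sketch of the corrected mapping (illustrative only, not SparkR's implementation):

```java
public class SubstrSketch {
    // Maps R's 1-based, inclusive substr(start, stop) onto Java's 0-based,
    // end-exclusive String.substring, clamping out-of-range indices.
    public static String substrOneBased(String s, int start, int stop) {
        int from = Math.max(start - 1, 0);     // 1-based start -> 0-based index
        int to = Math.min(stop, s.length());   // inclusive stop -> exclusive end
        return from >= to ? "" : s.substring(from, to);
    }
}
```

For example, `substrOneBased("abcdef", 2, 5)` returns `"bcde"`, matching R's `substr("abcdef", 2, 5)`; the pre-fix 0-based reading would have returned a different slice.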
[GitHub] spark issue #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-check-en...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20735 **[Test build #88038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88038/testReport)** for PR 20735 at commit [`a9d3fa5`](https://github.com/apache/spark/commit/a9d3fa5ead2ebec5f44615dc272056fe59f6130a).
[GitHub] spark issue #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-check-en...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20735 Merged build finished. Test PASSed.
[GitHub] spark issue #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-check-en...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20735 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1346/ Test PASSed.
[GitHub] spark issue #20696: [SPARK-23525] [SQL] Support ALTER TABLE CHANGE COLUMN CO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88031/ Test PASSed.
[GitHub] spark issue #20696: [SPARK-23525] [SQL] Support ALTER TABLE CHANGE COLUMN CO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20696 Merged build finished. Test PASSed.
[GitHub] spark issue #20696: [SPARK-23525] [SQL] Support ALTER TABLE CHANGE COLUMN CO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20696

**[Test build #88031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88031/testReport)** for PR 20696 at commit [`48fc338`](https://github.com/apache/spark/commit/48fc338dc30720aa05e1871d69bad66ae2dfaa59).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20757: [SPARK-23595][SQL] ValidateExternalType should support i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20757 **[Test build #88037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88037/testReport)** for PR 20757 at commit [`d53cfea`](https://github.com/apache/spark/commit/d53cfea1be24c1e0ae6fce6653a0f686719cd1c4).
[GitHub] spark issue #20757: [SPARK-23595][SQL] ValidateExternalType should support i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20757 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1345/ Test PASSed.
[GitHub] spark issue #20757: [SPARK-23595][SQL] ValidateExternalType should support i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20757 Merged build finished. Test PASSed.
[GitHub] spark pull request #20757: [SPARK-23595][SQL] ValidateExternalType should su...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/20757

[SPARK-23595][SQL] ValidateExternalType should support interpreted execution

## What changes were proposed in this pull request?

This pr supports interpreted mode for `ValidateExternalType`.

## How was this patch tested?

Added tests in `ObjectExpressionsSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-23595

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20757.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20757

commit d53cfea1be24c1e0ae6fce6653a0f686719cd1c4
Author: Takeshi Yamamuro
Date: 2018-03-06T17:06:28Z

    ValidateExternalType should support interpreted execution
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20685 LGTM
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/20685 @cloud-fan @squito Thanks a lot!
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172743278

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
@@ -57,20 +59,20 @@
   // The data stored in these two allocations need to maintain binary compatible. We can
   // directly pass this buffer to external components.
-  private long nulls;

--- End diff --

yea, I think `UTF8String` is good enough as the first show case.
[GitHub] spark issue #20755: [SPARK-23406][SS] Enable stream-stream self-joins for br...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20755

**[Test build #88030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88030/testReport)** for PR 20755 at commit [`484babb`](https://github.com/apache/spark/commit/484babb58d9cf61d5dcc6521865cd2a5db64dd82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20755: [SPARK-23406][SS] Enable stream-stream self-joins for br...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20755 Merged build finished. Test PASSed.
[GitHub] spark issue #20696: [SPARK-23525] [SQL] Support ALTER TABLE CHANGE COLUMN CO...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20696 LGTM
[GitHub] spark issue #20755: [SPARK-23406][SS] Enable stream-stream self-joins for br...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20755 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88030/ Test PASSed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20685 Sounds reasonable. The purpose of this corruption check is to fail fast and retry the stage (re-shuffle), so disk corruption should also be counted.
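The fail-fast idea discussed here can be sketched outside Spark: eagerly decompress a block so corruption (including disk corruption) surfaces before the stage consumes it, but skip the eager check for blocks above a size threshold so big local blocks are not read fully into memory. This is a simplified illustration under assumed names (the threshold constant and both methods are made up, not Spark's code or configuration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CorruptionCheckSketch {
    // Illustrative threshold, not a real Spark configuration.
    static final int MAX_EAGER_CHECK_BYTES = 1024 * 1024;

    // Eagerly decompress small blocks so corruption fails fast; big blocks
    // skip the eager check rather than being read fully into memory.
    public static boolean isLikelyCorrupt(byte[] compressed) {
        if (compressed.length > MAX_EAGER_CHECK_BYTES) {
            return false; // too big: defer detection to the streaming read
        }
        try (InputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            while (in.read() != -1) {
                // fully decompress; corruption throws along the way
            }
            return false;
        } catch (IOException e) {
            return true; // detected early -> caller can fail fast and retry
        }
    }

    // Helper for the example: zlib-compress a byte array.
    public static byte[] deflate(byte[] data) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
                out.write(data);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A truncated or otherwise damaged zlib stream makes `InflaterInputStream` throw an `IOException` during the eager read, which is exactly the early signal that lets the stage retry (re-shuffle) instead of failing later mid-computation.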
[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20753 Merged build finished. Test PASSed.
[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20753 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88029/ Test PASSed.
[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20753

**[Test build #88029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88029/testReport)** for PR 20753 at commit [`f570692`](https://github.com/apache/spark/commit/f570692616cfd7921470029051705c44e4b9c5db).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/20702 @gatorsmile @liufengdb please take a look at this, thanks!
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20682 **[Test build #88036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88036/testReport)** for PR 20682 at commit [`c1b7413`](https://github.com/apache/spark/commit/c1b7413d356dafdc607683292bfff7b1a57cdf27).
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1344/ Test PASSed.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1343/ Test PASSed.
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20687 Merged build finished. Test PASSed.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Merged build finished. Test PASSed.
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20682 retest this please
[GitHub] spark issue #20748: [SPARK-23611][SQL] Add a helper function to check except...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88027/ Test PASSed.
[GitHub] spark issue #20748: [SPARK-23611][SQL] Add a helper function to check except...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20748 @hvanhovell ok, check again? Thanks!
[GitHub] spark issue #20748: [SPARK-23611][SQL] Add a helper function to check except...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20748 Merged build finished. Test PASSed.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20756 **[Test build #88034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88034/testReport)** for PR 20756 at commit [`b8f171e`](https://github.com/apache/spark/commit/b8f171e5492f3156767589ad4c6ed458cb24615c).
[GitHub] spark issue #20748: [SPARK-23611][SQL] Add a helper function to check except...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20748 **[Test build #88027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88027/testReport)** for PR 20748 at commit [`aeca542`](https://github.com/apache/spark/commit/aeca5428a179e932fa5fdbfbe8de2f64b64a4b43). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20735#discussion_r172732614 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -736,7 +736,8 @@ private object YarnAllocator { def memLimitExceededLogMessage(diagnostics: String, pattern: Pattern): String = { val matcher = pattern.matcher(diagnostics) val diag = if (matcher.find()) " " + matcher.group() + "." else "" -("Container killed by YARN for exceeding memory limits." + diag - + " Consider boosting spark.yarn.executor.memoryOverhead.") +s"Container killed by YARN for exceeding memory limits. $diag " + + "Consider boosting spark.yarn.executor.memoryOverhead or " + + "disable yarn.nodemanager.vmem-check-enabled because of YARN-4714." --- End diff -- Thank you for confirmation!
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20687 **[Test build #88035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88035/testReport)** for PR 20687 at commit [`63c7098`](https://github.com/apache/spark/commit/63c7098fc4b14af7859580682f17c73abcd7ff08).
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20687 Retest this please.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1342/ Test PASSed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20685 **[Test build #88033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88033/testReport)** for PR 20685 at commit [`4e4f075`](https://github.com/apache/spark/commit/4e4f07544d17ea0493b4c5887d8215550eedc424).
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Merged build finished. Test PASSed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Merged build finished. Test PASSed.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1341/ Test PASSed.
[GitHub] spark pull request #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-c...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20735#discussion_r172732010 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -736,7 +736,8 @@ private object YarnAllocator { def memLimitExceededLogMessage(diagnostics: String, pattern: Pattern): String = { val matcher = pattern.matcher(diagnostics) val diag = if (matcher.find()) " " + matcher.group() + "." else "" -("Container killed by YARN for exceeding memory limits." + diag - + " Consider boosting spark.yarn.executor.memoryOverhead.") +s"Container killed by YARN for exceeding memory limits. $diag " + + "Consider boosting spark.yarn.executor.memoryOverhead or " + + "disable yarn.nodemanager.vmem-check-enabled because of YARN-4714." --- End diff -- nit: "disable" -> "disabling"?
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1340/ Test PASSed.
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20756 **[Test build #88032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88032/testReport)** for PR 20756 at commit [`0c48a9b`](https://github.com/apache/spark/commit/0c48a9ba2551435e3794b4e98002423b9a8d527b).
[GitHub] spark pull request #20756: [SPARK-23593][SQL] Add interpreted execution for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20756#discussion_r172731452 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1254,8 +1254,24 @@ case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Exp override def children: Seq[Expression] = beanInstance +: setters.values.toSeq override def dataType: DataType = beanInstance.dataType - override def eval(input: InternalRow): Any = -throw new UnsupportedOperationException("Only code-generated evaluation is supported.") + override def eval(input: InternalRow): Any = { +val instance = beanInstance.eval(input).asInstanceOf[Object] +if (instance != null) { + setters.foreach { case (setterMethod, fieldExpr) => +val fieldValue = fieldExpr.eval(input).asInstanceOf[Object] + +val foundMethods = instance.getClass.getMethods.filter { method => + method.getName == setterMethod && Modifier.isPublic(method.getModifiers) && +method.getParameterTypes.length == 1 +} +assert(foundMethods.length == 1, + throw new RuntimeException("The Java Bean instance should have only one " + --- End diff -- codegen evaluation does not check method existence. But for non-codegen evaluation here, it is a bit weird to directly invoke the first found method (we may not find one). cc @hvanhovell
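The reflective lookup under discussion above (require exactly one public one-argument setter with the given name, or fail loudly) can be sketched outside Spark in plain Java. The `Person` bean and `SetterLookup` helper below are hypothetical names for illustration only, not Spark code:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

// Hypothetical Java bean, used only to demonstrate the lookup.
class Person {
    private String name;
    public void setName(String name) { this.name = name; }
    public String getName() { return name; }
}

public class SetterLookup {
    // Find the unique public one-argument method with the given name,
    // mirroring the `foundMethods.length == 1` check in the diff above.
    static Method findSetter(Object instance, String setterName) {
        Method[] found = Arrays.stream(instance.getClass().getMethods())
            .filter(m -> m.getName().equals(setterName)
                && Modifier.isPublic(m.getModifiers())
                && m.getParameterTypes().length == 1)
            .toArray(Method[]::new);
        if (found.length != 1) {
            throw new RuntimeException("Expected exactly one public one-argument method named "
                + setterName + ", found " + found.length);
        }
        return found[0];
    }

    // Invoke the setter, converting checked reflection exceptions to unchecked ones.
    static void set(Object instance, String setterName, Object value) {
        try {
            findSetter(instance, setterName).invoke(instance, value);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Person p = new Person();
        set(p, "setName", "spark");
        System.out.println(p.getName()); // prints "spark"
    }
}
```

Failing fast with a descriptive error when zero (or several) candidate methods match is exactly the behavior the review comment asks about, since codegen would instead fail at compile time.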
[GitHub] spark issue #20756: [SPARK-23593][SQL] Add interpreted execution for Initial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20756 Build finished. Test PASSed.
[GitHub] spark pull request #20685: [SPARK-23524] Big local shuffle blocks should not...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/20685#discussion_r172731294 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -583,8 +587,8 @@ object ShuffleBlockFetcherIterator { * Result of a fetch from a remote block successfully. * @param blockId block id * @param address BlockManager that the block was fetched from. - * @param size estimated size of the block, used to calculate bytesInFlight. - * Note that this is NOT the exact bytes. + * @param size estimated size of the block. Note that this is NOT the exact bytes. + *        Size of remote block is used to calculate bytesInFlight. --- End diff -- nit: documentation style
[GitHub] spark pull request #20756: [SPARK-23593][SQL] Add interpreted execution for ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/20756 [SPARK-23593][SQL] Add interpreted execution for InitializeJavaBean expression ## What changes were proposed in this pull request? Add interpreted execution for `InitializeJavaBean` expression. ## How was this patch tested? Added unit test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-23593 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20756.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20756 commit 978080bd8f5a1b095fd0d58ff529e16dd9cbadba Author: Liang-Chi Hsieh Date: 2018-03-07T02:56:53Z Add interpreted execution for InitializeJavaBean expression.
[GitHub] spark pull request #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20688#discussion_r172730994 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala --- @@ -24,8 +24,8 @@ import org.json4s.jackson.Serialization import org.apache.spark.sql.Row import org.apache.spark.sql.catalyst.util.DateTimeUtils -import org.apache.spark.sql.execution.streaming.{RateSourceProvider, RateStreamOffset, ValueRunTimeMsPair} -import org.apache.spark.sql.execution.streaming.sources.RateStreamSourceV2 +import org.apache.spark.sql.execution.streaming.{RateStreamOffset, ValueRunTimeMsPair} +import org.apache.spark.sql.execution.streaming.sources.RateSourceProvider import org.apache.spark.sql.sources.v2.DataSourceOptions import org.apache.spark.sql.sources.v2.reader._ import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousDataReader, ContinuousReader, Offset, PartitionOffset} --- End diff -- Could you make the names of the different readers consistent with each other? Similar to Kafka? RateStreamProvider RateStreamMicroBatchReader, RateStreamMicroBatchDataReaderFactory RateStreamContinuousReader,
[GitHub] spark pull request #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20688#discussion_r172730858 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/RateSourceSuite.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.streaming.sources + +import java.nio.file.Files +import java.util.Optional +import java.util.concurrent.TimeUnit + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.errors.TreeNodeException +import org.apache.spark.sql.execution.datasources.DataSource +import org.apache.spark.sql.execution.streaming._ +import org.apache.spark.sql.execution.streaming.continuous._ +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.sources.v2.{ContinuousReadSupport, DataSourceOptions, MicroBatchReadSupport} +import org.apache.spark.sql.sources.v2.reader.streaming.Offset +import org.apache.spark.sql.streaming.StreamTest +import org.apache.spark.util.ManualClock + +class RateSourceSuite extends StreamTest { --- End diff -- Hi @tdas , I think I used "git mv"; the thing is that when the diff is larger than x%, git treats it as "git rm" and "git add" (https://makandracards.com/makandra/30957-git-how-to-get-a-useful-diff-when-renaming-files).
[GitHub] spark pull request #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20688#discussion_r172730333 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/RateSourceSuite.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.streaming.sources + +import java.nio.file.Files +import java.util.Optional +import java.util.concurrent.TimeUnit + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.errors.TreeNodeException +import org.apache.spark.sql.execution.datasources.DataSource +import org.apache.spark.sql.execution.streaming._ +import org.apache.spark.sql.execution.streaming.continuous._ +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.sources.v2.{ContinuousReadSupport, DataSourceOptions, MicroBatchReadSupport} +import org.apache.spark.sql.sources.v2.reader.streaming.Offset +import org.apache.spark.sql.streaming.StreamTest +import org.apache.spark.util.ManualClock + +class RateSourceSuite extends StreamTest { --- End diff -- Why did you not move this file using "git mv" and then change it? Then we would have been able to diff it properly. This was a pain in the text socket v2 PR as well :(
[GitHub] spark pull request #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20688#discussion_r172729894 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateSourceProvider.scala --- @@ -0,0 +1,291 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.streaming.sources + +import java.io._ +import java.nio.charset.StandardCharsets +import java.util.Optional +import java.util.concurrent.TimeUnit + +import scala.collection.JavaConverters._ + +import org.apache.commons.io.IOUtils + +import org.apache.spark.internal.Logging +import org.apache.spark.network.util.JavaUtils +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.util.DateTimeUtils +import org.apache.spark.sql.execution.streaming._ +import org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader +import org.apache.spark.sql.sources.DataSourceRegister +import org.apache.spark.sql.sources.v2.{ContinuousReadSupport, DataSourceOptions, DataSourceV2, MicroBatchReadSupport} +import org.apache.spark.sql.sources.v2.reader._ +import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousReader, MicroBatchReader, Offset} +import org.apache.spark.sql.types.{LongType, StructField, StructType, TimestampType} +import org.apache.spark.util.{ManualClock, SystemClock} + +object RateSourceProvider { + val SCHEMA = +StructType(StructField("timestamp", TimestampType) :: StructField("value", LongType) :: Nil) + + val VERSION = 1 + + val NUM_PARTITIONS = "numPartitions" + val ROWS_PER_SECOND = "rowsPerSecond" + val RAMP_UP_TIME = "rampUpTime" + + /** Calculate the end value we will emit at the time `seconds`. 
*/ + def valueAtSecond(seconds: Long, rowsPerSecond: Long, rampUpTimeSeconds: Long): Long = { +// E.g., rampUpTimeSeconds = 4, rowsPerSecond = 10 +// Then speedDeltaPerSecond = 2 +// +// seconds = 0 1 2 3 4 5 6 +// speed = 0 2 4 6 8 10 10 (speedDeltaPerSecond * seconds) +// end value = 0 2 6 12 20 30 40 (0 + speedDeltaPerSecond * seconds) * (seconds + 1) / 2 +val speedDeltaPerSecond = rowsPerSecond / (rampUpTimeSeconds + 1) +if (seconds <= rampUpTimeSeconds) { + // Calculate "(0 + speedDeltaPerSecond * seconds) * (seconds + 1) / 2" in a special way to + // avoid overflow + if (seconds % 2 == 1) { +(seconds + 1) / 2 * speedDeltaPerSecond * seconds + } else { +seconds / 2 * speedDeltaPerSecond * (seconds + 1) + } +} else { + // rampUpPart is just a special case of the above formula: rampUpTimeSeconds == seconds + val rampUpPart = valueAtSecond(rampUpTimeSeconds, rowsPerSecond, rampUpTimeSeconds) + rampUpPart + (seconds - rampUpTimeSeconds) * rowsPerSecond +} + } +} + +class RateSourceProvider extends DataSourceV2 + with MicroBatchReadSupport with ContinuousReadSupport with DataSourceRegister { + import RateSourceProvider._ + + private def checkParameters(options: DataSourceOptions): Unit = { +if (options.get(ROWS_PER_SECOND).isPresent) { + val rowsPerSecond = options.get(ROWS_PER_SECOND).get().toLong + if (rowsPerSecond <= 0) { +throw new IllegalArgumentException( + s"Invalid value '$rowsPerSecond'. The option 'rowsPerSecond' must be positive") + } +} + +if (options.get(RAMP_UP_TIME).isPresent) { + val rampUpTimeSeconds = +JavaUtils.timeStringAsSec(options.get(RAMP_UP_TIME).get()) + if (rampUpTimeSeconds < 0) { +throw new IllegalArgumentException( + s"Invalid value '$rampUpTimeSeconds'. The option 'rampUpTime' must not be negative") + } +} + +if (options.get(NUM_PARTITIONS).isPresent) { + val numPartitions = o
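The ramp-up arithmetic in the quoted `valueAtSecond` above can be sanity-checked with a small standalone port. This is a sketch in plain Java rather than the Scala original, and the `RampUp` class name is invented for the example:

```java
public class RampUp {
    // End value emitted at `seconds`, ported from the Scala `valueAtSecond` in the diff above.
    static long valueAtSecond(long seconds, long rowsPerSecond, long rampUpTimeSeconds) {
        long speedDeltaPerSecond = rowsPerSecond / (rampUpTimeSeconds + 1);
        if (seconds <= rampUpTimeSeconds) {
            // Compute (speedDeltaPerSecond * seconds) * (seconds + 1) / 2, dividing
            // the even factor by 2 first to avoid intermediate overflow.
            if (seconds % 2 == 1) {
                return (seconds + 1) / 2 * speedDeltaPerSecond * seconds;
            } else {
                return seconds / 2 * speedDeltaPerSecond * (seconds + 1);
            }
        } else {
            // After the ramp-up window, the rate is constant at rowsPerSecond.
            long rampUpPart = valueAtSecond(rampUpTimeSeconds, rowsPerSecond, rampUpTimeSeconds);
            return rampUpPart + (seconds - rampUpTimeSeconds) * rowsPerSecond;
        }
    }

    public static void main(String[] args) {
        // rampUpTimeSeconds = 4, rowsPerSecond = 10, as in the worked example above.
        for (long s = 0; s <= 6; s++) {
            System.out.println(s + " -> " + valueAtSecond(s, 10, 4));
        }
    }
}
```

For `rampUpTimeSeconds = 4` and `rowsPerSecond = 10` this reproduces the table in the source comment: end values 0, 2, 6, 12, 20, 30, 40 at seconds 0 through 6.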
[GitHub] spark pull request #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-c...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20735#discussion_r172729670 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -736,7 +736,8 @@ private object YarnAllocator { def memLimitExceededLogMessage(diagnostics: String, pattern: Pattern): String = { val matcher = pattern.matcher(diagnostics) val diag = if (matcher.find()) " " + matcher.group() + "." else "" -("Container killed by YARN for exceeding memory limits." + diag - + " Consider boosting spark.yarn.executor.memoryOverhead.") +s"Container killed by YARN for exceeding memory limits. $diag " + + "Consider boosting spark.yarn.executor.memoryOverhead or " + + "disable yarn.nodemanager.vmem-check-enabled because of YARN-4714." --- End diff -- The changes look fine to me.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20685 it'll also help with disk corruption ... from the stack traces in SPARK-4105 you can't really tell what the source of the problem is. it'll be pretty hard to determine what the source of corruption is if we start seeing it again. anyway, I don't feel that strongly about it either way.
[GitHub] spark issue #20696: [SPARK-23525] [SQL] Support ALTER TABLE CHANGE COLUMN CO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20696 Merged build finished. Test PASSed.
[GitHub] spark issue #20696: [SPARK-23525] [SQL] Support ALTER TABLE CHANGE COLUMN CO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1339/ Test PASSed.
[GitHub] spark issue #20696: [SPARK-23525] [SQL] Support ALTER TABLE CHANGE COLUMN CO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20696 **[Test build #88031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88031/testReport)** for PR 20696 at commit [`48fc338`](https://github.com/apache/spark/commit/48fc338dc30720aa05e1871d69bad66ae2dfaa59).
[GitHub] spark pull request #20755: [SPARK-23406][SS] Enable stream-stream self-joins...
Github user tdas closed the pull request at: https://github.com/apache/spark/pull/20755
[GitHub] spark issue #20755: [SPARK-23406][SS] Enable stream-stream self-joins for br...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20755 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1338/ Test PASSed.
[GitHub] spark issue #20755: [SPARK-23406][SS] Enable stream-stream self-joins for br...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20755 Merged build finished. Test PASSed.
[GitHub] spark issue #20755: [SPARK-23406][SS] Enable stream-stream self-joins for br...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20755 **[Test build #88030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88030/testReport)** for PR 20755 at commit [`484babb`](https://github.com/apache/spark/commit/484babb58d9cf61d5dcc6521865cd2a5db64dd82).
[GitHub] spark pull request #20755: [SPARK-23406][SS] Enable stream-stream self-joins...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20755 [SPARK-23406][SS] Enable stream-stream self-joins for branch-2.3 ## What changes were proposed in this pull request? This is a limited but safe-to-backport version of the self-join fix made in #20598. That PR solved two bugs 1. Add MultiInstanceRelation trait to leaf logical nodes to allow resolution - This is the major fix required to allow streaming self-joins, and is safe to backport. 2. Fix attribute rewriting in MicroBatchExecution when micro-batch plans are spliced into the streaming logical plan - This is a minor fix that is not safe to backport. Without this fix only a very small fraction of self-join cases will have issues, but those issues may lead to incorrect results. ## How was this patch tested? New unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-23406-2.3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20755.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20755 commit 484babb58d9cf61d5dcc6521865cd2a5db64dd82 Author: Tathagata Das Date: 2018-03-07T00:53:34Z Fixed
[GitHub] spark pull request #20753: [SPARK-23582][SQL] StaticInvoke should support in...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20753#discussion_r172722081 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -133,8 +134,21 @@ case class StaticInvoke( override def nullable: Boolean = needNullCheck || returnNullable override def children: Seq[Expression] = arguments - override def eval(input: InternalRow): Any = -throw new UnsupportedOperationException("Only code-generated evaluation is supported.") + override def eval(input: InternalRow): Any = { +if (staticObject == null) { + throw new RuntimeException("The static class cannot be null.") +} + +val parmTypes = arguments.map(e => + CallMethodViaReflection.typeMapping.getOrElse(e.dataType, +Seq(e.dataType.asInstanceOf[ObjectType].cls))(0)) --- End diff -- You are right. I have to support other types before merging this change. This is a prototype for discussing whether to use reflection or not.
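The reflection-based dispatch being prototyped above reduces to resolving a method by name and parameter types, then invoking it with a `null` receiver because the method is static. A minimal, Spark-independent sketch (the `invokeStatic` helper name is invented for the example; Spark's actual type mapping is more involved):

```java
import java.lang.reflect.Method;

public class StaticInvokeSketch {
    // Resolve a static method by name and parameter types, then invoke it.
    // A null receiver is passed to invoke() since the method is static.
    static Object invokeStatic(Class<?> cls, String name, Class<?>[] paramTypes, Object... args) {
        try {
            Method m = cls.getMethod(name, paramTypes);
            return m.invoke(null, args);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Invoke Math.max(long, long) reflectively; primitive parameter types
        // must be looked up with long.class, not Long.class.
        Object result = invokeStatic(Math.class, "max",
            new Class<?>[] {long.class, long.class}, 3L, 7L);
        System.out.println(result); // prints 7
    }
}
```

The primitive-vs-boxed distinction in `getMethod` is exactly why a DataType-to-Class mapping (like the `typeMapping` in the diff) is needed before primitive-typed arguments can be resolved.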
[GitHub] spark issue #20754: [SPARK-23287][MESOS] Spark scheduler does not remove ini...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20754 Can one of the admins verify this patch?
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r172721381

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
    @@ -30,9 +31,19 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
     /**
      * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], which are all of the lines
      * in that file.
    + *
    + * @param file A part (i.e. "block") of a single file that should be read line by line.
    + * @param lineSeparator A line separator that should be used for each line. If the value is `None`,
    + *                      it covers `\r`, `\r\n` and `\n`.
    + * @param conf Hadoop configuration
      */
     class HadoopFileLinesReader(
    -    file: PartitionedFile, conf: Configuration) extends Iterator[Text] with Closeable {
    +    file: PartitionedFile,
    +    lineSeparator: Option[String],
    +    conf: Configuration) extends Iterator[Text] with Closeable {
    --- End diff --

Note that this is an internal API for datasources, and Hadoop's `Text` already assumes UTF-8. I don't think we should call `getBytes` with UTF-8 at each caller side.

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
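[Editor's note] The default-separator semantics quoted in the scaladoc above (a `None` separator covers `\r`, `\r\n`, and `\n`, while `Some(sep)` splits on an explicit string) can be sketched outside Hadoop with plain Java. This is an illustrative standalone sketch, not the Spark or Hadoop API; the helper names `defaultLines` and `customLines` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LineSeparatorSketch {
    // Default mode: BufferedReader.readLine() treats \r, \r\n, and \n as
    // line terminators, matching the "None" behavior in the scaladoc.
    static List<String> defaultLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Custom mode: split records on an explicit separator string instead.
    static List<String> customLines(String text, String sep) {
        return Arrays.asList(text.split(Pattern.quote(sep), -1));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(defaultLines("a\r\nb\nc\rd")); // [a, b, c, d]
        System.out.println(customLines("a||b||c", "||")); // [a, b, c]
    }
}
```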
[GitHub] spark pull request #20754: [SPARK-23287][MESOS] Spark scheduler does not rem...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/20754 [SPARK-23287][MESOS] Spark scheduler does not remove initial executor if not one job submitted

## What changes were proposed in this pull request?

In `ExecutorAllocationManager.schedule()`, `numExecutorsTarget` is updated as part of `updateAndSyncNumExecutorsTarget(now)`, but that update is skipped until `initializing` becomes false. As a result, `removeExecutors()` does not remove the expired executors, since the condition `else if (newExecutorTotal - 1 < numExecutorsTarget) {` causes them to be skipped, and they keep running until the application completes. I moved `updateAndSyncNumExecutorsTarget(now)` to after the expiry check and the `initializing` var assignment (when eligible), so that the updated `numExecutorsTarget` can be used while removing executors.

## How was this patch tested?

I verified this manually by enabling dynamic allocation in Mesos mode; executors are now removed when they are not assigned any task within the configured executorIdleTimeout.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-23287

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20754.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20754

commit 5bef384acfe3d76949dceab669743e15373bad57
Author: Devaraj K
Date: 2018-03-07T01:54:58Z

    SPARK-23287 Spark scheduler does not remove initial executor if not one job submitted

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
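[Editor's note] The interaction the PR describes, where a stale (too high) `numExecutorsTarget` blocks the removal of idle executors, can be illustrated with a simplified, self-contained sketch. The `removable` helper and the numbers below are illustrative only, not Spark's actual implementation; they only model the `newExecutorTotal - 1 < numExecutorsTarget` guard quoted in the description.

```java
public class RemoveExecutorsSketch {
    // Simplified model of the removal guard from the PR: each idle executor
    // is removed only if dropping it would not go below the current target.
    static int removable(int total, int idle, int numExecutorsTarget) {
        int removed = 0;
        for (int i = 0; i < idle; i++) {
            int newExecutorTotal = total - removed;
            if (newExecutorTotal - 1 < numExecutorsTarget) {
                break; // a stale, too-high target blocks removal here
            }
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        // With a stale initial target of 5, none of the 5 idle executors can
        // be removed; once the target is synced down to 0, all 5 can be.
        System.out.println(removable(5, 5, 5)); // 0
        System.out.println(removable(5, 5, 0)); // 5
    }
}
```

This is why the ordering change matters: syncing the target before the removal pass lets the guard see the lowered target instead of the stale initial one.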
[GitHub] spark pull request #20753: [SPARK-23582][SQL] StaticInvoke should support in...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20753#discussion_r172718446

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -133,8 +134,21 @@ case class StaticInvoke(
       override def nullable: Boolean = needNullCheck || returnNullable
       override def children: Seq[Expression] = arguments
    -  override def eval(input: InternalRow): Any =
    -    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +  override def eval(input: InternalRow): Any = {
    +    if (staticObject == null) {
    +      throw new RuntimeException("The static class cannot be null.")
    +    }
    +
    +    val parmTypes = arguments.map(e =>
    +      CallMethodViaReflection.typeMapping.getOrElse(e.dataType,
    +        Seq(e.dataType.asInstanceOf[ObjectType].cls))(0))
    --- End diff --

The external types of native types `CalendarIntervalType` and `BinaryType` are not `ObjectType`.

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20753: [SPARK-23582][SQL] StaticInvoke should support in...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20753#discussion_r172717961

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -133,8 +134,21 @@ case class StaticInvoke(
       override def nullable: Boolean = needNullCheck || returnNullable
       override def children: Seq[Expression] = arguments
    -  override def eval(input: InternalRow): Any =
    -    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
    +  override def eval(input: InternalRow): Any = {
    +    if (staticObject == null) {
    +      throw new RuntimeException("The static class cannot be null.")
    +    }
    +
    +    val parmTypes = arguments.map(e =>
    +      CallMethodViaReflection.typeMapping.getOrElse(e.dataType,
    +        Seq(e.dataType.asInstanceOf[ObjectType].cls))(0))
    +    val parms = arguments.map(e => e.eval(input).asInstanceOf[Object])
    +    val method = staticObject.getDeclaredMethod(functionName, parmTypes : _*)
    +    val ret = method.invoke(null, parms : _*)
    +    val retClass = CallMethodViaReflection.typeMapping.getOrElse(dataType,
    +      Seq(dataType.asInstanceOf[ObjectType].cls))(0)
    --- End diff --

Will `dataType` always be an `ObjectType` here?

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
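[Editor's note] The reflection pattern under discussion in this thread, resolving a static method from parameter types and invoking it with a null receiver, can be shown with a minimal standalone Java sketch. This is not the Spark code; `invokeStatic` is a hypothetical helper and `Integer.parseInt` is just a convenient static method to demonstrate with.

```java
import java.lang.reflect.Method;

public class StaticInvokeSketch {
    // Resolve a static method by name and parameter types, then invoke it
    // with a null receiver, mirroring the eval() pattern in the diff above.
    static Object invokeStatic(Class<?> cls, String name,
                               Class<?>[] parmTypes, Object[] parms) throws Exception {
        Method method = cls.getDeclaredMethod(name, parmTypes);
        return method.invoke(null, parms); // null receiver: static method
    }

    public static void main(String[] args) throws Exception {
        Object result = invokeStatic(
            Integer.class, "parseInt",
            new Class<?>[] { String.class },
            new Object[] { "42" });
        System.out.println(result); // 42
    }
}
```

As the review comments note, the hard part in the real code is mapping Catalyst `DataType`s to the Java classes used for lookup, since not every external type is an `ObjectType`.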