[GitHub] spark issue #20802: [SPARK-23651][core]Add a check for host name
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20802 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88190/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20802: [SPARK-23651][core]Add a check for host name
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20802 Merged build finished. Test FAILed.
[GitHub] spark issue #20802: [SPARK-23651][core]Add a check for host name
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20802 **[Test build #88190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88190/testReport)** for PR 20802 at commit [`d0e724f`](https://github.com/apache/spark/commit/d0e724f030df830268bac727e83a799c127a5dfd).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1480/ Test PASSed.
[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20692 @mgaido91 Thanks for your investigation! I have been swamped these two weeks. Will get back to you next week. Sorry for the delay.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #88198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88198/testReport)** for PR 20433 at commit [`39fe5dc`](https://github.com/apache/spark/commit/39fe5dc179004946b378c250d3a89132a4fad444).
[GitHub] spark issue #20808: [SPARK-23662][SQL] Support selective tests in SQLQueryTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20808 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1479/ Test PASSed.
[GitHub] spark issue #20808: [SPARK-23662][SQL] Support selective tests in SQLQueryTe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20808 **[Test build #88197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88197/testReport)** for PR 20808 at commit [`093f405`](https://github.com/apache/spark/commit/093f40555ba9dd169ab2fed8f2ecf4f08dd66627).
[GitHub] spark issue #20808: [SPARK-23662][SQL] Support selective tests in SQLQueryTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20808 Merged build finished. Test PASSed.
[GitHub] spark issue #20808: [SPARK-23662][SQL] Support selective tests in SQLQueryTe...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20808 cc: @gatorsmile
[GitHub] spark pull request #20808: [SPARK-23662][SQL] Support selective tests in SQL...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/20808 [SPARK-23662][SQL] Support selective tests in SQLQueryTestSuite

## What changes were proposed in this pull request?
This PR supports selective tests in `SQLQueryTestSuite`, e.g.,
```
SPARK_SQL_QUERY_TEST_FILTER=limit.sql,random.sql build/sbt "sql/test-only *SQLQueryTestSuite"
```

## How was this patch tested?
Manually checked.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark RunSelectiveTests

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20808.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20808

commit 093f40555ba9dd169ab2fed8f2ecf4f08dd66627
Author: Takeshi Yamamuro
Date: 2018-03-13T05:23:54Z

    Fix
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20208 Before adding the test cases, is schema evolution officially supported? Could you describe it in detail?
[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20803
```bash
cat <<EOF > test.sql
select '\${a}', '\${b}';
EOF
spark-sql --hiveconf a=avalue --hivevar b=bvalue -f test.sql
```
Is the SQL text `select ${a}, ${b}` or `select avalue, bvalue`?
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88195/ Test PASSed.
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20702 **[Test build #88195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88195/testReport)** for PR 20702 at commit [`93e87f5`](https://github.com/apache/spark/commit/93e87f5f138a758dec8ba5f2d3f888da9a04fb67).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20702 Merged build finished. Test PASSed.
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r174019434

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1192,11 +1195,23 @@ class Analyzer(
    * @see https://issues.apache.org/jira/browse/SPARK-19737
    */
   object LookupFunctions extends Rule[LogicalPlan] {
-    override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
-      case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-        withPosition(f) {
-          throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
-        }
+    override def apply(plan: LogicalPlan): LogicalPlan = {
+      val catalogFunctionNameSet = new mutable.HashSet[FunctionIdentifier]()
+      plan.transformAllExpressions {
+        case f: UnresolvedFunction if catalogFunctionNameSet.contains(f.name) => f
+        case f: UnresolvedFunction if catalog.functionExists(f.name) =>
+          catalogFunctionNameSet.add(normalizeFuncName(f.name))
+          f
+        case f: UnresolvedFunction =>
+          withPosition(f) {
+            throw new NoSuchFunctionException(f.name.database.getOrElse("default"),
+              f.name.funcName)
+          }
+      }
+    }
+
+    private def normalizeFuncName(name: FunctionIdentifier): FunctionIdentifier = {
+      FunctionIdentifier(name.funcName.toLowerCase(Locale.ROOT), name.database)
--- End diff --

The `FunctionIdentifier` signature takes the database as an `Option`, not a `String`. Since it is only used in this local cache, I think it is OK not to convert it to the "default" string. I saw that when we do `registerFunction` in `FunctionRegistry.scala`, we don't put "default" in `normalizeFuncName` either. What do you think? Thanks.
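The diff under review caches positive catalog lookups keyed by a normalized function identifier, so repeated references to the same function skip the expensive existence check. A minimal sketch of that idea (in Python, with hypothetical names — this is not Spark's actual API):

```python
def normalize(name, database=None):
    # Mirror normalizeFuncName in the diff: lowercase the function name,
    # keep the (optional) database as-is.
    return (name.lower(), database)

def make_lookup(catalog_exists):
    seen = set()   # cache of identifiers known to exist in the catalog
    calls = []     # records actual catalog hits, for illustration only

    def function_exists(name, database=None):
        key = normalize(name, database)
        if key in seen:
            return True                      # cache hit: no catalog round-trip
        calls.append(key)
        if catalog_exists(name, database):   # cache miss: ask the catalog once
            seen.add(key)
            return True
        return False

    return function_exists, calls

# Usage: three references to MAX (in different cases) trigger one catalog call.
exists, calls = make_lookup(lambda n, d: n.lower() in {"max", "min"})
assert exists("MAX") and exists("max") and exists("Max")
assert len(calls) == 1
```

Note that, as the comment above discusses, the normalized key keeps the database as an optional value rather than substituting "default".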
[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r174019305

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -526,7 +526,7 @@ object SQLConf {
   val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default")
     .doc("The default data source to use in input/output.")
     .stringConf
-    .createWithDefault("parquet")
+    .createWithDefault("orc")
--- End diff --

Can you change it back?
[GitHub] spark issue #20796: [SPARK-23649][SQL] Prevent crashes on schema inferring o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20796 **[Test build #88196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88196/testReport)** for PR 20796 at commit [`d6c5f02`](https://github.com/apache/spark/commit/d6c5f02ea1a08513a54ea9f3b30986dd92188b3e).
[GitHub] spark issue #20796: [SPARK-23649][SQL] Prevent crashes on schema inferring o...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20796 add to whitelist
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Merged build finished. Test PASSed.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88189/ Test PASSed.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88189/testReport)** for PR 20795 at commit [`211abcb`](https://github.com/apache/spark/commit/211abcb979787a22b76d05b47d2f21a98991f702).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class LookupFunctionsSuite extends PlanTest`
  * `class CustomInMemoryCatalog extends InMemoryCatalog`
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88188/ Test PASSed.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Merged build finished. Test PASSed.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88188/testReport)** for PR 20795 at commit [`99cc3b3`](https://github.com/apache/spark/commit/99cc3b394845d364f8e99de9ba136a2068fa76c6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20807 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88193/ Test PASSed.
[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20807 Merged build finished. Test PASSed.
[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20807 **[Test build #88193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88193/testReport)** for PR 20807 at commit [`114ac05`](https://github.com/apache/spark/commit/114ac05102c9d563c922447423ec8445bb37e9ef).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r174017514

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1192,11 +1195,23 @@ class Analyzer(
    * @see https://issues.apache.org/jira/browse/SPARK-19737
    */
   object LookupFunctions extends Rule[LogicalPlan] {
-    override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
-      case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-        withPosition(f) {
-          throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
-        }
+    override def apply(plan: LogicalPlan): LogicalPlan = {
+      val catalogFunctionNameSet = new mutable.HashSet[FunctionIdentifier]()
+      plan.transformAllExpressions {
+        case f: UnresolvedFunction if catalogFunctionNameSet.contains(f.name) => f
--- End diff --

I will normalize the lookup too. Thanks.
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20702 **[Test build #88195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88195/testReport)** for PR 20702 at commit [`93e87f5`](https://github.com/apache/spark/commit/93e87f5f138a758dec8ba5f2d3f888da9a04fb67).
[GitHub] spark pull request #20800: [SPARK-23627][SQL] Provide isEmpty in DataSet
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20800#discussion_r174016939

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -511,6 +511,14 @@ class Dataset[T] private[sql](
    */
   def isLocal: Boolean = logicalPlan.isInstanceOf[LocalRelation]

+  /**
+   * Returns true if the `DataSet` is empty
--- End diff --

Dataset
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20702 ok to test
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20702 cc @liufengdb
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 I added related tests from Hive `clientpositive` in `interval.sql`.
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r174016181

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -83,6 +83,15 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"

+  private val testFilter: Option[String] = {
+    val testFilter = System.getenv("SPARK_SQL_QUERY_TEST_FILTER")
+    if (testFilter != null && !testFilter.isEmpty) {
+      Some(testFilter.toLowerCase(Locale.ROOT))
+    } else {
+      None
+    }
+  }
+
--- End diff --

OK, I'll do it later.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20433 [This](https://github.com/apache/spark/pull/20433#issuecomment-370726439) sounds good to me
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r174016110

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -83,6 +83,15 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"

+  private val testFilter: Option[String] = {
+    val testFilter = System.getenv("SPARK_SQL_QUERY_TEST_FILTER")
+    if (testFilter != null && !testFilter.isEmpty) {
+      Some(testFilter.toLowerCase(Locale.ROOT))
+    } else {
+      None
+    }
+  }
+
--- End diff --

Let us create a separate PR?
[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20806 **[Test build #88194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88194/testReport)** for PR 20806 at commit [`a254d15`](https://github.com/apache/spark/commit/a254d1501c0119b4881c0443f28c263f0c9dec0e).
[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20807 **[Test build #88193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88193/testReport)** for PR 20807 at commit [`114ac05`](https://github.com/apache/spark/commit/114ac05102c9d563c922447423ec8445bb37e9ef).
[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20806 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1478/ Test PASSed.
[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20806 Merged build finished. Test PASSed.
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20763 Thanks! Merged to 2.3
[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20807 Can one of the admins verify this patch?
[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20807 SPARK-23660: Fix exception in yarn cluster mode when application ended fast

## What changes were proposed in this pull request?
Yarn throws the following exception in cluster mode when the application is really small:
```
18/03/07 23:34:22 WARN netty.NettyRpcEnv: Ignored failure: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7c974942 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1eea9d2d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
18/03/07 23:34:22 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
    at org.apache.spark.deploy.yarn.YarnAllocator.<init>(YarnAllocator.scala:102)
    at org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:77)
    at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:450)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:493)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:810)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:809)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:834)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
    at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
    at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
    at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
    ... 17 more
18/03/07 23:34:22 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult: )
```
Example application:
```
object ExampleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ExampleApp")
    val sc = new SparkContext(conf)
    try {
      // Do nothing
    } finally {
      sc.stop()
    }
  }
}
```
This PR makes `initialExecutorIdCounter` lazy. This way `YarnAllocator` can be instantiated even if the driver has already ended.

## How was this patch tested?
Automated: additional unit test added.
Manual: application submitted into a small cluster.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborgsomogyi/spark SPARK-23660

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20807.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20807

commit 114ac05102c9d563c922447423ec8445bb37e9ef
Author: Gabor Somogyi
Date: 2018-03-13T04:23:59Z

    SPARK-23660: Fix exception in yarn cluster mode when application ended fast
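The fix described above defers a fragile lookup (an RPC to the driver) from construction time to first use by making the field lazy. A minimal analogue of that pattern (a Python sketch with a hypothetical `Allocator` class; Scala's `lazy val` corresponds roughly to `functools.cached_property`):

```python
import functools

class Allocator:
    def __init__(self, fetch_counter):
        # The constructor stays cheap: no RPC here, so construction succeeds
        # even if the remote endpoint is already gone.
        self._fetch_counter = fetch_counter

    @functools.cached_property
    def initial_executor_id_counter(self):
        # Evaluated once, on first access -- the `lazy val` equivalent.
        return self._fetch_counter()

calls = []
def fetch():
    calls.append(1)  # stands in for the driver RPC
    return 42

a = Allocator(fetch)
assert calls == []                           # construction did not trigger the RPC
assert a.initial_executor_id_counter == 42   # first access does
assert a.initial_executor_id_counter == 42   # cached thereafter
assert len(calls) == 1
```

The design point is the same as in the PR: any failure in the lookup now surfaces where the value is actually needed, not while the object is being built.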
[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20806 cc @dbtsai @cloud-fan
[GitHub] spark pull request #20806: [SPARK-23661][SQL] Implement treeAggregate on Dat...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/20806 [SPARK-23661][SQL] Implement treeAggregate on Dataset API

## What changes were proposed in this pull request?
Many algorithms in MLlib have still not migrated their internal computing workload from RDD to DataFrame. `treeAggregate` is one of the obstacles we need to address in order to see a complete migration. This patch provides `treeAggregate` on the Dataset API. For now this should be a private API used by the ML component. The approach to tree aggregation imitates RDD's `treeAggregate`.

## How was this patch tested?
Added unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 treeAggregate

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20806.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20806

commit a254d1501c0119b4881c0443f28c263f0c9dec0e
Author: Liang-Chi Hsieh
Date: 2018-03-12T08:41:20Z

    Implement treeAggregate on Dataset API.
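The PR above says its approach imitates RDD's `treeAggregate`: aggregate within each partition first, then merge the partial results in a log-depth reduction tree rather than sending them all to a single reducer. A self-contained sketch of that combining scheme over plain Python lists (illustrative only — this is not the Spark implementation):

```python
def tree_aggregate(partitions, zero, seq_op, comb_op):
    # 1) Aggregate within each partition (seq_op folds elements into an accumulator).
    partials = []
    for part in partitions:
        acc = zero
        for x in part:
            acc = seq_op(acc, x)
        partials.append(acc)
    # 2) Merge partial results pairwise, level by level, like a reduction tree;
    #    each round halves the number of partials, giving log-depth combining.
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), 2):
            if i + 1 < len(partials):
                merged.append(comb_op(partials[i], partials[i + 1]))
            else:
                merged.append(partials[i])  # odd one out carries over
        partials = merged
    return partials[0] if partials else zero

# Usage: summing 1..8 split across 4 partitions.
parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
assert tree_aggregate(parts, 0, lambda a, x: a + x, lambda a, b: a + b) == 36
```

In the distributed setting the point of the tree shape is to spread the merge work across executors instead of overloading the driver with one large final reduce.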
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1477/ Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #88192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88192/testReport)** for PR 20433 at commit [`1eec819`](https://github.com/apache/spark/commit/1eec81935a0c32de67e7981d74bfb15dbb041917).
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test FAILed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88191/ Test FAILed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #88191 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88191/testReport)** for PR 20433 at commit [`c0710d6`](https://github.com/apache/spark/commit/c0710d6967caf1e3acc18201ecf54dc3bc98def6).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r174011447

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -83,6 +83,15 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+  private val testFilter: Option[String] = {
+    val testFilter = System.getenv("SPARK_SQL_QUERY_TEST_FILTER")
+    if (testFilter != null && !testFilter.isEmpty) {
+      Some(testFilter.toLowerCase(Locale.ROOT))
+    } else {
+      None
+    }
+  }
+
--- End diff --

Though this is not related to this PR, I think it would be useful to run tests selectively in `SQLQueryTestSuite` (since the number of tests there has been growing recently...). If possible, could we add this feature in a separate PR? Otherwise, I'll drop this.
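The filter logic in the diff can be exercised in isolation. The sketch below is a hypothetical stand-alone version with the `System.getenv` call replaced by an explicit parameter, so the null/empty handling can be checked without setting real environment variables.

```scala
import java.util.Locale

// Stand-alone rendition of the proposed filter: a null or empty value
// yields None (run everything); anything else is lowercased so test
// names can be matched case-insensitively. The environment lookup is
// passed in as a parameter here purely to make the logic testable.
def parseTestFilter(raw: String): Option[String] =
  if (raw != null && !raw.isEmpty) Some(raw.toLowerCase(Locale.ROOT)) else None
```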
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1476/ Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #88191 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88191/testReport)** for PR 20433 at commit [`c0710d6`](https://github.com/apache/spark/commit/c0710d6967caf1e3acc18201ecf54dc3bc98def6).
[GitHub] spark issue #20744: [SPARK-23608][CORE][WebUI] Add synchronization in SHS be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20744 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88187/ Test PASSed.
[GitHub] spark issue #20744: [SPARK-23608][CORE][WebUI] Add synchronization in SHS be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20744 Merged build finished. Test PASSed.
[GitHub] spark issue #20744: [SPARK-23608][CORE][WebUI] Add synchronization in SHS be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20744 **[Test build #88187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88187/testReport)** for PR 20744 at commit [`cd7e1f6`](https://github.com/apache/spark/commit/cd7e1f63e6f6614ed3efcc70df53cde41ffb6ff2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20669 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1459/
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20669 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1475/ Test PASSed.
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20669 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1459/
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20669 Merged build finished. Test PASSed.
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/20669

The newest push passes all tests (with this merged I will then merge in [this](https://github.com/apache-spark-on-k8s/spark-integration/pull/42/files)):

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run SparkPi with a test secret mounted into the driver and executor pods
- Run FileCheck using a Remote Data File
Run completed in 2 minutes, 37 seconds.
Total number of tests run: 7
Suites: completed 2, aborted 0
Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

I welcome the community's opinion on the strategy for passing `spark.driver.extraJavaOptions` to the driver, as I am currently specifying `SPARK_CONF_DIR` to point at the JAVA_PROPERTIES file. Open to any better suggestions.
[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20803

> What if this SQL statement contains --hiveconf or --hivevar?

What do you mean? Can you give an example?
[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20803 @cloud-fan one SQL execution has only one SQL statement, no matter how many jobs it triggers.
[GitHub] spark pull request #20800: [SPARK-23627][SQL] Provide isEmpty in DataSet
Github user goungoun commented on a diff in the pull request: https://github.com/apache/spark/pull/20800#discussion_r174002184

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -511,6 +511,12 @@ class Dataset[T] private[sql](
    */
   def isLocal: Boolean = logicalPlan.isInstanceOf[LocalRelation]
+  /**
+   * Returns true if the `DataSet` is empty
+   *
--- End diff --

@mgaido91 Thanks, I modified the comment.
[GitHub] spark pull request #20797: [SPARK-23583][SQL] Invoke should support interpre...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20797#discussion_r174001998

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -266,8 +266,26 @@ case class Invoke(
   override def nullable: Boolean = targetObject.nullable || needNullCheck || returnNullable
   override def children: Seq[Expression] = targetObject +: arguments
-  override def eval(input: InternalRow): Any =
-    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+  override def eval(input: InternalRow): Any = {
+    val obj = targetObject.eval(input)
+    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
+    val argClasses = CallMethodViaReflection.expressionJavaClasses(arguments)
+    val method = obj.getClass.getDeclaredMethod(functionName, argClasses : _*)
+    if (needNullCheck && args.exists(_ == null)) {
+      // return null if one of arguments is null
+      null
+    } else {
+      val ret = method.invoke(obj, args: _*)
+
+      if (CodeGenerator.defaultValue(dataType) == "null") {
+        ret
+      } else {
+        // cast a primitive value using Boxed class
+        val boxedClass = CallMethodViaReflection.typeBoxedJavaMapping(dataType)
+        boxedClass.cast(ret)
--- End diff --

The point is where we want the cast exception to occur. With this code, the exception occurs in this class; without it, the exception could occur anywhere downstream.
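The behavior under discussion - eagerly casting the reflective result to the boxed class of the expected type - can be illustrated outside Spark with plain Java reflection. The names below (`Target`, `invokeBoxed`) are illustrative, not Spark's actual helpers; the point is that a wrong expected class fails at the cast site with a `ClassCastException` rather than somewhere downstream.

```scala
import java.lang.reflect.Method

// A toy target object standing in for the evaluated targetObject.
class Target { def twice(x: java.lang.Integer): java.lang.Integer = x * 2 }

// Minimal model of the reflective call in the patch: look up a method by
// name and argument classes, invoke it, then eagerly cast the result to
// the expected boxed class so a type mismatch surfaces here instead of
// later. Names are hypothetical, not Spark's API.
def invokeBoxed[T](obj: AnyRef, name: String, boxed: Class[T], args: AnyRef*): T = {
  val method: Method = obj.getClass.getDeclaredMethod(name, args.map(_.getClass): _*)
  boxed.cast(method.invoke(obj, args: _*))
}
```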
[GitHub] spark issue #20802: [SPARK-23651][core]Add a check for host name
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20802 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1474/ Test PASSed.
[GitHub] spark issue #20802: [SPARK-23651][core]Add a check for host name
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20802 **[Test build #88190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88190/testReport)** for PR 20802 at commit [`d0e724f`](https://github.com/apache/spark/commit/d0e724f030df830268bac727e83a799c127a5dfd).
[GitHub] spark issue #20802: [SPARK-23651][core]Add a check for host name
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20802 Merged build finished. Test PASSed.
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/20702 @cloud-fan @felixcheung would you please take a look? Thanks!
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r173999280

--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -86,16 +94,19 @@ class RFormulaSuite extends MLTest with DefaultReadWriteTest {
     }
   }
-  test("label column already exists but is not numeric type") {
+  ignore("label column already exists but is not numeric type") {
--- End diff --

Thanks, this is a very good catch.
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r173999125

--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -58,14 +57,16 @@ class VectorAssemblerSuite
     assert(v2.isInstanceOf[DenseVector])
   }
-  test("VectorAssembler") {
+  ignore("VectorAssembler") {
--- End diff --

@attilapiros You need to revert the code here and keep the old `VectorAssembler` test suite. `VectorAssembler` does not support streaming mode unless you pipeline a `VectorSizeHint` before it.
[GitHub] spark pull request #20797: [SPARK-23583][SQL] Invoke should support interpre...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20797#discussion_r173998493

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -266,8 +266,26 @@ case class Invoke(
   override def nullable: Boolean = targetObject.nullable || needNullCheck || returnNullable
   override def children: Seq[Expression] = targetObject +: arguments
-  override def eval(input: InternalRow): Any =
-    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+  override def eval(input: InternalRow): Any = {
+    val obj = targetObject.eval(input)
+    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
+    val argClasses = CallMethodViaReflection.expressionJavaClasses(arguments)
+    val method = obj.getClass.getDeclaredMethod(functionName, argClasses : _*)
+    if (needNullCheck && args.exists(_ == null)) {
+      // return null if one of arguments is null
+      null
+    } else {
+      val ret = method.invoke(obj, args: _*)
+
+      if (CodeGenerator.defaultValue(dataType) == "null") {
+        ret
+      } else {
+        // cast a primitive value using Boxed class
+        val boxedClass = CallMethodViaReflection.typeBoxedJavaMapping(dataType)
+        boxedClass.cast(ret)
--- End diff --

For interpreted execution, I think it is less meaningful to check that. cc @hvanhovell
[GitHub] spark issue #20781: [SPARK-23637][YARN]Yarn might allocate more resource if ...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/20781 @vanzin Thanks for the review~

1. I spent some time but couldn't find the reason why the same executor is killed multiple times, and I cannot reproduce it either.
2. I found that the same completed container can be processed multiple times. It happens now and then. It seems YARN doesn't promise that the same completed container is returned in only one response (https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L268)
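The duplicate-completed-container behavior described in point 2 is usually handled defensively by remembering which container IDs have already been processed. Below is a minimal sketch of that idea, with container IDs modeled as plain strings; it is not Spark's actual `YarnAllocator` bookkeeping.

```scala
import scala.collection.mutable

// Defensive pattern for the behavior described above: because the
// resource manager may report the same completed container in more
// than one response, remember which IDs were already handled and
// skip repeats.
class CompletedContainerTracker {
  private val processed = mutable.HashSet.empty[String]

  // Returns true only the first time a given container id is seen;
  // HashSet.add returns false when the element was already present.
  def shouldProcess(containerId: String): Boolean = processed.add(containerId)
}
```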
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r173998034

--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -299,18 +310,17 @@ class StringIndexerSuite
       .setInputCol("label")
       .setOutputCol("labelIndex")
-    val expected = Seq(Set((0, 0.0), (1, 0.0), (2, 2.0), (3, 1.0), (4, 1.0), (5, 0.0)),
-      Set((0, 2.0), (1, 2.0), (2, 0.0), (3, 1.0), (4, 1.0), (5, 2.0)),
-      Set((0, 1.0), (1, 1.0), (2, 0.0), (3, 2.0), (4, 2.0), (5, 1.0)),
-      Set((0, 1.0), (1, 1.0), (2, 2.0), (3, 0.0), (4, 0.0), (5, 1.0)))
+    val expected = Seq(Seq((0, 0.0), (1, 0.0), (2, 2.0), (3, 1.0), (4, 1.0), (5, 0.0)),
--- End diff --

I confirmed this with @cloud-fan. If we use the pattern:

```
Seq(...).toDF().select(...).collect()
```

it will use a `LocalRelation` and always compute with one partition, so the output row order stays the same as the input seq. Many other test cases seem to use a similar approach.
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r173997939

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1192,11 +1195,23 @@ class Analyzer(
    * @see https://issues.apache.org/jira/browse/SPARK-19737
    */
   object LookupFunctions extends Rule[LogicalPlan] {
-    override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
-      case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-        withPosition(f) {
-          throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
-        }
+    override def apply(plan: LogicalPlan): LogicalPlan = {
+      val catalogFunctionNameSet = new mutable.HashSet[FunctionIdentifier]()
+      plan.transformAllExpressions {
+        case f: UnresolvedFunction if catalogFunctionNameSet.contains(f.name) => f
+        case f: UnresolvedFunction if catalog.functionExists(f.name) =>
+          catalogFunctionNameSet.add(normalizeFuncName(f.name))
+          f
+        case f: UnresolvedFunction =>
+          withPosition(f) {
+            throw new NoSuchFunctionException(f.name.database.getOrElse("default"),
+              f.name.funcName)
+          }
+      }
+    }
+
+    private def normalizeFuncName(name: FunctionIdentifier): FunctionIdentifier = {
+      FunctionIdentifier(name.funcName.toLowerCase(Locale.ROOT), name.database)
--- End diff --

`name.database.getOrElse("default")`?
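The suggestion in this review - normalizing the database as well as the function name - can be shown with a small sketch. `FuncId` below is a hypothetical stand-in for Spark's `FunctionIdentifier`, and `normalize` applies the reviewer's proposed `getOrElse("default")` on top of the lowercasing, so two spellings of the same function hit the same cache entry.

```scala
import java.util.Locale

// Hypothetical mirror of FunctionIdentifier, used only to illustrate
// the normalization being discussed.
case class FuncId(funcName: String, database: Option[String])

// Lowercase the function name and fall back to the "default" database,
// so cache inserts and lookups agree on one canonical key.
def normalize(id: FuncId): FuncId =
  FuncId(id.funcName.toLowerCase(Locale.ROOT), Some(id.database.getOrElse("default")))
```

Without the database fallback, `FuncId("myudf", None)` and `FuncId("myudf", Some("default"))` would be distinct keys even though they name the same catalog function.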
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r173997911

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1192,11 +1195,23 @@ class Analyzer(
    * @see https://issues.apache.org/jira/browse/SPARK-19737
    */
   object LookupFunctions extends Rule[LogicalPlan] {
-    override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
-      case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-        withPosition(f) {
-          throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
-        }
+    override def apply(plan: LogicalPlan): LogicalPlan = {
+      val catalogFunctionNameSet = new mutable.HashSet[FunctionIdentifier]()
+      plan.transformAllExpressions {
+        case f: UnresolvedFunction if catalogFunctionNameSet.contains(f.name) => f
--- End diff --

Normalize the `FunctionIdentifier` when looking it up too?
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88189/testReport)** for PR 20795 at commit [`211abcb`](https://github.com/apache/spark/commit/211abcb979787a22b76d05b47d2f21a98991f702).
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88188/testReport)** for PR 20795 at commit [`99cc3b3`](https://github.com/apache/spark/commit/99cc3b394845d364f8e99de9ba136a2068fa76c6).
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88186/ Test PASSed.
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Merged build finished. Test PASSed.
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20763 **[Test build #88186 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88186/testReport)** for PR 20763 at commit [`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r173995919

--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -299,18 +310,17 @@ class StringIndexerSuite
       .setInputCol("label")
       .setOutputCol("labelIndex")
-    val expected = Seq(Set((0, 0.0), (1, 0.0), (2, 2.0), (3, 1.0), (4, 1.0), (5, 0.0)),
-      Set((0, 2.0), (1, 2.0), (2, 0.0), (3, 1.0), (4, 1.0), (5, 2.0)),
-      Set((0, 1.0), (1, 1.0), (2, 0.0), (3, 2.0), (4, 2.0), (5, 1.0)),
-      Set((0, 1.0), (1, 1.0), (2, 2.0), (3, 0.0), (4, 0.0), (5, 1.0)))
+    val expected = Seq(Seq((0, 0.0), (1, 0.0), (2, 2.0), (3, 1.0), (4, 1.0), (5, 0.0)),
--- End diff --

I'd argue that, without shuffling, a DataFrame transformation will keep the row ordering. :)
[GitHub] spark issue #20804: [SPARK-23656][Test] Perform assertions in XXH64Suite.tes...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20804 @hvanhovell could you please review this?
[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20742 Merged build finished. Test PASSed.
[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20742 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88184/ Test PASSed.
[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20742 **[Test build #88184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88184/testReport)** for PR 20742 at commit [`832d871`](https://github.com/apache/spark/commit/832d87130ede866e4d877ca407e7a621282d4612).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16578 Please review this, @gengliangwang @jiangxb1987. We should support this feature in Spark 2.4.0.
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/20669

Hm, noted that we're making this tradeoff. We have an internal use case where we push a custom logging properties file into the container using `spark.files`. Logging properties files need to be in the container before the JVM starts so the appenders are configured from the get-go, but logging properties are fairly dynamic and probably don't belong in a statically built Docker image. We use YARN cluster mode primarily and rely on its file distribution, and we migrated to the fork's implementation of Kubernetes without having to change our internal setup. I think we can adapt to this change, but I don't think the use case I've described is as uncommon as one may think. There's plenty of lower-level tooling out there that requires the JVM to load files during static initialization.

> Oh, btw, if you think that is a really, really important feature, you still don't need an init container for that. You can just run the dependency download tool before you run spark-submit in the driver container. Problem solved.

Agreed. Init-containers are but one option to support this. The question was more whether running spark-submit in client mode is completely sufficient, which it seems it isn't in this specific case.
[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20803

1. Double-clicking the SQL statement shows the full SQL statement: https://github.com/apache/spark/pull/6646
2. What if this SQL statement contains `--hiveconf` or `--hivevar`?
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20669 Oh, btw, if you think that is a really, really important feature, you still don't need an init container for that. You can just run the dependency download tool before you run spark-submit in the driver container. Problem solved.
[GitHub] spark issue #20744: [SPARK-23608][CORE][WebUI] Add synchronization in SHS be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20744 **[Test build #88187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88187/testReport)** for PR 20744 at commit [`cd7e1f6`](https://github.com/apache/spark/commit/cd7e1f63e6f6614ed3efcc70df53cde41ffb6ff2).
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20669

That's one scenario where an init-container /might/ help. But be aware that your YARN example only works in a single scenario - YARN cluster mode - and even then it may need some config customization. And it may not work at all in some cases (e.g. the YARN data directory mounted with `noexec`). Neither YARN client mode nor any other supported cluster manager can do what you're describing.

Personally I feel it's perfectly ok to require a custom docker image in these cases, since they're so uncommon (I've never seen one of our users use the yarn-cluster feature for this purpose). People can have a "main" Spark image and a "debug" one that can be easily chosen from when submitting the app.

During this discussion I think someone mentioned that it might be possible to side-load init containers into Spark without this. I'm not that familiar with kubernetes, but if that's possible, it's another way you could achieve this without Spark having its own init container.
[GitHub] spark pull request #20744: [SPARK-23608][CORE][WebUI] Add synchronization in...
Github user zhouyejoe commented on a diff in the pull request: https://github.com/apache/spark/pull/20744#discussion_r173981826

--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -148,14 +148,17 @@ class HistoryServer(
       appId: String,
       attemptId: Option[String],
       ui: SparkUI,
-      completed: Boolean) {
+      completed: Boolean): Unit = this.synchronized {
--- End diff --

To make the synchronization finer-grained, can we synchronize only on the handlers?
[GitHub] spark issue #20805: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20805 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88185/ Test FAILed.
[GitHub] spark issue #20805: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20805 Merged build finished. Test FAILed.