[GitHub] spark pull request #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use...
Github user mgummelt commented on a diff in the pull request: https://github.com/apache/spark/pull/15654#discussion_r85799941

--- Diff: mesos/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala ---
@@ -51,7 +52,7 @@ private[mesos] class MesosClusterDispatcher(
     extends Logging {

   private val publicAddress = Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(args.host)
-  private val recoveryMode = conf.get("spark.deploy.recoveryMode", "NONE").toUpperCase()
+  private val recoveryMode = conf.get(RECOVERY_MODE).getOrElse("NONE").toUpperCase()
--- End diff --

Shouldn't the "NONE" default be added to the config builder?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
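The reviewer's point can be illustrated with a minimal, self-contained sketch. This is hypothetical code (the names `ConfigEntry`, `SparkConfLike`, and the `get` signature are stand-ins, not Spark's actual ConfigBuilder API): when the config entry carries its own default, call sites no longer need a `.getOrElse("NONE")`.

```scala
// Hypothetical sketch of a config entry that owns its default value,
// so lookups fall back to it automatically.
case class ConfigEntry(key: String, default: String)

class SparkConfLike(settings: Map[String, String]) {
  // Return the configured value, or the entry's built-in default.
  def get(entry: ConfigEntry): String =
    settings.getOrElse(entry.key, entry.default)
}

val RECOVERY_MODE = ConfigEntry("spark.deploy.recoveryMode", "NONE")

// With no explicit setting, the builder-supplied default applies:
val conf = new SparkConfLike(Map.empty)
println(conf.get(RECOVERY_MODE).toUpperCase) // prints "NONE"
```

With this shape, the `"NONE"` literal lives in one place (the entry definition) instead of being repeated at every call site.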
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/15654 Thanks! One small fix then LGTM
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15628 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67819/ Test FAILed.
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15628

**[Test build #67819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67819/consoleFull)** for PR 15628 at commit [`b5277c9`](https://github.com/apache/spark/commit/b5277c9bffef72b207f3d79f11b7bb01661de9e1).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15628 Merged build finished. Test FAILed.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659 **[Test build #67814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67814/consoleFull)** for PR 15659 at commit [`e668af6`](https://github.com/apache/spark/commit/e668af63e9ee26a7d54f3a8092f32498ab287d67).
[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15693 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67811/ Test PASSed.
[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15693 Merged build finished. Test PASSed.
[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15693

**[Test build #67811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67811/consoleFull)** for PR 15693 at commit [`0b660e0`](https://github.com/apache/spark/commit/0b660e02480bb3d193daf4acc997c1c0ca040930).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13713: [SPARK-15994] [MESOS] Allow enabling Mesos fetch cache i...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/13713 We need to get @srowen or one of the other committers to merge it.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/15651 lgtm pending jenkins
[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/1 ok to test
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15626 **[Test build #67822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67822/consoleFull)** for PR 15626 at commit [`8a2028b`](https://github.com/apache/spark/commit/8a2028b34f5b9830a37161249ebcb306c65d49e1).
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15654 **[Test build #67823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67823/consoleFull)** for PR 15654 at commit [`bb74f52`](https://github.com/apache/spark/commit/bb74f521cc47bcf4ae099665b5c0aff2531155d2).
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15651

**[Test build #67816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67816/consoleFull)** for PR 15651 at commit [`ffe4318`](https://github.com/apache/spark/commit/ffe43185f06f8b1aeffbf0c88fbc587aa8894bde).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15627: [SPARK-18099][YARN] Fail if same files added to distribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15627 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67826/ Test PASSed.
[GitHub] spark issue #15627: [SPARK-18099][YARN] Fail if same files added to distribu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15627 Merged build finished. Test PASSed.
[GitHub] spark issue #15627: [SPARK-18099][YARN] Fail if same files added to distribu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15627

**[Test build #67826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67826/consoleFull)** for PR 15627 at commit [`f797481`](https://github.com/apache/spark/commit/f7974812a5cc76cf98bba1c70e739bbc770d7dde).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67816/ Test PASSed.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15651

**[Test build #67817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67817/consoleFull)** for PR 15651 at commit [`e0e38bf`](https://github.com/apache/spark/commit/e0e38bfc64760918295a368c56a8ffda40a889e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67817/ Test PASSed.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651 Merged build finished. Test PASSed.
[GitHub] spark pull request #15698: [SPARK-18182] Expose ReplayListenerBus.read() ove...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/15698

[SPARK-18182] Expose ReplayListenerBus.read() overload which takes string iterator

The `ReplayListenerBus.read()` method is used when implementing a custom `ApplicationHistoryProvider`. The current interface only exposes a `read()` method which takes an `InputStream` and performs stream-to-lines conversion itself, but it would also be useful to expose an overloaded method which accepts an iterator of strings, thereby enabling events to be provided from non-`InputStream` sources.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark replay-listener-bus-interface

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15698.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15698

commit b777ee5bb25f38086bfe2126be26de8f1e14a14d
Author: Josh Rosen
Date: 2016-10-31T19:39:43Z

    Expose ReplayListenerBus.read() overload which accepts an iterator of lines.
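The overload structure the PR describes can be sketched as follows. This is a simplified, hypothetical stand-in (the class name `ReplayBusLike` and the replay logic are invented for illustration, not the actual `ReplayListenerBus` implementation): the `InputStream` variant only performs stream-to-lines conversion and delegates, so callers with non-stream sources can feed lines directly.

```scala
import java.io.InputStream
import scala.io.Source

// Simplified stand-in for the overload pattern: the InputStream overload
// converts the stream to lines and delegates to the iterator overload.
class ReplayBusLike {
  def read(stream: InputStream): Seq[String] =
    read(Source.fromInputStream(stream).getLines())

  def read(lines: Iterator[String]): Seq[String] =
    lines.map(line => s"replayed: $line").toSeq // stand-in for event replay
}

val bus = new ReplayBusLike
// Lines from any source, no InputStream needed:
println(bus.read(Iterator("event1", "event2")))
```

The design benefit is that a custom history provider holding events in, say, a database or an in-memory buffer can replay them without wrapping everything in a synthetic `InputStream`.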
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651 Merged build finished. Test PASSed.
[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15698 **[Test build #67828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67828/consoleFull)** for PR 15698 at commit [`b777ee5`](https://github.com/apache/spark/commit/b777ee5bb25f38086bfe2126be26de8f1e14a14d).
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/11105 ping @squito / @rxin if either of you have some post-Spark Summit EU bandwidth to review this it would be awesome :)
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659

**[Test build #67814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67814/consoleFull)** for PR 15659 at commit [`e668af6`](https://github.com/apache/spark/commit/e668af63e9ee26a7d54f3a8092f32498ab287d67).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67827/ Test PASSed.
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15697 Merged build finished. Test PASSed.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67814/ Test PASSed.
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15697

**[Test build #67827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67827/consoleFull)** for PR 15697 at commit [`a292ae8`](https://github.com/apache/spark/commit/a292ae8fc5bcb32e32c21d5e3ec7f093a4be13cd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Merged build finished. Test PASSed.
[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 Are we planning to incorporate the Parquet 1.9 libraries into Spark 2.1? If so, then this PR should be unnecessary. Hopefully.
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/15654 cc @srowen for merge into master
[GitHub] spark pull request #14803: [SPARK-17153][SQL] Should read partition data whe...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/14803#discussion_r85819725

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -608,6 +614,81 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
   // === other tests

+  test("read new files in partitioned table without globbing, should read partition data") {
+    withTempDirs { case (dir, tmp) =>
+      val partitionFooSubDir = new File(dir, "partition=foo")
+      val partitionBarSubDir = new File(dir, "partition=bar")
+
+      val schema = new StructType().add("value", StringType).add("partition", StringType)
+      val fileStream = createFileStream("json", s"${dir.getCanonicalPath}", Some(schema))
+      val filtered = fileStream.filter($"value" contains "keep")
+      testStream(filtered)(
+        // Create new partition=foo sub dir and write to it
+        AddTextFileData("{'value': 'drop1'}\n{'value': 'keep2'}", partitionFooSubDir, tmp),
+        CheckAnswer(("keep2", "foo")),
+
+        // Append to same partition=foo sub dir
+        AddTextFileData("{'value': 'keep3'}", partitionFooSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo")),
+
+        // Create new partition sub dir and write to it
+        AddTextFileData("{'value': 'keep4'}", partitionBarSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar")),
+
+        // Append to same partition=bar sub dir
+        AddTextFileData("{'value': 'keep5'}", partitionBarSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar"), ("keep5", "bar"))
+      )
+    }
+  }
+
+  test("when schema inference is turned on, should read partition data") {
+    def createFile(content: String, src: File, tmp: File): Unit = {
+      val tempFile = Utils.tempFileWith(new File(tmp, "text"))
+      val finalFile = new File(src, tempFile.getName)
+      src.mkdirs()
+      require(stringToFile(tempFile, content).renameTo(finalFile))
+    }
+
+    withSQLConf(SQLConf.STREAMING_SCHEMA_INFERENCE.key -> "true") {
+      withTempDirs { case (dir, tmp) =>
+        val partitionFooSubDir = new File(dir, "partition=foo")
+        val partitionBarSubDir = new File(dir, "partition=bar")
+
+        // Create file in partition, so we can infer the schema.
+        createFile("{'value': 'drop0'}", partitionFooSubDir, tmp)
+
+        val fileStream = createFileStream("json", s"${dir.getCanonicalPath}")
+        val filtered = fileStream.filter($"value" contains "keep")
+        testStream(filtered)(
+          // Append to same partition=foo sub dir
+          AddTextFileData("{'value': 'drop1'}\n{'value': 'keep2'}", partitionFooSubDir, tmp),
+          CheckAnswer(("keep2", "foo")),
+
+          // Append to same partition=foo sub dir
+          AddTextFileData("{'value': 'keep3'}", partitionFooSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo")),
+
+          // Create new partition sub dir and write to it
+          AddTextFileData("{'value': 'keep4'}", partitionBarSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar")),
+
+          // Append to same partition=bar sub dir
+          AddTextFileData("{'value': 'keep5'}", partitionBarSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar"), ("keep5", "bar")),
+
+          // Delete the two partition dirs
+          DeleteFile(partitionFooSubDir),
--- End diff --

@viirya why do we need to delete dirs in this test? It's flaky since the source may be listing files.
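The tests in the diff above rely on Hive-style partition directories (names like `partition=foo`), whose key/value is surfaced as an extra column in the query results. A minimal sketch of that directory-name parsing (the helper `partitionValue` is hypothetical, for illustration only, not Spark's actual partition-discovery code):

```scala
// Hypothetical helper illustrating Hive-style partition-directory parsing:
// a dir name "partition=foo" carries column name "partition" and value "foo".
def partitionValue(dirName: String): Option[(String, String)] =
  dirName.split("=", 2) match {
    case Array(k, v) => Some(k -> v)
    case _           => None // not a partition directory
  }

println(partitionValue("partition=foo")) // prints "Some((partition,foo))"
```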
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15626

**[Test build #67822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67822/consoleFull)** for PR 15626 at commit [`8a2028b`](https://github.com/apache/spark/commit/8a2028b34f5b9830a37161249ebcb306c65d49e1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15626 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67822/ Test FAILed.
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15626 Merged build finished. Test FAILed.
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696

**[Test build #67818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67818/consoleFull)** for PR 15696 at commit [`2a61351`](https://github.com/apache/spark/commit/2a613516dd469bca5ed4d7b0f17f678e9e70e267).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class TaskCommitMessage(obj: Any) extends Serializable`
  * `abstract class FileCommitProtocol`
  * `class MapReduceFileCommitterProtocol(committer: OutputCommitter) extends FileCommitProtocol`
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Merged build finished. Test PASSed.
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67818/ Test PASSed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/13881 Sorry for the long delay! Whenever you get a chance to update this, it'd be nice to log this info via the Instrumentation class, rather than logInfo.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659 **[Test build #67824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67824/consoleFull)** for PR 15659 at commit [`3bf961e`](https://github.com/apache/spark/commit/3bf961efbffc9b03eba7053348ac6ef1634d0ade).
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #67825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67825/consoleFull)** for PR 11105 at commit [`8c560ca`](https://github.com/apache/spark/commit/8c560ca6dd8c28f86630ae42bb50739a8614bec3).
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15654

**[Test build #67823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67823/consoleFull)** for PR 15654 at commit [`bb74f52`](https://github.com/apache/spark/commit/bb74f521cc47bcf4ae099665b5c0aff2531155d2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15667: [SPARK-18107][SQL] Insert overwrite statement run...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/15667#discussion_r85808139

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---

```
@@ -257,7 +258,31 @@ case class InsertIntoHiveTable(
         table.catalogTable.identifier.table,
         partitionSpec)

+      var doOverwrite = overwrite
```
--- End diff --

nit: `doHiveOverwrite`?
[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r85808919

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---

```
@@ -418,21 +424,41 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     }

     if (DDLUtils.isDatasourceTable(withStatsProps)) {
-      val oldDef = client.getTable(db, withStatsProps.identifier.table)
-      // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
-      // to retain the spark specific format if it is. Also add old data source properties to table
-      // properties, to retain the data source table format.
-      val oldDataSourceProps = oldDef.properties.filter(_._1.startsWith(SPARK_SQL_PREFIX))
+      val oldTableDef = client.getTable(db, withStatsProps.identifier.table)
+
+      // Always update the location property w.r.t. the new table location.
+      val locationProp = tableDefinition.storage.locationUri.map { location =>
+        TABLE_LOCATION -> location
+      }
+      // Only update the `locationUri` field if the location is really changed, because this table
+      // may be not Hive-compatible and can not set the `locationUri` field. We should respect the
+      // old `locationUri` even it's None.
+      val oldLocation = getLocationFromRawTable(oldTableDef)
+      val locationUri = if (oldLocation == tableDefinition.storage.locationUri) {
```
--- End diff --

```Scala
test("alter table - rename") {
  val tabName = "tab1"
  val newTabName = "tab2"
  withTable(tabName, newTabName) {
    spark.range(10).write.saveAsTable(tabName)
    val catalog = spark.sessionState.catalog
    sql(s"ALTER TABLE $tabName RENAME TO $newTabName")
    sql(s"DESC FORMATTED $newTabName").show(100, false)
    assert(!catalog.tableExists(TableIdentifier(tabName)))
    assert(catalog.tableExists(TableIdentifier(newTabName)))
  }
}
```
You can try to run the above test case in `DDLSuite.scala` and `HiveDDLSuite.scala`. The locations are different: one is using the new table name; the other is using the old one.
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15695

**[Test build #67815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67815/consoleFull)** for PR 15695 at commit [`0d4461a`](https://github.com/apache/spark/commit/0d4461a9e444008a35cc04c607447dc3d4677b7f).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15627: [SPARK-18099][YARN] Fail if same files added to distribu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15627 **[Test build #67826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67826/consoleFull)** for PR 15627 at commit [`f797481`](https://github.com/apache/spark/commit/f7974812a5cc76cf98bba1c70e739bbc770d7dde).
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15697 **[Test build #67827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67827/consoleFull)** for PR 15697 at commit [`a292ae8`](https://github.com/apache/spark/commit/a292ae8fc5bcb32e32c21d5e3ec7f093a4be13cd).
[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing e...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15637 How about the complex types `Array`, `Map` and `Struct`? It sounds like the test cases do not cover these types. Thanks!
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15651

**[Test build #67813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67813/consoleFull)** for PR 15651 at commit [`5405a94`](https://github.com/apache/spark/commit/5405a949f3589b99d92dc5fa3f2fc264692910d1).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15637: [SPARK-18000] [SQL] Aggregation function for comp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15637#discussion_r85814522

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MapAggregate.scala ---

```
@@ -0,0 +1,332 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.nio.ByteBuffer
+
+import scala.collection.immutable.TreeMap
+import scala.collection.mutable
+
+import com.google.common.primitives.{Doubles, Ints, Longs}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionDescription}
+import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
+import org.apache.spark.sql.types.{DataType, _}
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * The MapAggregate function for a column returns:
+ * 1. null if no non-null value exists.
+ * 2. (distinct non-null value, frequency) pairs of equi-width histogram when the number of
+ *    distinct non-null values is less than or equal to the specified maximum number of bins.
+ * 3. an empty map otherwise.
+ *
+ * @param child child expression that can produce column value with `child.eval(inputRow)`
+ * @param numBinsExpression The maximum number of bins.
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, numBins) - Returns 1. null if no non-null value exists.
+      2. (distinct non-null value, frequency) pairs of equi-width histogram when the number of
+      distinct non-null values is less than or equal to the specified maximum number of bins.
+      3. an empty map otherwise.
+    """)
+case class MapAggregate(
+    child: Expression,
+    numBinsExpression: Expression,
+    override val mutableAggBufferOffset: Int,
+    override val inputAggBufferOffset: Int) extends TypedImperativeAggregate[MapDigest] {
+
+  def this(child: Expression, numBinsExpression: Expression) = {
+    this(child, numBinsExpression, 0, 0)
+  }
+
+  // Mark as lazy so that numBinsExpression is not evaluated during tree transformation.
+  private lazy val numBins: Int = numBinsExpression.eval().asInstanceOf[Int]
+
+  override def inputTypes: Seq[AbstractDataType] = {
+    Seq(TypeCollection(NumericType, TimestampType, DateType, StringType), IntegerType)
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val defaultCheck = super.checkInputDataTypes()
+    if (defaultCheck.isFailure) {
+      defaultCheck
+    } else if (!numBinsExpression.foldable) {
+      TypeCheckFailure("The maximum number of bins provided must be a constant literal")
+    } else if (numBins < 2) {
+      TypeCheckFailure(
+        "The maximum number of bins provided must be a positive integer literal >= 2 " +
+          s"(current value = $numBins)")
+    } else {
+      TypeCheckSuccess
+    }
+  }
+
+  override def update(buffer: MapDigest, input: InternalRow): Unit = {
+    if (buffer.isInvalid) {
+      return
+    }
+    val evaluated = child.eval(input)
+    if (evaluated != null) {
+      buffer.update(child.dataType, evaluated, numBins)
+    }
```
--- End diff --

A general comment about the impl. Here, I think we should avoid `return` if possible. For example, we can re-write it like

```Scala
if (!buffer.isInvalid) {
  val evaluated = child.eval(input)
  if (evaluated != null) {
    buffer.update(child.dataType, evaluated, numBins)
  }
}
```
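The `return`-free rewrite suggested above can be sanity-checked outside Spark with a small, self-contained sketch. `Digest` and `updateBuffer` below are hypothetical stand-ins for the real `MapDigest` buffer and `update` method, not Spark code:

```scala
// Minimal stand-in for the aggregation buffer discussed above (hypothetical).
// Tracks value frequencies and turns itself invalid once the number of
// distinct values exceeds the bin cap, roughly mirroring the histogram spec.
class Digest(var isInvalid: Boolean = false) {
  val counts = scala.collection.mutable.Map.empty[Any, Long]

  def update(value: Any, numBins: Int): Unit = {
    counts(value) = counts.getOrElse(value, 0L) + 1L
    // Too many distinct values: invalidate and drop the partial histogram.
    if (counts.size > numBins) {
      isInvalid = true
      counts.clear()
    }
  }
}

// Guard-clause style with an early `return` rewritten as one nested
// conditional, as the review suggests: same behavior, no `return`.
def updateBuffer(buffer: Digest, value: Any, numBins: Int): Unit = {
  if (!buffer.isInvalid) {
    if (value != null) {
      buffer.update(value, numBins)
    }
  }
}
```

Avoiding `return` here is not only stylistic: in Scala a `return` inside a closure compiles to a thrown `NonLocalReturnControl`, so keeping methods expression-oriented sidesteps that class of surprise.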
[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/11105#discussion_r85805218

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---

```
@@ -722,6 +722,7 @@ private[spark] object JsonProtocol {
     val value = Utils.jsonOption(json \ "Value").map { v => accumValueFromJson(name, v) }
     val internal = (json \ "Internal").extractOpt[Boolean].getOrElse(false)
     val countFailedValues = (json \ "Count Failed Values").extractOpt[Boolean].getOrElse(false)
+    val dataProperty = (json \ "DataProperty").extractOpt[Boolean].getOrElse(false)
```
--- End diff --

Done :)
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15695

**[Test build #67812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67812/consoleFull)** for PR 15695 at commit [`e2d2bac`](https://github.com/apache/spark/commit/e2d2bac560a529a2d22d8b1f55874edbeb4da0f1).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15692 You'll need to add the Param itself. (Search for `Params.dummy()` in that file to find examples.)
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15695 Merged build finished. Test FAILed.
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15695 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67812/ Test FAILed.
[GitHub] spark pull request #15633: [SPARK-18087] [SQL] Optimize insert to not requir...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/15633#discussion_r85806652

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---

```
@@ -179,24 +180,30 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
           "Cannot overwrite a path that is also being read from.")
       }

+      def refreshPartitionsCallback(updatedPartitions: Seq[TablePartitionSpec]): Unit = {
+        if (l.catalogTable.isDefined &&
```
--- End diff --

IMO that is a little harder to read, since you have two anonymous function declarations instead of one.
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15654 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67823/ Test PASSed.
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15654 Merged build finished. Test PASSed.
[GitHub] spark issue #15654: [SPARK-16881][MESOS] Migrate Mesos configs to use Config...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15654 @mgummelt done! 👍
[GitHub] spark pull request #15667: [SPARK-18107][SQL] Insert overwrite statement run...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/15667#discussion_r85807729

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---

```
@@ -257,7 +258,31 @@ case class InsertIntoHiveTable(
         table.catalogTable.identifier.table,
         partitionSpec)

+      var doOverwrite = overwrite
+
+      if (oldPart.isEmpty || !ifNotExists) {
+        // SPARK-18107: Insert overwrite runs much slower than hive-client.
+        // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
+        // version and we may not want to catch up new Hive version every time. We delete the
+        // Hive partition first and then load data file into the Hive partition.
+        if (oldPart.nonEmpty && overwrite) {
+          oldPart.get.storage.locationUri.map { uri =>
+            val partitionPath = new Path(uri)
+            val fs = partitionPath.getFileSystem(hadoopConf)
+            if (fs.exists(partitionPath)) {
+              val pathPermission = fs.getFileStatus(partitionPath).getPermission()
+              if (!fs.delete(partitionPath, true)) {
+                throw new RuntimeException(
+                  "Cannot remove partition directory '" + partitionPath.toString)
+              } else {
+                fs.mkdirs(partitionPath, pathPermission)
```
--- End diff --

Is the mkdir necessary?
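For local experimentation, the delete-then-recreate sequence under discussion can be sketched against plain `java.io.File` instead of Hadoop's `FileSystem`. The names here are illustrative, and the sketch deliberately skips the permission-restoring step (`getPermission`/`mkdirs` with an `FsPermission`) that the real diff performs:

```scala
import java.io.File

// Recursively delete a directory tree, children first, mirroring
// fs.delete(partitionPath, true) in the diff.
def deleteRecursively(f: File): Unit = {
  Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  if (!f.delete()) {
    throw new RuntimeException(s"Cannot remove '$f'")
  }
}

// Delete the partition directory and recreate it empty, mirroring the
// delete + mkdirs sequence under review.
def resetDir(dir: File): Unit = {
  if (dir.exists()) deleteRecursively(dir)
  if (!dir.mkdirs()) {
    throw new RuntimeException(s"Cannot recreate '$dir'")
  }
}
```

Whether the `mkdirs` is strictly necessary depends on whether the subsequent load step expects the partition directory to already exist; recreating it (with the old permissions, in the real code) keeps the directory visible to anything listing the partition path in the meantime.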
[GitHub] spark pull request #15538: [SPARK-17993][SQL] Fix Parquet log output redirec...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/15538#discussion_r85809383

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---

```
@@ -55,6 +56,21 @@ class ParquetFileFormat
   with DataSourceRegister
   with Logging
   with Serializable {
+  // Poor man's "static initializer". Scala doesn't have language support for static initializers,
+  // and it's important that we initialize `ParquetFileFormat.redirectParquetLogsViaSLF4J` before
+  // doing anything with the Parquet libraries. Rather than expect clients to initialize the
+  // `ParquetFileFormat` singleton object at the right time, we put that initialization in the
+  // constructor of this class. This method is idempotent, and essentially a no-op after its first
+  // call.
+  ParquetFileFormat.ensureParquetLogRedirection
+
+  // Java serialization will not call the default constructor. Make sure we call
+  // ParquetFileFormat.ensureParquetLogRedirection in deserialization by implementing this hook
+  // method.
+  private def readObject(in: ObjectInputStream): Unit = {
+    in.defaultReadObject
+    ParquetFileFormat.ensureParquetLogRedirection
```
--- End diff --

You could also call `ensureParquetLogRedirection` from some main class right? e.g. `class Executor`.
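The constructor-plus-`readObject` trick in this diff can be demonstrated standalone. `LogRedirect` and `FileFormatLike` below are illustrative stand-ins, not Spark classes; the point is that Java deserialization bypasses the constructor, so the idempotent initializer must also be invoked from the private `readObject` hook (which the serialization machinery finds by reflection):

```scala
import java.io._

object LogRedirect {
  @volatile private var initialized = false
  var initCount = 0

  // Idempotent "static initializer": safe to call from every constructor
  // and from readObject; only the first call does any work.
  def ensure(): Unit = synchronized {
    if (!initialized) {
      initCount += 1
      initialized = true
    }
  }
}

class FileFormatLike extends Serializable {
  LogRedirect.ensure() // runs on normal construction

  // Java deserialization skips the constructor, so re-run the init here.
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    LogRedirect.ensure()
  }
}
```

Calling the initializer from a main class (as suggested in the review) works too; the `readObject` hook just guarantees the init even when an instance first appears on a JVM via deserialization.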
[GitHub] spark pull request #15697: [SparkR][Test]:remove unnecessary suppressWarning...
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/15697 [SparkR][Test]: remove unnecessary suppressWarnings

## What changes were proposed in this pull request?

In test_mllib.R, there are two unnecessary suppressWarnings. This PR just removes them.

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangmiao1981/spark rtest

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15697.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15697

commit a292ae8fc5bcb32e32c21d5e3ec7f093a4be13cd
Author: wm...@hotmail.com
Date: 2016-10-31T19:04:57Z

    remove suppressWarnings
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15695

Merged build finished. Test FAILed.
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15695

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67815/ Test FAILed.
[GitHub] spark pull request #15637: [SPARK-18000] [SQL] Aggregation function for comp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15637#discussion_r85811315

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/MapAggregateQuerySuite.scala ---

@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types._
+
+
+class MapAggregateQuerySuite extends QueryTest with SharedSQLContext {
+
+  private val table = "map_aggregate_test"
+  private val col1 = "col1"
+  private val col2 = "col2"
+  private val schema = StructType(Seq(StructField(col1, StringType), StructField(col2, DoubleType)))
+
+  private def query(numBins: Int): DataFrame = {
+    sql(s"SELECT map_aggregate($col1, $numBins), map_aggregate($col2, $numBins) FROM $table")
+  }
+
+  test("null handling") {
+    withTempView(table) {
+      // Null input
+      val nullRdd: RDD[Row] = spark.sparkContext.parallelize(Seq(Row(null, null)))
+      spark.createDataFrame(nullRdd, schema).createOrReplaceTempView(table)
+      checkAnswer(query(numBins = 2), Row(null, null))
+
+      // Empty input
+      val emptyRdd: RDD[Row] = spark.sparkContext.parallelize(Seq.empty)
+      spark.createDataFrame(emptyRdd, schema).createOrReplaceTempView(table)
+      checkAnswer(query(numBins = 2), Row(null, null))
+
+      // Add some non-null data
+      val rdd: RDD[Row] = spark.sparkContext.parallelize(Seq(Row(null, 3.0D), Row("a", null)))
+      spark.createDataFrame(rdd, schema).createOrReplaceTempView(table)
+      checkAnswer(query(numBins = 2), Row(Map(("a", 1)), Map((3.0D, 1))))
+    }
+  }
+
+  test("returns empty result when ndv exceeds numBins") {
+    withTempView(table) {
+      val rdd: RDD[Row] = spark.sparkContext.parallelize(
+        Seq(Row("a", 4.0D), Row("d", 2.0D), Row("c", 4.0D), Row("b", 1.0D), Row("a", 3.0D),
+          Row("a", 2.0D)), 2)
+      spark.createDataFrame(rdd, schema).createOrReplaceTempView(table)
+      checkAnswer(query(numBins = 4), Row(
+        Map(("a", 3), ("b", 1), ("c", 1), ("d", 1)),
+        Map((1.0D, 1), (2.0D, 2), (3.0D, 1), (4.0D, 2))))
+      // One partial exceeds numBins during update()
+      checkAnswer(query(numBins = 2), Row(Map.empty, Map.empty))
+      // Exceeding numBins during merge()
+      checkAnswer(query(numBins = 3), Row(Map.empty, Map.empty))
+    }
+  }
+
+  test("multiple columns of different types") {
+    def queryMultiColumns(numBins: Int): DataFrame = {
+      sql(
+        s"""
+           |SELECT
+           |  map_aggregate(c1, $numBins),
+           |  map_aggregate(c2, $numBins),
+           |  map_aggregate(c3, $numBins),
+           |  map_aggregate(c4, $numBins),
+           |  map_aggregate(c5, $numBins),
+           |  map_aggregate(c6, $numBins),
+           |  map_aggregate(c7, $numBins),
+           |  map_aggregate(c8, $numBins),
+           |  map_aggregate(c9, $numBins),
+           |  map_aggregate(c10, $numBins)
+           |FROM $table
+         """.stripMargin)
+    }
+
+    val allTypeSchema = StructType(Seq(
+      StructField("c1", ByteType),
+      StructField("c2", ShortType),
+      StructField("c3", IntegerType),
+      StructField("c4", LongType),
+      StructField("c5", FloatType),
+      StructField("c6", DoubleType),
+      StructField("c7", DecimalType(10, 5)),
+      StructField("c8", DateType),
+      StructField("c9", TimestampType),
+      StructField("c10", StringType)))

--- End diff --

Here, it is still missing `BinaryType` and `BooleanType`.
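The reviewer's point above could be addressed by extending the test schema with the two missing types. A hypothetical sketch (not part of the actual PR; the `c11`/`c12` names are made up):

```scala
import org.apache.spark.sql.types._

// Schema from the diff above (c2 through c9 elided here for brevity),
// extended with the two types the review flagged as missing.
val allTypeSchema = StructType(Seq(
  StructField("c1", ByteType),
  // ... c2 through c9 as in the diff ...
  StructField("c10", StringType)))

val extendedSchema = allTypeSchema
  .add(StructField("c11", BinaryType))
  .add(StructField("c12", BooleanType))
```

The corresponding `map_aggregate(c11, ...)` and `map_aggregate(c12, ...)` projections would then need to be added to the multi-column query as well.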
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15695

Looks like FileStreamSourceSuite is broken in 2.0. Looking at it.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651

Merged build finished. Test PASSed.
[GitHub] spark issue #15651: [SPARK-17972][SQL] Add Dataset.checkpoint() to truncate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15651

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67813/ Test PASSed.
[GitHub] spark pull request #15637: [SPARK-18000] [SQL] Aggregation function for comp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15637#discussion_r85812423

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MapAggregate.scala ---

@@ -0,0 +1,332 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.nio.ByteBuffer
+
+import scala.collection.immutable.TreeMap
+import scala.collection.mutable
+
+import com.google.common.primitives.{Doubles, Ints, Longs}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionDescription}
+import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
+import org.apache.spark.sql.types.{DataType, _}
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * The MapAggregate function for a column returns:
+ * 1. null if no non-null value exists.
+ * 2. (distinct non-null value, frequency) pairs of equi-width histogram when the number of
+ *    distinct non-null values is less than or equal to the specified maximum number of bins.
+ * 3. an empty map otherwise.
+ *
+ * @param child child expression that can produce column value with `child.eval(inputRow)`
+ * @param numBinsExpression The maximum number of bins.
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, numBins) - Returns 1. null if no non-null value exists.
+      2. (distinct non-null value, frequency) pairs of equi-width histogram when the number of
+      distinct non-null values is less than or equal to the specified maximum number of bins.
+      3. an empty map otherwise.

--- End diff --

Describe the general purpose of this function first, and then explain the return values?
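The three-way contract described in the scaladoc above can be sketched as a plain driver-side function. This is a simplified illustration of the semantics only, with a hypothetical `mapAggregate` helper, not the actual `TypedImperativeAggregate` implementation in the PR:

```scala
// Sketch of MapAggregate's contract: None (SQL null) when there is no
// non-null input; a value -> frequency map while the number of distinct
// values stays within numBins; an empty map once it exceeds numBins.
def mapAggregate[T](values: Seq[T], numBins: Int): Option[Map[T, Long]] = {
  val nonNull = values.filter(_ != null)
  if (nonNull.isEmpty) {
    None                                    // case 1: null result
  } else {
    val freq = nonNull.groupBy(identity).mapValues(_.size.toLong).toMap
    if (freq.size <= numBins) Some(freq)    // case 2: (value, frequency) pairs
    else Some(Map.empty[T, Long])           // case 3: ndv exceeds numBins
  }
}
```

For instance, `mapAggregate(Seq("a", "a", "b"), numBins = 2)` yields `Some(Map("a" -> 2, "b" -> 1))`, while `numBins = 1` yields `Some(Map.empty)`, matching the "returns empty result when ndv exceeds numBins" test in the companion suite.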