[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14803 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14805: [MINOR][DOCS] Fix minor typos in python example code
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14805 **[Test build #64414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64414/consoleFull)** for PR 14805 at commit [`92310d9`](https://github.com/apache/spark/commit/92310d91fa0c981d122a1a684a1dfd430f42db5e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14750 Merged build finished. Test FAILed.
[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14803 **[Test build #64405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64405/consoleFull)** for PR 14803 at commit [`2771d71`](https://github.com/apache/spark/commit/2771d71898f187d479cdb0996c96494c0b53a344).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14766: [SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supports tr...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14766 Yes, thanks for the review. Merged into master.
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14750 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64409/
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14750 **[Test build #64409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64409/consoleFull)** for PR 14750 at commit [`6c9c130`](https://github.com/apache/spark/commit/6c9c1308051de27dae0fa147764399e5ebcff9f0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` should make...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14698 LGTM - merging to master/2.0. Thanks for working on this!
[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14802 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64403/
[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14802 Merged build finished. Test PASSed.
[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14802 **[Test build #64403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64403/consoleFull)** for PR 14802 at commit [`0d9d1e6`](https://github.com/apache/spark/commit/0d9d1e6d59fb68996bf96b5238835a0718a8da1a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14805: [MINOR][DOCS] Fix minor typos in python example code
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14805 **[Test build #64414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64414/consoleFull)** for PR 14805 at commit [`92310d9`](https://github.com/apache/spark/commit/92310d91fa0c981d122a1a684a1dfd430f42db5e).
[GitHub] spark issue #14805: [MINOR][DOCS] Fix minor typos in python example code
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14805 OK, can you perhaps quickly search for other instances of the same in the Python code? It's worth a skim if you're up for it.
[GitHub] spark issue #14805: [MINOR][DOCS] Fix minor typos in python example code
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14805 Jenkins test this please
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14800 LGTM as a targeted fix
[GitHub] spark issue #14805: [MINOR][DOCS] Fix minor typos in python example code
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14805 Can one of the admins verify this patch?
[GitHub] spark issue #14766: [SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supports tr...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14766 OK, so it's just exposing an existing parameter to Python? Seems OK.
[GitHub] spark issue #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14744 **[Test build #64413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64413/consoleFull)** for PR 14744 at commit [`bb75190`](https://github.com/apache/spark/commit/bb751907ea0a04af1e6fbf3943ce57aa6c21552b).
[GitHub] spark pull request #14805: [MINOR][DOCS] Fix minor typos in python example c...
GitHub user silentsokolov opened a pull request: https://github.com/apache/spark/pull/14805 [MINOR][DOCS] Fix minor typos in python example code
## What changes were proposed in this pull request?
Fix minor typos in the Python example code in the streaming programming guide.
## How was this patch tested?
N/A
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/silentsokolov/spark fix-typos
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14805.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14805
commit 92310d91fa0c981d122a1a684a1dfd430f42db5e
Author: Dmitriy Sokolov
Date: 2016-08-25T08:57:03Z
Fix minor typos in python example code
[GitHub] spark issue #14766: [SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supports tr...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14766 @srowen Would you mind taking a look at this one? Thanks!
[GitHub] spark pull request #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set s...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14744#discussion_r76204928 --- Diff: docs/configuration.md --- @@ -1752,6 +1752,14 @@ showDF(properties, numRows = 200, truncate = FALSE) Executable for executing R scripts in client modes for driver. Ignored in cluster modes. + + spark.r.shell.command + R + +Executable for executing sparkR shell in client modes for driver. Ignored in cluster modes. It is the same as environment variable SPARKR_DRIVER_R, but take precedence over it. +spark.r.shell.command is used for interactive mode of sparkR (sparkR shell) while spark.r.driver.command is used for the batch mode (running sparkR script). --- End diff -- Got it :smile:
[GitHub] spark issue #14663: [SPARK-17001] [ML] Enable standardScaler to standardize ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14663 I'll go for this tomorrow if there are no other comments.
[GitHub] spark issue #14804: [MINOR][Web UI] Correctly convert bytes in web UI
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14804 Meh, it's still ambiguous and there's a defined way to disambiguate, so it's unfortunate, but I'm OK with a step towards consistency in any event.
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14800 **[Test build #64412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64412/consoleFull)** for PR 14800 at commit [`97dde82`](https://github.com/apache/spark/commit/97dde8292d04df37e0c96ce0a7198fe0da6403f2).
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14637 Nice one, LGTM. I'll leave it open for final comments until tomorrow.
[GitHub] spark issue #14804: [MINOR][Web UI] Correctly convert bytes in web UI
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14804 I think [here](http://ux.stackexchange.com/questions/13815/files-size-units-kib-vs-kb-vs-kb) has a precise definition. AFAIK, in Spark the conversion is 1024-based whether the unit is written KB, K, or kb; KiB is not so commonly used. And since we usually treat everything as 1024-based, it might not be necessary to differentiate them.
[GitHub] spark issue #10896: [SPARK-12978][SQL] Skip unnecessary final group-by when ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/10896 @hvanhovell could you also give me comments on #13852?
[GitHub] spark issue #14433: [SPARK-16829][SparkR]:sparkR sc.setLogLevel doesn't work
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14433 that's a good point actually - how about we use `args.primaryResource` or `args.isR` that already exists in SparkSubmit?
[GitHub] spark issue #14786: [SPARK-17212][SQL] TypeCoercion supports widening conver...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14786 **[Test build #64411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64411/consoleFull)** for PR 14786 at commit [`ab754fa`](https://github.com/apache/spark/commit/ab754fa8eb537dcd6ce3f4f3b256f0fba2f2fdcd).
[GitHub] spark pull request #14760: [SPARK-17193] [CORE] HadoopRDD NPE at DEBUG log l...
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/14760
[GitHub] spark issue #14760: [SPARK-17193] [CORE] HadoopRDD NPE at DEBUG log level wh...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14760 Merged to master/2.0
[GitHub] spark issue #14796: [SPARK-17229][SQL] PostgresDialect shouldn't widen float...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14796 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64402/
[GitHub] spark issue #14796: [SPARK-17229][SQL] PostgresDialect shouldn't widen float...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14796 Merged build finished. Test PASSed.
[GitHub] spark issue #14796: [SPARK-17229][SQL] PostgresDialect shouldn't widen float...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14796 **[Test build #64402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64402/consoleFull)** for PR 14796 at commit [`708343d`](https://github.com/apache/spark/commit/708343d59e238322be25751960bf6e4dca47d98b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14744#discussion_r76203143 --- Diff: docs/configuration.md --- @@ -1752,6 +1752,14 @@ showDF(properties, numRows = 200, truncate = FALSE) Executable for executing R scripts in client modes for driver. Ignored in cluster modes. + + spark.r.shell.command + R + +Executable for executing sparkR shell in client modes for driver. Ignored in cluster modes. It is the same as environment variable SPARKR_DRIVER_R, but take precedence over it. +spark.r.shell.command is used for interactive mode of sparkR (sparkR shell) while spark.r.driver.command is used for the batch mode (running sparkR script). --- End diff -- I think what I mean is `spark.r.shell.command is used for interactive mode of SparkR while...` or `spark.r.shell.command is used for sparkR shell while...`
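The precedence rule described in the diff (the `spark.r.shell.command` property wins over the `SPARKR_DRIVER_R` environment variable, and `spark.r.driver.command` covers batch mode) can be sketched as follows; the R installation paths here are purely illustrative:

```shell
# SPARKR_DRIVER_R would normally choose the R executable for the sparkR shell...
export SPARKR_DRIVER_R=/usr/bin/R

# ...but spark.r.shell.command takes precedence over it when set (interactive mode):
./bin/sparkR --conf spark.r.shell.command=/opt/R/bin/R

# For batch mode (running an R script), spark.r.driver.command applies instead:
./bin/spark-submit --conf spark.r.driver.command=/opt/R/bin/R my_script.R
```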
[GitHub] spark issue #14804: [MINOR][Web UI] Correctly convert bytes in web UI
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14804 Ugh, yeah, that's wrong in the sense that we are not showing MB, but MiB. I'd favor fixing the labels here and in Utils.bytesToString? Then again, I see that we will also parse input of "500kb" as if it were "500KiB", using 1024 not 1000. That's wrong too, really. But fixing it means a bit of a behavior change: we could support "500KB" or "500kb", but it would now mean 500*1000 bytes, not 500*1024. Well, maybe it's best to be consistently wrong rather than inconsistently wrong. Anyone feel at all inclined to take the hit in behavior change, or just leave it?
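The ambiguity under discussion is easy to make concrete. The sketch below is not Spark's actual `Utils.bytesToString`; it is a hypothetical illustration of the two conventions, showing why "500kb" parsed 1024-based denotes a different byte count than the 1000-based reading:

```python
def format_bytes(n, binary=True):
    """Format a byte count with 1024-based (KiB) or 1000-based (KB) units."""
    base = 1024 if binary else 1000
    units = ["B", "KiB", "MiB", "GiB", "TiB"] if binary else ["B", "KB", "MB", "GB", "TB"]
    size = float(n)
    for unit in units:
        if size < base or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= base

def parse_size(s, base=1024):
    """Parse a size string like '500kb'. Spark-style parsing treats the
    suffix as 1024-based even when written 'kb'; pass base=1000 for SI."""
    s = s.strip().lower().rstrip("b")  # '500kb' -> '500k'
    multipliers = {"k": base, "m": base ** 2, "g": base ** 3}
    for suffix, mult in multipliers.items():
        if s.endswith(suffix):
            return int(float(s[:-1]) * mult)
    return int(float(s))

# The same input string yields two different byte counts:
print(parse_size("500kb", base=1024))  # 512000 (the 1024-based reading)
print(parse_size("500kb", base=1000))  # 500000 (the SI reading)
print(format_bytes(512000))            # 500.0 KiB
```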
[GitHub] spark issue #14801: [SPARK-17234] [SQL] Table Existence Checking when Index ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14801 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64404/
[GitHub] spark issue #14801: [SPARK-17234] [SQL] Table Existence Checking when Index ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14801 Merged build finished. Test FAILed.
[GitHub] spark issue #14801: [SPARK-17234] [SQL] Table Existence Checking when Index ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14801 **[Test build #64404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64404/consoleFull)** for PR 14801 at commit [`c400c52`](https://github.com/apache/spark/commit/c400c5292a32549cea80861adfaefeb41f4d90b3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class SQLFeatureNotSupportedException(val feature: String)`
[GitHub] spark issue #14804: [MINOR][Web UI] Correctly convert bytes in web UI
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14804 Because the log shows Memory MB as 1024-based, while the web UI is 1000-based, the two are slightly different. You could check `Utils#bytesToString`. I think we should unify this.
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14800 No problem, thanks for your attention :) Okay, I'll remove this.
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14800 I am okay with both too. I apologise for the irrelevant comment @maropu.
[GitHub] spark issue #14804: [MINOR][Web UI] Correctly convert bytes in web UI
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14804 KB = 1000 bytes, KiB = 1024 bytes. According to the suffixes we're using, 1000 is correct at the moment. Is the display inconsistent with something else in the UI or logs?
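For concreteness, a 1024-based formatter in the spirit of `Utils#bytesToString` could look like the sketch below (a minimal illustration; the class and method names here are hypothetical, not Spark's actual code):

```java
import java.util.Locale;

class ByteFormat {
    // Format a byte count with 1024-based (binary) units, matching the log output.
    static String bytesToString(long size) {
        final long KiB = 1L << 10, MiB = 1L << 20, GiB = 1L << 30, TiB = 1L << 40;
        if (size >= TiB) return String.format(Locale.ROOT, "%.1f TiB", (double) size / TiB);
        if (size >= GiB) return String.format(Locale.ROOT, "%.1f GiB", (double) size / GiB);
        if (size >= MiB) return String.format(Locale.ROOT, "%.1f MiB", (double) size / MiB);
        if (size >= KiB) return String.format(Locale.ROOT, "%.1f KiB", (double) size / KiB);
        return size + " B";
    }
}
```

The inconsistency discussed here arises when a 1000-based divisor is paired with binary-style suffixes (or vice versa); the divisor and the suffix family have to agree.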
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14800 Yea, I see. I also have no strong opinion on this, so either is okay with me. For now, I'll remove the requirement. What do you think? cc: @HyukjinKwon
[GitHub] spark pull request #14786: [SPARK-17212][SQL] TypeCoercion supports widening...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14786#discussion_r76200868
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -134,6 +134,8 @@ object TypeCoercion {
      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
      Some(DoubleType)
+   case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
--- End diff --
Ah, sure!
[GitHub] spark issue #14798: [SPARK-17231][CORE] Avoid building debug or trace log me...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14798 Seems fine to me. I think you'd be welcome to fix up the other log messages you see in these files to use {} placeholders, but that's entirely optional.
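The win from `{}` placeholders is that argument formatting is deferred until the logger has confirmed the level is enabled, so debug/trace messages cost almost nothing when disabled. A toy illustration of that behavior (a hypothetical stand-in, not SLF4J itself):

```java
class LazyLog {
    private final boolean traceEnabled;

    LazyLog(boolean traceEnabled) { this.traceEnabled = traceEnabled; }

    // Substitute args into {} placeholders only when tracing is enabled;
    // when disabled, the message string is never built at all.
    String trace(String template, Object... args) {
        if (!traceEnabled) return null;
        StringBuilder sb = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = template.indexOf("{}", from);
            if (at < 0) break;
            sb.append(template, from, at).append(arg);
            from = at + 2;
        }
        return sb.append(template.substring(from)).toString();
    }
}
```

Contrast with `logger.trace("sent " + n + " bytes")`, where the concatenation runs even when trace is off.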
[GitHub] spark pull request #14798: [SPARK-17231][CORE] Avoid building debug or trace...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14798#discussion_r76200529
--- Diff: common/network-common/src/main/java/org/apache/spark/network/server/TransportChannelHandler.java ---
@@ -29,7 +29,7 @@
 import org.apache.spark.network.protocol.Message;
 import org.apache.spark.network.protocol.RequestMessage;
 import org.apache.spark.network.protocol.ResponseMessage;
-import org.apache.spark.network.util.NettyUtils;
+import static org.apache.spark.network.util.NettyUtils.getRemoteAddress;
--- End diff --
Although I think we sort of avoid static imports as a rule, this seems like a good reason to use them.
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14800 True, but in the with-replacement case you're no longer selecting a subset to begin with, because an element can appear twice. "Sample" does generally mean "take a smaller set", but it also means things like "sampling from a distribution". I wouldn't feel strongly about it except that we're taking away behavior that worked fine. The RDD API, for example, doesn't enforce that the rate must be <= 1 (even for without replacement, which is wrong).
[GitHub] spark issue #14804: [MINOR][Web UI] Correctly convert bytes in web UI
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14804 **[Test build #64410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64410/consoleFull)** for PR 14804 at commit [`fe78ecc`](https://github.com/apache/spark/commit/fe78ecc2156ff6e842bd22b6b4419f0219a860b6).
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64401/ Test PASSed.
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/8880 Merged build finished. Test PASSed.
[GitHub] spark pull request #14804: [MINOR][Web UI] Correctly convert bytes in web UI
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/14804 [MINOR][Web UI] Correctly convert bytes in web UI
## What changes were proposed in this pull request?
The conversion should be 1024-based, not 1000-based.
## How was this patch tested?
Manually verified.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jerryshao/apache-spark correct-convert-bytes
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14804.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14804
commit fe78ecc2156ff6e842bd22b6b4419f0219a860b6
Author: jerryshao
Date: 2016-08-25T08:01:12Z
Correctly convert the bytes in UI
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880 **[Test build #64401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64401/consoleFull)** for PR 8880 at commit [`beb4526`](https://github.com/apache/spark/commit/beb45266872cd52f2a64496056989237477305b6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Not sure why these timings are so bad. Found out today that by using bytes and calling directly into Java's `org.apache.spark.api.r.RRDD` these can be improved by 2 orders of magnitude.
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14800 In statistical terms, sampling means selecting a `subset` of the whole data, so I think requiring the sample rate to be <= 1 is more reasonable. See: https://en.wikipedia.org/wiki/Sampling_(statistics)
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14750 **[Test build #64409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64409/consoleFull)** for PR 14750 at commit [`6c9c130`](https://github.com/apache/spark/commit/6c9c1308051de27dae0fa147764399e5ebcff9f0).
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14800 @srowen Actually, we are already capping it at 100% when replacement is disabled, so I suggested this to match that behaviour when it is enabled. Yes, it seems not related to the bug this PR is trying to fix; I apologise for the irrelevant comment. Ah, I see: it is capped at 100% when replacement is disabled because exceeding that would require replacement. I thought sampling is meant to draw a representative smaller population from a larger one, and therefore it is not sensible when it exceeds, say, 200%.
[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14802 Looks like this is a little similar to this one #13513.
[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/14579 I like it personally - if no one has a good reason why not, it seems like a very reasonable approach.
[GitHub] spark issue #14786: [SPARK-17212][SQL] TypeCoercion supports widening conver...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14786 MySQL and PostgreSQL support this.
**MySQL**
- Greatest/least
```sql
mysql> SELECT GREATEST(CAST("1990-02-24 12:00:00" AS DATETIME), CAST("1990-02-25" AS DATE));
+-------------------------------------------------------------------------------+
| GREATEST(CAST("1990-02-24 12:00:00" AS DATETIME), CAST("1990-02-25" AS DATE)) |
+-------------------------------------------------------------------------------+
| 1990-02-25 00:00:00                                                           |
+-------------------------------------------------------------------------------+
```
- Union
```sql
mysql> SELECT CAST("1990-02-24 12:00:00" AS DATETIME) UNION SELECT CAST("1990-02-24" AS DATE);
+-----------------------------------------+
| CAST("1990-02-24 12:00:00" AS DATETIME) |
+-----------------------------------------+
| 1990-02-24 12:00:00                     |
| 1990-02-24 00:00:00                     |
+-----------------------------------------+
```
**PostgreSQL**
- Greatest/least
```sql
postgres=# SELECT GREATEST(CAST('1990-02-24 12:00:00' AS TIMESTAMP), CAST('1990-02-25' AS DATE));
      greatest
---------------------
 1990-02-25 00:00:00
(1 row)
```
- Union
```sql
postgres=# SELECT CAST('1990-02-24 12:00:00' AS TIMESTAMP) UNION SELECT CAST('1990-02-24' AS DATE);
      timestamp
---------------------
 1990-02-24 00:00:00
 1990-02-24 12:00:00
(2 rows)
```
[GitHub] spark issue #14801: [SPARK-17234] [SQL] Table Existence Checking when Index ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14801 Can we avoid introducing new exception types? It is super annoying to match those in Python.
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14800 Also, is it really necessary to limit the sample rate to be <= 1? It's not incoherent to want to sample 200% of a data set if it is with replacement. You'd just be generating a data set 2x the size drawn from the same empirical distribution.
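For intuition on why a fraction above 1 is coherent with replacement: with-replacement sampling can be implemented by emitting each element a Poisson(fraction)-distributed number of times, which is perfectly well defined for fraction > 1. The sketch below is illustrative only, assuming that approach (Spark's own with-replacement path is a separate implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ReplacementSampler {
    // Emit each element k ~ Poisson(fraction) times; expected output size is
    // fraction * data.size(), so fraction = 2.0 roughly doubles the data set.
    static <T> List<T> sample(List<T> data, double fraction, long seed) {
        Random rng = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T x : data) {
            for (int k = poisson(fraction, rng); k > 0; k--) out.add(x);
        }
        return out;
    }

    // Knuth's multiplication method; adequate for small fraction values.
    private static int poisson(double lambda, Random rng) {
        double limit = Math.exp(-lambda), p = 1.0;
        int k = -1;
        do { k++; p *= rng.nextDouble(); } while (p > limit);
        return k;
    }
}
```

Sampling 10,000 rows at fraction 2.0 yields about 20,000 rows, each drawn from the same empirical distribution, exactly the "2x the size" case described above.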
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880 **[Test build #64408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64408/consoleFull)** for PR 8880 at commit [`9f958a4`](https://github.com/apache/spark/commit/9f958a4847af46de18befaede4d08093fe11416f).
[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/8880 **[Test build #64407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64407/consoleFull)** for PR 8880 at commit [`167d474`](https://github.com/apache/spark/commit/167d47488d9f882ea3baca25e6d7b5656f71babb).
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14617 **[Test build #64406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64406/consoleFull)** for PR 14617 at commit [`838840d`](https://github.com/apache/spark/commit/838840dc3e40b8b10a111d343329f735e76fad36).
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64400/ Test PASSed.
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14800 Merged build finished. Test PASSed.
[GitHub] spark issue #14800: [SPARK-15382][SQL] Fix a bug in sampling with replacemen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14800 **[Test build #64400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64400/consoleFull)** for PR 14800 at commit [`81c41d5`](https://github.com/apache/spark/commit/81c41d5c92dd503880fa8ff641743cce25e77514).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14617 @mallman I changed the UI based on your comment; here is the new one (the on-heap and off-heap memory usage are separated into two columns): ![screen shot 2016-08-25 at 3 28 31 pm](https://cloud.githubusercontent.com/assets/850797/17960463/c64e32b0-6ad8-11e6-9afa-5f3c6bffa68e.png)
[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14803 **[Test build #64405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64405/consoleFull)** for PR 14803 at commit [`2771d71`](https://github.com/apache/spark/commit/2771d71898f187d479cdb0996c96494c0b53a344).
[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14803 cc @marmbrus
[GitHub] spark issue #14433: [SPARK-16829][SparkR]:sparkR sc.setLogLevel doesn't work
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14433 It feels like overkill unless there are going to be more uses for changing logic based on whether it's running in a shell. It seems not so bad to define `setRootLevel` in Scala as an alias when in the shell, or define something in SparkR, or just change the log message to note the two possibilities. Is there more need for this logic?
[GitHub] spark pull request #14803: [SPARK-17153][SQL] Should read partition data whe...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/14803 [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing
## What changes were proposed in this pull request?
When reading a file stream with a non-globbing path, the results return data with all `null`s for the partitioned columns. E.g.:

    case class A(id: Int, value: Int)
    val data = spark.createDataset(Seq(
      A(1, 1),
      A(2, 2),
      A(2, 3))
    )
    val url = "/tmp/test"
    data.write.partitionBy("id").parquet(url)
    spark.read.parquet(url).show

    +-----+---+
    |value| id|
    +-----+---+
    |    2|  2|
    |    3|  2|
    |    1|  1|
    +-----+---+

    val s = spark.readStream.schema(spark.read.load(url).schema).parquet(url)
    s.writeStream.queryName("test").format("memory").start()
    sql("SELECT * FROM test").show

    +-----+----+
    |value|  id|
    +-----+----+
    |    2|null|
    |    3|null|
    |    1|null|
    +-----+----+

## How was this patch tested?
Jenkins tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 filestreamsource-option
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14803.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14803
commit 2771d71898f187d479cdb0996c96494c0b53a344
Author: Liang-Chi Hsieh
Date: 2016-08-25T07:13:20Z
Pass path as basePath for partitionSpec creation if path is not globbing.
[GitHub] spark issue #11119: [SPARK-10780][ML][WIP] Add initial model to kmeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64398/ Test PASSed.
[GitHub] spark issue #11119: [SPARK-10780][ML][WIP] Add initial model to kmeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9 Build finished. Test PASSed.
[GitHub] spark issue #11119: [SPARK-10780][ML][WIP] Add initial model to kmeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9 **[Test build #64398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64398/consoleFull)** for PR 9 at commit [`c40192b`](https://github.com/apache/spark/commit/c40192b0579080f4af572cf6d12bf37942c03866).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14802 **[Test build #64403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64403/consoleFull)** for PR 14802 at commit [`0d9d1e6`](https://github.com/apache/spark/commit/0d9d1e6d59fb68996bf96b5238835a0718a8da1a).
[GitHub] spark issue #14801: [SPARK-17234] [SQL] Table Existence Checking when Index ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14801 **[Test build #64404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64404/consoleFull)** for PR 14801 at commit [`c400c52`](https://github.com/apache/spark/commit/c400c5292a32549cea80861adfaefeb41f4d90b3).
[GitHub] spark pull request #14802: [SPARK-17235][SQL] Support purging of old logs in...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14802#discussion_r76191571

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala ---
@@ -155,8 +174,8 @@ class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext {
     }
   }

-
-  def testManager(basePath: Path, fm: FileManager): Unit = {
+  /** Basic test case for [[FileManager]] implementation. */
+  private def testFileManager(basePath: Path, fm: FileManager): Unit = {
```

--- End diff --

I renamed this because initially I thought it was a noun meaning "manager for testing", rather than "to test the file manager".
[GitHub] spark pull request #14802: [SPARK-17235][SQL] Support purging of old logs in...
GitHub user petermaxlee opened a pull request: https://github.com/apache/spark/pull/14802

[SPARK-17235][SQL] Support purging of old logs in MetadataLog

## What changes were proposed in this pull request?

This patch adds a purge interface to MetadataLog, and an implementation in HDFSMetadataLog. The purge function is currently unused, but I will use it to purge old execution and file source logs in follow-up patches. These changes are required in a production structured streaming job that runs for a long period of time.

## How was this patch tested?

Added a unit test case in HDFSMetadataLogSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/petermaxlee/spark SPARK-17235

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14802.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14802

commit 0d9d1e6d59fb68996bf96b5238835a0718a8da1a
Author: petermaxlee
Date: 2016-08-25T07:11:47Z

    [SPARK-17235][SQL] Support purging of old logs in MetadataLog
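The purge interface described in this PR can be sketched with a toy in-memory log (the real HDFSMetadataLog is Scala and file-backed; all names below are illustrative, not Spark's actual API):

```python
class InMemoryMetadataLog:
    """Toy metadata log keyed by integer batch id, with a purge() in the
    spirit of the interface described above (not Spark's actual API)."""

    def __init__(self):
        self._batches = {}

    def add(self, batch_id, metadata):
        """Record metadata for a batch; each batch is written at most once."""
        if batch_id in self._batches:
            return False
        self._batches[batch_id] = metadata
        return True

    def get_latest(self):
        return max(self._batches) if self._batches else None

    def purge(self, threshold_batch_id):
        """Remove all batches with id strictly below threshold_batch_id,
        keeping recent entries so the stream can still recover."""
        for bid in [b for b in self._batches if b < threshold_batch_id]:
            del self._batches[bid]
```

A long-running job would call `purge` periodically so the log does not grow without bound, which is the production concern the PR description raises.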
[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14802 @tdas and @zsxwing can you take a look at this? It's a pretty simple change.
[GitHub] spark pull request #14801: [SPARK-17234] [SQL] Table Existence Checking when...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14801

[SPARK-17234] [SQL] Table Existence Checking when Index Table with the Same Name Exists

### What changes were proposed in this pull request?

Hive index tables are not supported by Spark SQL. Thus, we issue an exception when users try to access Hive index tables. When the internal function `tableExists` tries to access a Hive index table, it always gets the same error message: ```Hive index table is not supported```. This message could be confusing to users, since their SQL operations could be completely unrelated to Hive index tables. For example, when users try to rename a table and an index table with the same name exists, the expected exception should be a `TableAlreadyExistsException`. This PR makes the following changes:
- Introduced a new `AnalysisException` type: `SQLFeatureNotSupportedException`. When users try to access an `Index Table`, we will issue a `SQLFeatureNotSupportedException`.
- `tableExists` returns `true` when hitting a `SQLFeatureNotSupportedException` whose feature is `Hive index table`.
- Added a check `requireTableNotExists` to `SessionCatalog`'s `createTable` API; otherwise, the current implementation relies on Hive's internal checking.

### How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark tableExists

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14801.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14801

commit 1af428b68c4341192bf8f66af7c434a7b89be61d
Author: gatorsmile
Date: 2016-08-25T06:26:00Z

    fix

commit 664d6f1caa9b3d62eafbddb292991def722910ae
Author: gatorsmile
Date: 2016-08-25T06:34:16Z

    improve test cases

commit c400c5292a32549cea80861adfaefeb41f4d90b3
Author: gatorsmile
Date: 2016-08-25T07:12:57Z

    fix
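The `tableExists` behavior the PR describes — treating a "feature not supported" failure for a Hive index table as evidence that the table does exist — can be sketched like this (all class and function names here are hypothetical stand-ins, not Spark's code):

```python
class SQLFeatureNotSupportedException(Exception):
    """Stand-in for the new AnalysisException subtype the PR introduces."""

    def __init__(self, feature):
        self.feature = feature
        super().__init__("%s is not supported" % feature)


class FakeCatalog:
    """Toy catalog: one normal table 't', one index table 'idx'."""

    def get_table(self, name):
        if name == "idx":
            raise SQLFeatureNotSupportedException("Hive index table")
        if name == "t":
            return object()
        raise KeyError(name)


def table_exists(catalog, name):
    """Return True when the table exists, including when it is an
    (unsupported) Hive index table -- sketch of the checking logic."""
    try:
        catalog.get_table(name)
        return True
    except KeyError:
        return False
    except SQLFeatureNotSupportedException as e:
        # The table is present; we merely refuse to operate on it.
        return e.feature == "Hive index table"
```

With this shape, a rename that collides with an index table can surface a proper "table already exists" error instead of the unrelated "Hive index table is not supported" message.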
[GitHub] spark pull request #14746: [SPARK-17180] [SQL] Fix View Resolution Order in ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14746#discussion_r76190937

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -105,7 +105,13 @@ case class CreateViewCommand(
     }

     val sessionState = sparkSession.sessionState
-    if (isTemporary) {
+    // 1) CREATE VIEW: create a temp view when users explicitly specify the keyword TEMPORARY;
+    //    otherwise, create a permanent view no matter whether the temporary view
+    //    with the same name exists or not.
+    // 2) ALTER VIEW: alter the temporary view if the temp view exists; otherwise, try to alter
```

--- End diff --

question: how can you tell whether it's CREATE VIEW or ALTER VIEW?
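The resolution order quoted in the diff's comment can be condensed into a small dispatch. This is only a sketch of the rule as stated, with hypothetical names (Spark's actual implementation lives in `CreateViewCommand`):

```python
def resolve_view_target(name, temp_views, permanent_views, is_alter):
    """Decide which catalog a view command targets, per the rule above:
    - CREATE VIEW: permanent unless TEMPORARY was specified (the TEMPORARY
      branch is assumed to be handled by the caller);
    - ALTER VIEW: prefer the temp view if one with this name exists,
      otherwise fall back to the permanent view."""
    if is_alter and name in temp_views:
        return "temp"
    if is_alter and name not in permanent_views:
        raise KeyError("view %s not found" % name)
    return "permanent"
```

Note the asymmetry this makes explicit: CREATE ignores a same-named temp view, while ALTER is shadowed by it — which is exactly the ordering the PR fixes.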
[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14753#discussion_r76189865

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/TypedImperativeAggregateSuite.scala ---
@@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
+
+import org.apache.spark.sql.TypedImperativeAggregateSuite.TypedMax
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{BoundReference, Expression, GenericMutableRow, SpecificMutableRow}
+import org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate
+import org.apache.spark.sql.execution.aggregate.SortAggregateExec
+import org.apache.spark.sql.expressions.Window
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType, IntegerType, LongType}
+
+class TypedImperativeAggregateSuite extends QueryTest with SharedSQLContext {
+
+  import testImplicits._
+
+  private val random = new java.util.Random()
+
+  private val data = (0 until 1000).map { _ =>
+    (random.nextInt(10), random.nextInt(100))
+  }
+
+  test("aggregate with object aggregate buffer") {
+    val agg = new TypedMax(BoundReference(0, IntegerType, nullable = false))
+
+    val group1 = (0 until data.length / 2)
+    val group1Buffer = agg.createAggregationBuffer()
+    group1.foreach { index =>
+      val input = InternalRow(data(index)._1, data(index)._2)
+      agg.update(group1Buffer, input)
+    }
+
+    val group2 = (data.length / 2 until data.length)
+    val group2Buffer = agg.createAggregationBuffer()
+    group2.foreach { index =>
+      val input = InternalRow(data(index)._1, data(index)._2)
+      agg.update(group2Buffer, input)
+    }
+
+    val mergeBuffer = agg.createAggregationBuffer()
+    agg.merge(mergeBuffer, group1Buffer)
+    agg.merge(mergeBuffer, group2Buffer)
+
+    assert(mergeBuffer.value == data.map(_._1).max)
+    assert(agg.eval(mergeBuffer) == data.map(_._1).max)
+
+    // Tests low level eval(row: InternalRow) API.
+    val row = new GenericMutableRow(Array(mergeBuffer): Array[Any])
+
+    // Evaluates directly on row consist of aggregation buffer object.
+    assert(agg.eval(row) == data.map(_._1).max)
+  }
+
+  test("supports SpecificMutableRow as mutable row") {
+    val aggregationBufferSchema = Seq(IntegerType, LongType, BinaryType, IntegerType)
+    val aggBufferOffset = 2
+    val buffer = new SpecificMutableRow(aggregationBufferSchema)
+    val agg = new TypedMax(BoundReference(ordinal = 1, dataType = IntegerType, nullable = false))
+      .withNewMutableAggBufferOffset(aggBufferOffset)
+
+    agg.initialize(buffer)
+    data.foreach { kv =>
+      val input = InternalRow(kv._1, kv._2)
+      agg.update(buffer, input)
+    }
+    assert(agg.eval(buffer) == data.map(_._2).max)
+  }
+
+  test("dataframe aggregate with object aggregate buffer, should not use HashAggregate") {
+    val df = data.toDF("a", "b")
+    val max = new TypedMax($"a".expr)
+
+    // Always uses SortAggregateExec
+    val sparkPlan = df.select(Column(max.toAggregateExpression())).queryExecution.sparkPlan
+    assert(sparkPlan.isInstanceOf[SortAggregateExec])
+  }
+
+  test("dataframe aggregate with object aggregate buffer, no group by") {
+    val df = data.toDF("key", "value").coalesce(2)
+    val query = df.select(typedMax($"key"), count($"key"), typedMax($"value"), count($"value"))
+    val maxKey = data.map(_._1).max
+    val countKey = data.size
+    val maxValue = data.map(_._2).max
+    val countValue = data.size
+    val expected = Seq(Row(maxKey,
```
[GitHub] spark issue #13780: [SPARK-16063][SQL] Add storageLevel to Dataset
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/13780 ping @rxin @marmbrus @davies @gatorsmile for comment on the Python storage level issue I mention at https://github.com/apache/spark/pull/13780#discussion_r67833027
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r76189706

```diff
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
@@ -372,6 +373,40 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
     }
   }

+  test("SPARK-16948. Check empty orc tables in ORC") {
```

--- End diff --

how about `support empty orc table when converting hive serde table to data source table`
[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/14579 @nchammas @holdenk @davies @rxin how about the approach of @MechCoder in https://github.com/apache/spark/pull/14579#discussion_r74813935? I think this will work well, so we could raise an error to prevent (almost all, I think) usages outside of the intended pattern of `with some_rdd.cache() as x:` or `with some_rdd_already_cached as x:`
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r76189474

```diff
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -54,10 +57,12 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
       sparkSession: SparkSession,
       options: Map[String, String],
       files: Seq[FileStatus]): Option[StructType] = {
-    OrcFileOperator.readSchema(
-      files.map(_.getPath.toUri.toString),
-      Some(sparkSession.sessionState.newHadoopConf())
-    )
+    // Safe to ignore FileNotFoundException in case no files are found.
+    val schema = Try(OrcFileOperator.readSchema(
```

--- End diff --

@rajeshbalamohan is this change unnecessary for this PR? If so, I'd like to revert it to make the PR as small as possible.
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r76189247

```diff
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -237,21 +237,27 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
           new Path(metastoreRelation.catalogTable.storage.locationUri.get),
           partitionSpec)

-        val inferredSchema = if (fileType.equals("parquet")) {
-          val inferredSchema =
-            defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
-          inferredSchema.map { inferred =>
-            ParquetFileFormat.mergeMetastoreParquetSchema(metastoreSchema, inferred)
-          }.getOrElse(metastoreSchema)
-        } else {
-          defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles()).get
+        val schema = fileType match {
+          case "parquet" =>
+            val inferredSchema =
+              defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
+
+            // For Parquet, get correct schema by merging Metastore schema data types
```

--- End diff --

Do we have a test for this feature? I think we should make them consistent, i.e. parquet conversions should also use the metastore schema. cc @yhuai @liancheng
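The Parquet branch above reconciles the file-inferred schema with the Hive metastore schema. As a rough illustration of what such a merge does (this is a simplified sketch, not `ParquetFileFormat.mergeMetastoreParquetSchema`): keep the metastore's column order and name casing, but take the file-level type when the files actually contain the column.

```python
def merge_metastore_schema(metastore, inferred):
    """Reconcile an inferred file schema with the metastore schema.
    Both arguments are lists of (name, type) pairs; the metastore's
    order and casing win, inferred types win where a column matched.
    Simplified sketch only."""
    inferred_types = {name.lower(): typ for name, typ in inferred}
    return [(name, inferred_types.get(name.lower(), typ))
            for name, typ in metastore]
```

For example, a metastore column `ID: int` paired with an inferred `id: bigint` keeps the metastore's casing but the inferred type, while columns absent from the files (e.g. in an empty table) fall back to their metastore types, which is why the non-Parquet branch's bare `.get` on the inferred schema fails for empty ORC tables.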
[GitHub] spark pull request #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/ca...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/14579#discussion_r76189167

```diff
--- Diff: python/pyspark/rdd.py ---
@@ -188,6 +188,12 @@ def __init__(self, jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSeri
         self._id = jrdd.id()
         self.partitioner = None

+    def __enter__(self):
```

--- End diff --

hmmm, yes this does happen to work, because most operations boil down to something like `mapPartitions`, which creates a new `PipelineRDD` which is not cached, or a new `RDD` which is again not cached. I think it will work for `DataFrame` too for a similar reason - most operations return a new `DataFrame` instance.
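The `__enter__`/`__exit__` pattern under discussion can be shown outside Spark with a minimal stand-in: entering the `with` block yields the cached object, and exiting unpersists it, so `with some_rdd.cache() as x:` scopes the cache to the block (illustrative only, not PySpark's actual code):

```python
class CachedData:
    """Minimal stand-in for an RDD/DataFrame whose context manager
    unpersists on exit."""

    def __init__(self):
        self.cached = False

    def cache(self):
        self.cached = True
        return self  # return self so `with d.cache() as x:` binds d to x

    def unpersist(self):
        self.cached = False
        return self

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.unpersist()
        return False  # do not swallow exceptions raised in the block
```

The subtlety raised in the comment is that derived objects (a new RDD from `mapPartitions`, a new `DataFrame` from a transformation) are fresh instances, so unpersisting the original on exit does not disturb them.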
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Merged build finished. Test PASSed.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64399/
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14537 **[Test build #64399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64399/consoleFull)** for PR 14537 at commit [`6ff7e5d`](https://github.com/apache/spark/commit/6ff7e5d50de530a71df5c4a4b220a8119ca3a3f6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` should make...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14698 Thanks @hvanhovell for the review! This patch has been updated.
[GitHub] spark issue #14746: [SPARK-17180] [SQL] Fix View Resolution Order in ALTER V...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64396/
[GitHub] spark issue #14746: [SPARK-17180] [SQL] Fix View Resolution Order in ALTER V...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14746 Merged build finished. Test PASSed.
[GitHub] spark issue #14746: [SPARK-17180] [SQL] Fix View Resolution Order in ALTER V...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14746 **[Test build #64396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64396/consoleFull)** for PR 14746 at commit [`c5add2c`](https://github.com/apache/spark/commit/c5add2cbbcc3cbbce1ff09155da27b145c204ee1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14762: [SPARK-16962][CORE][SQL] Fix misaligned record accesses ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14762 Does your change have any performance impact?
[GitHub] spark pull request #14790: [SPARK-17215][SQL] Method `SQLContext.parseDataTy...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14790
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64397/