[GitHub] spark pull request #17874: [SPARK-20612][SQL] Throw exception when there is ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17874#discussion_r115135600

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1023,8 +1023,6 @@ class Analyzer(
    * clause. This rule detects such queries and adds the required attributes to the original
    * projection, so that they will be available during sorting. Another projection is added to
    * remove these attributes after sorting.
-   *
-   * The HAVING clause could also used a grouping columns that is not presented in the SELECT.
--- End diff --

This is by design.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17874: [SPARK-20612][SQL] Throw exception when there is unresol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17874 Merged build finished. Test PASSed.
[GitHub] spark issue #17874: [SPARK-20612][SQL] Throw exception when there is unresol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76538/
[GitHub] spark issue #17874: [SPARK-20612][SQL] Throw exception when there is unresol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17874

**[Test build #76538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76538/testReport)** for PR 17874 at commit [`f19976a`](https://github.com/apache/spark/commit/f19976a7e0818f36768d339bdcd883b31197de7e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17885 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76535/
[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17885 Merged build finished. Test PASSed.
[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17885

**[Test build #76535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76535/testReport)** for PR 17885 at commit [`99414d7`](https://github.com/apache/spark/commit/99414d7ce352d7d4dd32a9ad4eda93c11d360cac).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17077 Also cc @cloud-fan, the author of the original PR that implemented bucketBy.
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17077 @zero323 Could you also update the [SQL document](http://spark.apache.org/docs/latest/sql-programming-guide.html)? https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md Thank you!
[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17077#discussion_r115134650

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
         self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
         return self

+    @since(2.3)
+    def bucketBy(self, numBuckets, *cols):
+        """Buckets the output by the given columns on the file system.
--- End diff --

Thank you for adding the wrapper. Yes. We should make the Python APIs consistent with Scala APIs, if possible.
[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17882 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76533/
[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17882 Merged build finished. Test PASSed.
[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17882

**[Test build #76533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76533/testReport)** for PR 17882 at commit [`53d0c25`](https://github.com/apache/spark/commit/53d0c2551ef73dc843a53a088c5c7c835956f490).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76541/testReport)** for PR 17770 at commit [`d0a94f4`](https://github.com/apache/spark/commit/d0a94f417bbe22f081772b2518315b367093b81d).
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17865 Could you please check the documentation we wrote for the Scala APIs? It looks like we forgot to update the Python function descriptions when we made the corresponding changes to the Scala APIs.
[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17865#discussion_r115134581

--- Diff: python/pyspark/sql/functions.py ---
@@ -409,7 +432,7 @@ def isnan(col):

 @since(1.6)
 def isnull(col):
-    """An expression that returns true iff the column is null.
+    """An expression that returns true if the column is null.
--- End diff --

`the column`? This is misleading. We should make the Python documents consistent with what we did in Scala. For example, `isNull` in Scala APIs is described as

> Returns true if `expr` is null, or false otherwise.

Ref: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L280
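The semantics being asked for in the docstring are simply "true if the value is null, false otherwise." A plain-Python analogue of that contract, purely illustrative (this is not the PySpark `isnull` function, which operates on Columns, not values):

```python
def isnull(value):
    """Returns True if `value` is null (None), or False otherwise.

    Illustrative analogue of the Scala `isNull` description quoted above;
    in Python, SQL NULL maps to None.
    """
    return value is None


print(isnull(None))  # True
print(isnull(0))     # False -- 0 is a value, not null
```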
[GitHub] spark pull request #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17831
[GitHub] spark issue #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.register
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17831 Thanks! Merging to master.
[GitHub] spark pull request #17835: [SPARK-20557] [SQL] Support JDBC data type Time w...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17835
[GitHub] spark issue #17835: [SPARK-20557] [SQL] Support JDBC data type Time with Tim...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17835 Thanks! Merging to master.
[GitHub] spark issue #17887: [SPARK-20399][SQL][WIP] Add a config to fallback string ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17887 **[Test build #76540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76540/testReport)** for PR 17887 at commit [`d0b2c22`](https://github.com/apache/spark/commit/d0b2c2278ec7d10cc1ab998be489e6553a8dc193).
[GitHub] spark pull request #17736: [SPARK-20399][SQL] Can't use same regex pattern b...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/17736
[GitHub] spark issue #17887: [SPARK-20399][SQL][WIP] Add a config to fallback string ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17887 cc @dbtsai @cloud-fan @hvanhovell
[GitHub] spark pull request #17887: [SPARK-20399][SQL][WIP] Add a config to fallback ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/17887

[SPARK-20399][SQL][WIP] Add a config to fallback string literal parsing consistent with old sql parser behavior

## What changes were proposed in this pull request?

Following the discussion in #17736, this patch adds a config to fall back to 1.6 string literal parsing.

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 add-config-fallback-string-parsing

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17887.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17887

commit d0b2c2278ec7d10cc1ab998be489e6553a8dc193
Author: Liang-Chi Hsieh
Date: 2017-04-19T01:49:47Z

    Add a config to fallback string literal parsing consistent with old sql parser behavior.
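The behavior gap this config targets (per SPARK-20399) is how backslashes in SQL string literals are treated: the 1.6 parser passed them through unchanged, while the 2.x parser processes them as escape sequences, so a pattern written as `'\d'` no longer reaches the regex engine intact. A simplified, runnable sketch of the two behaviors (the function names are illustrative, not Spark code, and the unescaping rule is reduced to "a backslash keeps only the next character"):

```python
def parse_literal_legacy(raw):
    """1.6-style parsing (simplified): backslashes pass through unchanged."""
    return raw


def parse_literal_unescaped(raw):
    """2.x-style parsing (simplified): a backslash escapes the next character,
    so '\\d' becomes 'd' and '\\\\' becomes a single backslash."""
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            out.append(raw[i + 1])  # drop the backslash, keep the escaped char
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


# The regex '\d' survives the legacy parser but degrades to plain 'd'
# under unescaping -- hence the need for a fallback config.
print(parse_literal_legacy(r"\d"))     # \d
print(parse_literal_unescaped(r"\d"))  # d
```

Under the new parser users must write `'\\d'` to get `\d`; the config lets existing queries keep the old single-backslash form.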
[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17886 Merged build finished. Test FAILed.
[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17886 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76539/
[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17886

**[Test build #76539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76539/testReport)** for PR 17886 at commit [`995d9a8`](https://github.com/apache/spark/commit/995d9a864e68febbca7b9541815c5e42735ebd03).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17886 **[Test build #76539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76539/testReport)** for PR 17886 at commit [`995d9a8`](https://github.com/apache/spark/commit/995d9a864e68febbca7b9541815c5e42735ebd03).
[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17878 Merged build finished. Test PASSed.
[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17878 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76532/
[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17878

**[Test build #76532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76532/testReport)** for PR 17878 at commit [`d69c71a`](https://github.com/apache/spark/commit/d69c71a0dacabc47863d49815ee67dc0d5515e5a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76531/
[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17886 Merged build finished. Test FAILed.
[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17886 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76537/
[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17881 Merged build finished. Test PASSed.
[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17886

**[Test build #76537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76537/testReport)** for PR 17886 at commit [`4bf1443`](https://github.com/apache/spark/commit/4bf1443a65d23b4e470dc4f7c4e57ce34460f551).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17881

**[Test build #76531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76531/testReport)** for PR 17881 at commit [`50b4f2d`](https://github.com/apache/spark/commit/50b4f2d2269f03f3650443405a28546843f98f53).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Merged build finished. Test FAILed.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770

**[Test build #76536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76536/testReport)** for PR 17770 at commit [`7debd76`](https://github.com/apache/spark/commit/7debd76a0d69758d394a881a932c6714120cc180).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76536/ Test FAILed.
[GitHub] spark pull request #15466: [SPARK-13983][SQL] HiveThriftServer2 can not get ...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/15466
[GitHub] spark issue #17874: [SPARK-20612][SQL][WIP] Throw exception when there is un...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17874 **[Test build #76538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76538/testReport)** for PR 17874 at commit [`f19976a`](https://github.com/apache/spark/commit/f19976a7e0818f36768d339bdcd883b31197de7e).
[GitHub] spark issue #17880: [SPARK-20620][TEST]Add some unit tests into NullExpressi...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/17880 @gatorsmile thanks, I will do it.
[GitHub] spark issue #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not ge...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17886 **[Test build #76537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76537/testReport)** for PR 17886 at commit [`4bf1443`](https://github.com/apache/spark/commit/4bf1443a65d23b4e470dc4f7c4e57ce34460f551).
[GitHub] spark issue #17874: [SPARK-20612][SQL][WIP] Throw exception when there is un...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17874 @cloud-fan This rule could make the query work:

```scala
Seq(1).toDF("c1").createOrReplaceTempView("onerow")
sql(
  """
    |select 1
    |from (select 1 from onerow t2 LIMIT 1)
    |where t2.c1=1""".stripMargin)
```

But the WHERE condition should not be able to refer to `t2.c1`, which is only available in the inner scope.
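As a point of comparison (an illustrative sketch, not Spark's resolution code), conventional SQL engines reject an outer reference to a subquery-internal alias. SQLite, for instance, refuses the same shape of query:

```python
import sqlite3

# Illustrative only: show that a subquery alias (t2) is not visible to the
# outer WHERE clause in a conventional SQL engine. This mirrors the scoping
# argument in the comment above; it is not Spark code.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE onerow (c1 INT)")
con.execute("INSERT INTO onerow VALUES (1)")

try:
    con.execute(
        "SELECT 1 FROM (SELECT 1 FROM onerow t2 LIMIT 1) WHERE t2.c1 = 1"
    )
    out_of_scope_rejected = False
except sqlite3.OperationalError:
    # SQLite reports the missing column (t2 is out of scope here)
    out_of_scope_rejected = True
```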
[GitHub] spark pull request #17886: [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/17886 [SPARK-13983][SQL][WIP] Fix HiveThriftServer2 can not get "--hiveconf" and "--hivevar" variables since 2.x

## What changes were proposed in this pull request?

Fix HiveThriftServer2 not getting "--hiveconf" and "--hivevar" variables since 2.x.

## How was this patch tested?

Manual tests and unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-13983-dev

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17886.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17886

commit 4bf1443a65d23b4e470dc4f7c4e57ce34460f551
Author: Yuming Wang
Date: 2017-05-07T03:35:33Z
Spark 2.x's HiveThriftServer2 support get "--hiveconf" and "--hivevar" variables
[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17884 Merged build finished. Test PASSed.
[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17884 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76534/ Test PASSed.
[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17884 **[Test build #76534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76534/testReport)** for PR 17884 at commit [`796a8e7`](https://github.com/apache/spark/commit/796a8e73fdfb986bedf443e69d228782f5e82fa8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17077#discussion_r115133626

--- Diff: python/pyspark/sql/readwriter.py ---

```
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
         self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
         return self

+    @since(2.3)
+    def bucketBy(self, numBuckets, *cols):
+        """Buckets the output by the given columns on the file system.
+
+        :param numBuckets: the number of buckets to save
+        :param cols: name of columns
+
+        .. note:: Applicable for file-based data sources in combination with
+            :py:meth:`DataFrameWriter.saveAsTable`.
+
+        >>> (df.write.format('parquet')
+        ...     .bucketBy(100, 'year', 'month')
+        ...     .mode("overwrite")
+        ...     .saveAsTable('bucketed_table'))
+        """
+        if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
+            cols = cols[0]
+
+        if not isinstance(numBuckets, int):
+            raise TypeError("numBuckets should be an int, got {0}.".format(type(numBuckets)))
+
+        if not all(isinstance(c, basestring) for c in cols):
+            raise TypeError("cols argument should be a string or a sequence of strings.")
```

--- End diff --

Good point. We can support arbitrary `Iterable[str]` though:

```python
if len(cols) == 1 and isinstance(cols[0], collections.abc.Iterable):
    cols = list(cols[0])
```

Caveat is, we don't allow this anywhere else.
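The validation pattern under discussion can be sketched standalone (`normalize_bucket_args` is an illustrative name, not PySpark API; the checks mirror the diff, with `str` in place of Python 2's `basestring`):

```python
# Hypothetical sketch: accept either varargs of column names or a single
# list/tuple of names; raise TypeError for anything else, mirroring the
# bucketBy checks quoted above.
def normalize_bucket_args(num_buckets, *cols):
    if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
        cols = tuple(cols[0])
    if not isinstance(num_buckets, int):
        raise TypeError(
            "numBuckets should be an int, got {0}.".format(type(num_buckets)))
    if not all(isinstance(c, str) for c in cols):
        raise TypeError(
            "cols argument should be a string or a sequence of strings.")
    return num_buckets, list(cols)
```

Note the caveat with the broader `Iterable` check proposed in the reply: a single string is itself iterable, so a plain `Iterable` test would silently unpack `"year"` into characters unless strings are excluded explicitly, which is one reason the narrower list/tuple test is the safer default.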
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #76536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76536/testReport)** for PR 17770 at commit [`7debd76`](https://github.com/apache/spark/commit/7debd76a0d69758d394a881a932c6714120cc180).
[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17885 **[Test build #76535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76535/testReport)** for PR 17885 at commit [`99414d7`](https://github.com/apache/spark/commit/99414d7ce352d7d4dd32a9ad4eda93c11d360cac).
[GitHub] spark issue #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17885 I'll target this for master, branch-2.2, branch-2.1.
[GitHub] spark pull request #17885: [SPARK-20627][PYSPARK] Drop the hadoop distirbuti...
GitHub user holdenk opened a pull request: https://github.com/apache/spark/pull/17885 [SPARK-20627][PYSPARK] Drop the hadoop distirbution name from the Python version

## What changes were proposed in this pull request?

Drop the Hadoop distribution name from the Python version (PEP 440).

## How was this patch tested?

Ran `make-distribution` locally.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark SPARK-20627-remove-pip-local-version-string

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17885.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17885

commit 4e30ba90a7f14627d098d676f1ee8bf02d62eb9e
Author: Holden Karau
Date: 2017-05-07T02:40:40Z
Drop the hadoop distirbution name from the Python version packaging string

commit 99414d7ce352d7d4dd32a9ad4eda93c11d360cac
Author: Holden Karau
Date: 2017-05-07T03:22:02Z
Update comment since we don't have name
[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17884 **[Test build #76534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76534/testReport)** for PR 17884 at commit [`796a8e7`](https://github.com/apache/spark/commit/796a8e73fdfb986bedf443e69d228782f5e82fa8).
[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17884 @felixcheung I ran a quick QA on the vignettes and fixed some additional typos and styles.
[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17697#discussion_r115133315

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctionsSuite.scala ---

```
@@ -22,9 +22,13 @@ import org.apache.spark.mllib.rdd.MLPairRDDFunctions._
 import org.apache.spark.mllib.util.MLlibTestSparkContext

 class MLPairRDDFunctionsSuite extends SparkFunSuite with MLlibTestSparkContext {
+  val source_array = Array(
```

--- End diff --

Also, I think we use the naming convention `sourceArray`. Probably, just `data` is enough?
[GitHub] spark issue #17697: [SPARK-20414][MLLIB] avoid creating only 16 reducers whe...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17697 I left some comments here, though I don't think I am confident enough for a sign-off. Please let me defer to @srowen and @tejasapatil.
[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17697#discussion_r115133197

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctionsSuite.scala ---

```
@@ -22,9 +22,13 @@ import org.apache.spark.mllib.rdd.MLPairRDDFunctions._
 import org.apache.spark.mllib.util.MLlibTestSparkContext

 class MLPairRDDFunctionsSuite extends SparkFunSuite with MLlibTestSparkContext {
+  val source_array = Array(
+    (1, 7), (1, 3), (1, 6), (1, 1), (1, 2), (1, -1),
+    (3, 2), (3, 7), (5, 1), (3, 5)
+  )
```

--- End diff --

Indentation should be double-spaced here too.
[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17697#discussion_r115133107

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala ---

```
@@ -49,6 +53,7 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Se
       }
     ).mapValues(_.toArray.sorted(ord.reverse))  // This is a min-heap, so we reverse the order.
   }
+
```

--- End diff --

It looks like this newline can be removed if more commits are pushed.
[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17697#discussion_r115133103

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala ---

```
@@ -40,7 +40,11 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Se
    * @return an RDD that contains the top k values for each key
    */
  def topByKey(num: Int)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = {
-    self.aggregateByKey(new BoundedPriorityQueue[V](num)(ord))(
+    topByKey(num, 16)
```

--- End diff --

To be clear, was 16 the default before this PR? Adding a parameter would be fine, but this should not change the original behaviour when the parameter is omitted.
[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17697#discussion_r115133174

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala ---

```
@@ -40,7 +40,11 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Se
    * @return an RDD that contains the top k values for each key
    */
  def topByKey(num: Int)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = {
-    self.aggregateByKey(new BoundedPriorityQueue[V](num)(ord))(
+    topByKey(num, 16)
```

--- End diff --

Also, I believe the indentation here should be double-spaced.
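For reference, the per-key top-k aggregation that `topByKey` performs with a bounded priority queue can be sketched locally in Python with a bounded heap. This is a single-machine analogue for illustration, not Spark's distributed implementation:

```python
import heapq
from collections import defaultdict

# Keep at most `num` largest values per key using a size-bounded min-heap,
# then emit each key's values sorted in descending order (the local
# analogue of aggregateByKey over a BoundedPriorityQueue).
def top_by_key(pairs, num):
    heaps = defaultdict(list)
    for k, v in pairs:
        h = heaps[k]
        if len(h) < num:
            heapq.heappush(h, v)
        elif v > h[0]:
            heapq.heapreplace(h, v)  # evict the current minimum
    return {k: sorted(h, reverse=True) for k, h in heaps.items()}
```

Running it on the `source_array` data from the test suite above, `top_by_key(data, 2)` yields `{1: [7, 6], 3: [7, 5], 5: [1]}`.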
[GitHub] spark issue #17882: [WIP][SPARK-20079][try 2][yarn] Re registration of AM ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17882 **[Test build #76533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76533/testReport)** for PR 17882 at commit [`53d0c25`](https://github.com/apache/spark/commit/53d0c2551ef73dc843a53a088c5c7c835956f490).
[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/17862 @hhbyyh can we smooth the hinge-loss using soft-max (variant of ReLU) and then use LBFGS ?
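The smoothing suggested here amounts to replacing the non-smooth hinge max(0, 1 − y·f) with a softplus (the smooth counterpart of ReLU), which is differentiable everywhere and so suits L-BFGS. A sketch of the idea, not the PR's implementation:

```python
import math

def hinge(margin):
    # Standard hinge loss as a function of the margin y * f(x)
    return max(0.0, 1.0 - margin)

def smooth_hinge(margin, t=10.0):
    # Softplus smoothing (1/t) * log(1 + exp(t * (1 - margin))).
    # It upper-bounds the hinge and approaches it as t grows.
    z = t * (1.0 - margin)
    if z > 0:
        # Numerically stable form for large positive z
        return (z + math.log1p(math.exp(-z))) / t
    return math.log1p(math.exp(z)) / t
```

With a large temperature (say t = 50) the smoothed loss tracks the hinge closely while keeping a well-defined gradient at the kink margin = 1.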
[GitHub] spark issue #17697: [SPARK-20414][MLLIB] avoid creating only 16 reducers whe...
Github user yangyangyyy commented on the issue: https://github.com/apache/spark/pull/17697 @HyukjinKwon @srowen
[GitHub] spark issue #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.register
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17831 This change LGTM. I went to check #17848; it seems to me that the PR simply adds two flags to ScalaUDF, with no API change to the existing UDF registration. I agree with @holdenk and @HyukjinKwon that it is orthogonal to this change for now.
[GitHub] spark pull request #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scala...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17801#discussion_r115132676

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---

```
@@ -2657,22 +2661,27 @@ object functions {
   /**
    * Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds),
-   * using the default timezone and the default locale, return null if fail.
+   * using the default timezone and the default locale.
+   * Returns `null` if fails.
+   *
    * @group datetime_funcs
    * @since 1.5.0
    */
   def unix_timestamp(s: Column): Column = withExpr {
     UnixTimestamp(s.expr, Literal("yyyy-MM-dd HH:mm:ss"))
   }

+  // scalastyle:off line.size.limit
   /**
-   * Convert time string with given pattern
-   * (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
-   * to Unix time stamp (in seconds), return null if fail.
+   * Converts time string with given pattern to Unix timestamp (in seconds).
+   * Returns `null` if fails.
+   *
+   * @see <a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html">Customizing Formats</a>
```

--- End diff --

that can avoid having scalastyle:off
[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17878 Thanks, @HyukjinKwon AppVeyor looks good, waiting for Jenkins again (although, it has nothing to do with it..)
[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17878 **[Test build #76532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76532/testReport)** for PR 17878 at commit [`d69c71a`](https://github.com/apache/spark/commit/d69c71a0dacabc47863d49815ee67dc0d5515e5a).
[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17881 **[Test build #76531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76531/testReport)** for PR 17881 at commit [`50b4f2d`](https://github.com/apache/spark/commit/50b4f2d2269f03f3650443405a28546843f98f53).
[GitHub] spark issue #17878: [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppV...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17878 Jenkins, retest this please
[GitHub] spark issue #17881: [SPARK-20621][deploy]Delete deprecated config parameter ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17881 Jenkins, ok to test
[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17884 @actuaryzhang thanks - would you have a chance to run a quick QA check on the rest of the vignettes, if you haven't already?
[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17884 This test seems flaky on AppVeyor, not sure why

```
Failed
1. Error: spark.glm and predict (@test_mllib_regression.R#57) --
java.lang.IllegalStateException: SparkContext has been shutdown
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2015)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2044)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2063)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:333)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2923)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2237)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2237)
    at org.apache.spark.sql.Dataset$$anonfun$57.apply(Dataset.scala:2907)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2906)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2237)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2244)
    at org.apache.spark.sql.Dataset.first(Dataset.scala:2251)
```
[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...
Github user mariahualiu commented on the issue: https://github.com/apache/spark/pull/17854 Now I can comfortably use 2500 executors. But when I pushed the executor count to 3000, I saw a lot of heartbeat timeout errors. That is something else we can improve, probably in another JIRA.
[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...
Github user mariahualiu commented on the issue: https://github.com/apache/spark/pull/17854 I re-ran the same application adding these configurations "--conf spark.yarn.scheduler.heartbeat.interval-ms=15000 --conf spark.yarn.launchContainer.count.simultaneously=50". Though it took 50 iterations to get 2500 containers from Yarn, it was faster to reach 2500 executors, since there were far fewer executor failures and, as a result, little overhead from removing failed executors and fewer allocation requests to Yarn.
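For reference, a minimal sketch of how these settings would be passed at submission time. Note that `spark.yarn.launchContainer.count.simultaneously` is the new configuration proposed by this PR, not an existing Spark setting, and the application name and executor count here are placeholders:

```shell
spark-submit \
  --master yarn \
  --conf spark.yarn.scheduler.heartbeat.interval-ms=15000 \
  --conf spark.yarn.launchContainer.count.simultaneously=50 \
  --num-executors 2500 \
  my_app.py
```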
[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17298 Merged build finished. Test FAILed.
[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17298 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76530/ Test FAILed.
[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17298 **[Test build #76530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76530/testReport)** for PR 17298 at commit [`6c22a89`](https://github.com/apache/spark/commit/6c22a895f12a54f8b23f2cbb94c8bad3276a93bd).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...
Github user mariahualiu commented on the issue: https://github.com/apache/spark/pull/17854 Let me describe what I've seen when using 2500 executors.
1. In the first few (2~3) requests, the AM received all (in this case 2500) containers from Yarn.
2. Within a few seconds, 2500 launch-container commands were sent out.
3. It took 3~4 minutes to start an executor on an NM (most of the time was spent on container localization: downloading the Spark jar, the application jar, etc. from the HDFS staging folder).
4. A large number of executors tried to retrieve Spark properties from the driver but failed to connect, and a massive wave of failed-executor removals followed. It seems to me RemoveExecutor is handled by the same single thread that responds to RetrieveSparkProps and RegisterExecutor. As a result, this thread became even busier, and still more executors could not connect/register/etc.
5. YarnAllocator requested more containers to make up for the failed ones. More executors tried to retrieve Spark props and register, but the thread was still overwhelmed by the previous round of executors and could not respond. In some cases, we got 5000 executor failures and the application retried and eventually failed.
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17644 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76529/ Test PASSed.
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17644 Merged build finished. Test PASSed.
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17644 **[Test build #76529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76529/testReport)** for PR 17644 at commit [`6e6e767`](https://github.com/apache/spark/commit/6e6e767c9a6787965d6eb9a32608aacacd543e23).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...
Github user mariahualiu commented on the issue: https://github.com/apache/spark/pull/17854 @squito yes, I capped the number of resources in updateResourceRequests so that YarnAllocator asks for fewer resources in each iteration. When allocation fails in one iteration, the request is added back, and YarnAllocator will try to allocate the leftover (from the previous iteration) plus the new requests in the next iteration, which can result in a lot of allocated containers. The second change, as you pointed out, is used to address this possibility. On second thought, maybe a better solution is to change AMRMClientImpl::allocate so that it does not add all resource requests from ask to askList. @tgravescs I tried reducing spark.yarn.containerLauncherMaxThreads but it didn't help much. My understanding is that these threads send container launch commands to node managers and immediately return, which is very lightweight and can be extremely fast. Launching the container on the NM side is an async operation.
[GitHub] spark issue #17884: [SparkR][Doc] fix typo in vignettes
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17884 @HyukjinKwon Thanks for pointing this out. I will keep this in mind next time.
[GitHub] spark issue #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17801 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76528/ Test PASSed.
[GitHub] spark issue #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17801 Merged build finished. Test PASSed.
[GitHub] spark issue #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17801 **[Test build #76528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76528/testReport)** for PR 17801 at commit [`5326ad1`](https://github.com/apache/spark/commit/5326ad1775d3c5467d4684bfa13cbece7cc92ac5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaFPGrowthExample `
   * `class SingularValueDecomposition(JavaModelWrapper):`
[GitHub] spark issue #17298: [SPARK-19094][WIP][PySpark] Plumb through logging for IJ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17298 **[Test build #76530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76530/testReport)** for PR 17298 at commit [`6c22a89`](https://github.com/apache/spark/commit/6c22a895f12a54f8b23f2cbb94c8bad3276a93bd).
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17451 Great, let me know if there are any questions @keypointt :)
[GitHub] spark issue #17831: [SPARK-18777][PYTHON][SQL] Return UDF from udf.register
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17831 @gatorsmile want to know if you're ok with this going into master, or if you still have concerns about it if it's targeted to 2.3?
[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17077#discussion_r115129682
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
         self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
         return self
+    @since(2.3)
+    def bucketBy(self, numBuckets, *cols):
+        """Buckets the output by the given columns on the file system.
--- End diff --
I'd copy the full description from DataFrameWriter here since comparing it to Hive could help people new to Spark understand what bucketBy does.
[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17077#discussion_r115129876
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
         self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
         return self
+    @since(2.3)
+    def bucketBy(self, numBuckets, *cols):
+        """Buckets the output by the given columns on the file system.
+
+        :param numBuckets: the number of buckets to save
+        :param cols: name of columns
+
+        .. note:: Applicable for file-based data sources in combination with
+            :py:meth:`DataFrameWriter.saveAsTable`.
+
+        >>> (df.write.format('parquet')
+        ...     .bucketBy(100, 'year', 'month')
+        ...     .mode("overwrite")
+        ...     .saveAsTable('bucketed_table'))
+        """
+        if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
+            cols = cols[0]
+
+        if not isinstance(numBuckets, int):
+            raise TypeError("numBuckets should be an int, got {0}.".format(type(numBuckets)))
+
+        if not all(isinstance(c, basestring) for c in cols):
+            raise TypeError("cols argument should be a string or a sequence of strings.")
--- End diff --
So I don't think we really support all sequences (the above typecheck on L581 requires list or tuple, but there are additional types of sequences).
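To illustrate the point, here is a minimal standalone sketch of the type check under discussion. The function name is hypothetical, and `str` stands in for Python 2's `basestring`; it is not the actual PySpark code. Only `list` and `tuple` are unpacked, so any other sequence-like argument falls through the first check as a single non-string object and is then rejected:

```python
# Hypothetical sketch of the cols type check being reviewed
# (not the real PySpark implementation).
def unpack_cols(*cols):
    # Mirrors the proposed check: only list and tuple are unpacked.
    if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
        cols = cols[0]
    if not all(isinstance(c, str) for c in cols):
        raise TypeError("cols argument should be a string or a sequence of strings.")
    return list(cols)

print(unpack_cols("year", "month"))    # varargs: accepted
print(unpack_cols(["year", "month"]))  # list: unpacked and accepted
print(unpack_cols(("year", "month")))  # tuple: unpacked and accepted
try:
    # Other sequence-like inputs (e.g. a generator) are NOT unpacked,
    # so the whole object fails the string check and is rejected.
    unpack_cols(c for c in ["year", "month"])
except TypeError as e:
    print("rejected:", e)
```

This is why the error message ("a string or a sequence of strings") promises slightly more than the check actually accepts.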
[GitHub] spark pull request #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17077#discussion_r115129884
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -563,6 +563,60 @@ def partitionBy(self, *cols):
         self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
         return self
+    @since(2.3)
+    def bucketBy(self, numBuckets, *cols):
+        """Buckets the output by the given columns on the file system.
+
+        :param numBuckets: the number of buckets to save
+        :param cols: name of columns
+
+        .. note:: Applicable for file-based data sources in combination with
+            :py:meth:`DataFrameWriter.saveAsTable`.
+
+        >>> (df.write.format('parquet')
+        ...     .bucketBy(100, 'year', 'month')
+        ...     .mode("overwrite")
+        ...     .saveAsTable('bucketed_table'))
+        """
+        if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
+            cols = cols[0]
+
+        if not isinstance(numBuckets, int):
+            raise TypeError("numBuckets should be an int, got {0}.".format(type(numBuckets)))
+
+        if not all(isinstance(c, basestring) for c in cols):
+            raise TypeError("cols argument should be a string or a sequence of strings.")
+
+        col = cols[0]
+        cols = cols[1:]
+
+        self._jwrite = self._jwrite.bucketBy(numBuckets, col, _to_seq(self._spark._sc, cols))
+        return self
+
+    @since(2.3)
+    def sortBy(self, *cols):
+        """Sorts the output in each bucket by the given columns on the file system.
+
+        :param cols: name of columns
+
+        >>> (df.write.format('parquet')
+        ...     .bucketBy(100, 'year', 'month')
+        ...     .sortBy('day')
+        ...     .mode("overwrite")
+        ...     .saveAsTable('sorted_bucketed_table'))
+        """
+        if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
+            cols = cols[0]
+
+        if not all(isinstance(c, basestring) for c in cols):
+            raise TypeError("cols argument should be a string or a sequence of strings.")
--- End diff --
same note as above.
[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17849#discussion_r115129786
--- Diff: python/pyspark/ml/tests.py ---
@@ -1355,7 +1370,7 @@ def test_java_params(self):
         for name, cls in inspect.getmembers(module, inspect.isclass):
             if not name.endswith('Model') and issubclass(cls, JavaParams)\
                     and not inspect.isabstract(cls):
-                self.check_params(cls())
+                ParamTests.check_params(self, cls(), check_params_exist=False)
--- End diff --
This might make sense to include as a comment in the code for whoever is coming to update this.
[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17849#discussion_r115129846
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -263,7 +282,14 @@ def _fit_java(self, dataset):
     def _fit(self, dataset):
         java_model = self._fit_java(dataset)
-        return self._create_model(java_model)
+        model = self._create_model(java_model)
+
+        # SPARK-10931: This is a temporary fix to allow models to own params
+        # from estimators. Eventually, these params should be in models through
+        # using common base classes between estimators and models.
+        model._create_params_from_java()
--- End diff --
So right now this would apply to all of the models; would it make sense to make it so that we can selectively move the params forward one at a time?
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17644 **[Test build #76529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76529/testReport)** for PR 17644 at commit [`6e6e767`](https://github.com/apache/spark/commit/6e6e767c9a6787965d6eb9a32608aacacd543e23).
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/17644 Jenkins test this please
[GitHub] spark issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use a...
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16966 @MLnick @jkbradley @sethah Could you take a review? Thanks!
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17092 @MLnick @jkbradley @sethah Could you take a review? Thanks!