[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12933 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12933 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66740/ Test FAILed.
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12933 **[Test build #66740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66740/consoleFull)** for PR 12933 at commit [`e81ff73`](https://github.com/apache/spark/commit/e81ff73a53eeb7eed645f3042ae9da758a254179). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12933 **[Test build #66740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66740/consoleFull)** for PR 12933 at commit [`e81ff73`](https://github.com/apache/spark/commit/e81ff73a53eeb7eed645f3042ae9da758a254179).
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user hellertime commented on the issue: https://github.com/apache/spark/pull/12933 @tnachen rebased with master
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15432 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66738/ Test PASSed.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15432 Merged build finished. Test PASSed.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15432 **[Test build #66738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66738/consoleFull)** for PR 15432 at commit [`7fa7db2`](https://github.com/apache/spark/commit/7fa7db22dd4f2ba88ab1f09e4b776003b3f62fdb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15432 Merged build finished. Test PASSed.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15432 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66737/ Test PASSed.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15432 **[Test build #66737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66737/consoleFull)** for PR 15432 at commit [`860c177`](https://github.com/apache/spark/commit/860c1770c05b9a3d5ec49f1f166a901d2f9cca34). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15405: [SPARK-15917][CORE] Added support for number of executor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15405 **[Test build #3323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3323/consoleFull)** for PR 15405 at commit [`bffedac`](https://github.com/apache/spark/commit/bffedac0756c98861f44dfa0967ec8477c63c4cc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python API for...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15430 cc @jkbradley
[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15431 cc @felixcheung @jkbradley
[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15431 Merged build finished. Test PASSed.
[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15431 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66736/ Test PASSed.
[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15431 **[Test build #66736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66736/consoleFull)** for PR 15431 at commit [`44be8c9`](https://github.com/apache/spark/commit/44be8c95ef683610a31f9261e484fd82b480a430). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12064 **[Test build #66739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66739/consoleFull)** for PR 12064 at commit [`29841d0`](https://github.com/apache/spark/commit/29841d0efe8d0541f4faa10b986257c564670462). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12064 Merged build finished. Test FAILed.
[GitHub] spark issue #15402: [SPARK-17835][ML][MLlib] Optimize NaiveBayes mllib wrapp...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15402 LGTM
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66739/ Test FAILed.
[GitHub] spark issue #15417: [SPARK-17851][SQL][MINOR][TESTS] Update invalid test sql...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/15417 Yes - there are several dozen test SQL statements that fail `checkAnalysis`. Should we update them all?
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12064 **[Test build #66739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66739/consoleFull)** for PR 12064 at commit [`29841d0`](https://github.com/apache/spark/commit/29841d0efe8d0541f4faa10b986257c564670462).
[GitHub] spark issue #15429: [SPARK-17840] [DOCS] Add some pointers for wiki/CONTRIBU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15429 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66732/ Test PASSed.
[GitHub] spark issue #15429: [SPARK-17840] [DOCS] Add some pointers for wiki/CONTRIBU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15429 Merged build finished. Test PASSed.
[GitHub] spark issue #15429: [SPARK-17840] [DOCS] Add some pointers for wiki/CONTRIBU...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15429 **[Test build #66732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66732/consoleFull)** for PR 15429 at commit [`29941b9`](https://github.com/apache/spark/commit/29941b9880719d410ba826f173739abd2091b463). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66730/ Test PASSed.
[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14847 Merged build finished. Test PASSed.
[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14847 **[Test build #66730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66730/consoleFull)** for PR 14847 at commit [`ccac04f`](https://github.com/apache/spark/commit/ccac04f1e788dfb278618a10ee9220c89df6a61d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Merged build finished. Test PASSed.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66728/ Test PASSed.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14788 **[Test build #66728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66728/consoleFull)** for PR 14788 at commit [`cd78330`](https://github.com/apache/spark/commit/cd783307fafa7987505906101e8e90148c589214). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15432 **[Test build #66738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66738/consoleFull)** for PR 15432 at commit [`7fa7db2`](https://github.com/apache/spark/commit/7fa7db22dd4f2ba88ab1f09e4b776003b3f62fdb).
[GitHub] spark issue #15382: [SPARK-17810] [SQL] Default spark.sql.warehouse.dir is r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15382 **[Test build #3322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3322/consoleFull)** for PR 15382 at commit [`71f124a`](https://github.com/apache/spark/commit/71f124a94afb49f6a3ec6717f4110b5231403398). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66735/ Test PASSed.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15432 **[Test build #66737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66737/consoleFull)** for PR 15432 at commit [`860c177`](https://github.com/apache/spark/commit/860c1770c05b9a3d5ec49f1f166a901d2f9cca34).
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Merged build finished. Test PASSed.
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66735/consoleFull)** for PR 15428 at commit [`5cd58b7`](https://github.com/apache/spark/commit/5cd58b776883d5185f2b050317d260b924ebef5e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/9 We should also just check save / load is backward compatible with older versions. It should be, but subtle things can sneak in so let's be careful about that.
[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null as inpu...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/15432

[SPARK-17854][SQL] rand/randn allows null as input seed

## What changes were proposed in this pull request?

This PR proposes that `rand`/`randn` accept `null` as the input seed. In that case, the seed is treated as `0`.

It seems MySQL also accepts this:

```sql
mysql> select rand(0);
+---------------------+
| rand(0)             |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)

mysql> select rand(NULL);
+---------------------+
| rand(NULL)          |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)
```

Hive does as well, according to [HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694).

## How was this patch tested?

Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17854

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15432.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15432

commit 62542fc953f825e8a22eafe98fc85770434e859a
Author: hyukjinkwon
Date: 2016-10-11T09:21:18Z

    rand/randn allows null as input seed
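The null-seed behaviour described above can be illustrated outside Spark. The following is a minimal sketch, not the code in this PR: `seedOrZero` and `randWithSeed` are hypothetical helper names, and `scala.util.Random` stands in for Spark's own generator, so the values it produces are not expected to match Spark's or MySQL's.

```scala
import scala.util.Random

object NullSeedSketch {
  // Hypothetical helper: a null seed falls back to 0, mirroring the behaviour the PR proposes.
  def seedOrZero(seed: java.lang.Long): Long =
    if (seed == null) 0L else seed.longValue()

  // Hypothetical helper: draws one uniform value in [0, 1) from a generator seeded as above.
  def randWithSeed(seed: java.lang.Long): Double =
    new Random(seedOrZero(seed)).nextDouble()

  def main(args: Array[String]): Unit = {
    // A null seed and an explicit 0 seed are seeded identically, so the draws are equal.
    println(randWithSeed(null) == randWithSeed(0L))
  }
}
```

Taking a boxed `java.lang.Long` keeps `null` representable in the signature; an `Option[Long]` parameter would be the more idiomatic Scala choice if callers never pass a raw null.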
[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r82751668

--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -446,6 +463,20 @@ private[ml] object DefaultParamsReader {
     val cls = Utils.classForName(metadata.className)
     cls.getMethod("read").invoke(null).asInstanceOf[MLReader[T]].load(path)
   }
+
+  def loadAndSetInitialModel[M <: Model[M]](
+      instance: HasInitialModel[M], metadata: Metadata, path: String, sc: SparkContext): Unit = {
+    implicit val format = DefaultFormats
+    // Try to load the initial model
--- End diff --

What about just wrapping the load of the initial model in a `Try`? That way there is no need to rely on metadata. The load will fail if the "initialModel" path doesn't exist. We can ignore that exception but throw (or log) if we encounter an error in the actual loading.
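A rough sketch of the `Try` idea follows. It uses plain `java.nio.file` rather than Spark's reader classes, so everything here is an assumption for illustration: `TryLoadSketch`, `readModel` and `loadInitialModel` are hypothetical names, and the "model" is just the file's text.

```scala
import java.nio.file.{Files, NoSuchFileException, Paths}
import scala.util.{Failure, Success, Try}

object TryLoadSketch {
  // Hypothetical loader: reads the stored model as text.
  // Files.readAllBytes throws NoSuchFileException when the path is absent.
  private def readModel(path: String): String =
    new String(Files.readAllBytes(Paths.get(path)), "UTF-8")

  // None when no initial model was saved; any other failure is rethrown.
  def loadInitialModel(path: String): Option[String] =
    Try(readModel(path)) match {
      case Success(model)                  => Some(model)
      case Failure(_: NoSuchFileException) => None
      case Failure(other)                  => throw other
    }
}
```

Matching on the specific exception type keeps the "path is missing" case silent while still surfacing genuine load errors, which is the distinction the comment asks for.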
[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r82753706

--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -300,15 +301,23 @@ private[ml] object DefaultParamsWriter {
       paramMap: Option[JValue] = None): String = {
     val uid = instance.uid
     val cls = instance.getClass.getName
-    val params = instance.extractParamMap().toSeq.asInstanceOf[Seq[ParamPair[Any]]]
+    val params = instance.extractParamMap().toSeq
+      .filter(_.param.name != "initialModel").asInstanceOf[Seq[ParamPair[Any]]]
--- End diff --

Is it not possible to check whether the param is an instance of `Param[Model[_]]` rather than matching on the name?
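One caveat with a type-based check is that JVM type erasure makes `isInstanceOf[Param[Model[_]]]` indistinguishable from `isInstanceOf[Param[_]]` at runtime. A possible workaround, sketched below with hypothetical stand-in classes (`Param`, `ModelParam` and `ParamPair` here are simplified and are not Spark's definitions), is a dedicated subclass for model-valued params.

```scala
object ParamFilterSketch {
  // Hypothetical stand-ins for Spark ML's Param and ParamPair.
  class Param[T](val name: String)

  // Marker subclass for params whose value is a model; this survives erasure.
  class ModelParam[M](name: String) extends Param[M](name)

  case class ParamPair[T](param: Param[T], value: T)

  // Drop model-valued params by type instead of by the literal name "initialModel".
  def withoutModelParams(pairs: Seq[ParamPair[_]]): Seq[ParamPair[_]] =
    pairs.filterNot(_.param.isInstanceOf[ModelParam[_]])
}
```

With this shape, any future model-valued param is excluded from the saved param map without the writer needing to know its name.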
[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r82752375

--- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -137,18 +143,64 @@ class KMeansSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultR
       assert(model.clusterCenters === model2.clusterCenters)
     }
     val kmeans = new KMeans()
-    testEstimatorAndModelReadWrite(kmeans, dataset, KMeansSuite.allParamSettings, checkModelData)
+    testEstimatorAndModelReadWrite(kmeans, dataset, KMeansSuite.allParamSettings, checkModelData,
+      Map("initialModel" -> (checkModelData _).asInstanceOf[(Any, Any) => Unit]))
+  }
+
+  test("Initialize using a trained model") {
+    val kmeans = new KMeans().setK(k).setSeed(1).setMaxIter(1)
+    val oneIterModel = kmeans.fit(dataset)
+    val twoIterModel = kmeans.copy(ParamMap(ParamPair(kmeans.maxIter, 2))).fit(dataset)
+    val oneMoreIterModel = kmeans.setInitialModel(oneIterModel).fit(dataset)
+
+    twoIterModel.clusterCenters.zip(oneMoreIterModel.clusterCenters)
+      .foreach { case (center1, center2) => assert(center1 ~== center2 absTol 1E-8) }
+  }
+
+  test("Initialize using a model with wrong dimension of cluster centers") {
+    val kmeans = new KMeans().setK(k).setSeed(1).setMaxIter(1)
+
+    val wrongDimModel = KMeansSuite.generateRandomKMeansModel(4, k)
+    val wrongDimModelThrown = intercept[IllegalArgumentException] {
+      kmeans.setInitialModel(wrongDimModel).fit(dataset)
+    }
+    assert(wrongDimModelThrown.getMessage.contains("mismatched dimension"))
+  }
+
+  test("Infer K from an initial model if K is unset") {
+    val kmeans = new KMeans()
+    val testNewK = 10
+    val randomModel = KMeansSuite.generateRandomKMeansModel(dim, testNewK)
+    assert(kmeans.setInitialModel(randomModel).getK === testNewK)
+  }
+
+  test("Initialize using a model with wrong K if K is set") {
+    val kmeans = new KMeans().setK(k).setSeed(1).setMaxIter(1)
+
+    val wrongKModel = KMeansSuite.generateRandomKMeansModel(3, k + 1)
--- End diff --

`3` can be `dim` here.
[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r82750575

--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -333,13 +372,44 @@ class KMeans @Since("1.5.0") (
   override def transformSchema(schema: StructType): StructType = {
     validateAndTransformSchema(schema)
   }
+
+  @Since("2.1.0")
+  override def write: MLWriter = new KMeans.KMeansWriter(this)
 }

 @Since("1.6.0")
 object KMeans extends DefaultParamsReadable[KMeans] {

+  // TODO: [SPARK-17784]: Add a fromCenters method
+
   @Since("1.6.0")
   override def load(path: String): KMeans = super.load(path)
+
+  @Since("1.6.0")
+  override def read: MLReader[KMeans] = new KMeansReader
+
+  /** [[MLWriter]] instance for [[KMeans]] */
+  private[KMeans] class KMeansWriter(instance: KMeans) extends MLWriter {
+    override protected def saveImpl(path: String): Unit = {
+      DefaultParamsWriter.saveInitialModel(instance, path)
+      DefaultParamsWriter.saveMetadata(instance, path, sc)
+    }
+  }
+
+  private class KMeansReader extends MLReader[KMeans] {
+
+    /** Checked against metadata when loading model */
--- End diff --

nit: should this be "... when loading estimator"?
[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r82749390

--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -303,6 +312,20 @@ class KMeans @Since("1.5.0") (
   @Since("1.5.0")
   def setSeed(value: Long): this.type = set(seed, value)

+  /** @group setParam */
--- End diff --

Above, we may also want to log a warning in `setK` if `initialModel` has already been set.
[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r82749322

--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -303,6 +312,20 @@ class KMeans @Since("1.5.0") (
   @Since("1.5.0")
   def setSeed(value: Long): this.type = set(seed, value)

+  /** @group setParam */
+  @Since("2.1.0")
+  def setInitialModel(value: KMeansModel): this.type = {
+    val kOfInitialModel = value.parentModel.clusterCenters.length
+    if (isSet(k)) {
+      require(kOfInitialModel == $(k),
--- End diff --

As discussed elsewhere, at set time I think we can log a warning and let `initialModel` take precedence.
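The "warn and let `initialModel` take precedence" behavior could look roughly like the sketch below. It is a simplified, hypothetical estimator (`KMeansLike` is not Spark's `KMeans`), warnings are collected in a buffer instead of going through a logger, and the choice to ignore a conflicting `setK` call after the initial model is set is an assumption, since the thread only asks for a warning there.

```scala
import scala.collection.mutable.ListBuffer

object PrecedenceSketch {
  // Hypothetical, simplified estimator; not Spark's KMeans.
  final class KMeansLike {
    private var k: Option[Int] = None
    private var initialCenters: Option[Seq[Array[Double]]] = None

    // Collected here for illustration; a real estimator would call a logger.
    val warnings: ListBuffer[String] = ListBuffer.empty[String]

    def getK: Option[Int] = k

    def setK(value: Int): this.type = {
      initialCenters match {
        case Some(centers) if centers.length != value =>
          // Assumption: the initial model wins, so the conflicting k is ignored.
          warnings += s"k=$value ignored: initialModel has ${centers.length} centers"
        case _ =>
          k = Some(value)
      }
      this
    }

    def setInitialModel(centers: Seq[Array[Double]]): this.type = {
      if (k.exists(_ != centers.length)) {
        warnings += s"k=${k.get} overridden by initialModel with ${centers.length} centers"
      }
      initialCenters = Some(centers)
      k = Some(centers.length)
      this
    }
  }
}
```

Compared with the `require` in the diff, this never throws at set time; a mismatch only produces a warning, and `k` always ends up equal to the number of centers in the initial model.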
[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r82749661

--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -81,6 +81,13 @@ private[clustering] trait KMeansParams extends Params with HasMaxIter with HasFe
   def getInitSteps: Int = $(initSteps)

   /**
+   * Param for KMeansModel to use for warm start.
--- End diff --

We need more detail here to document the behavior that `initialModel` param settings take precedence.
[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15431 **[Test build #66736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66736/consoleFull)** for PR 15431 at commit [`44be8c9`](https://github.com/apache/spark/commit/44be8c95ef683610a31f9261e484fd82b480a430).
[GitHub] spark issue #12930: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12930 Moved the new fix to #15431, which leverages ```RFormula forceIndexLabel``` and is more succinct.
[GitHub] spark pull request #12930: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiv...
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/12930
[GitHub] spark pull request #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiv...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15431

[SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes error when label is numeric type

## What changes were proposed in this pull request?

Fix the SparkR ```spark.naiveBayes``` error that occurs when the response variable of the dataset is of numeric type. See the details and how to reproduce this bug at [SPARK-15153](https://issues.apache.org/jira/browse/SPARK-15153).

## How was this patch tested?

Added a unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-15153-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15431.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15431

commit 44be8c95ef683610a31f9261e484fd82b480a430
Author: Yanbo Liang
Date: 2016-10-11T09:17:42Z

    Fix SparkR spark.naiveBayes error when label is numeric type
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82750631

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * }}}
    */
   override def visitShowColumns(ctx: ShowColumnsContext): LogicalPlan = withOrigin(ctx) {
-    val table = visitTableIdentifier(ctx.tableIdentifier)
-
-    val lookupTable = Option(ctx.db) match {
-      case None => table
-      case Some(db) if table.database.exists(_ != db) =>
-        operationNotAllowed(
-          s"SHOW COLUMNS with conflicting databases: '$db' != '${table.database.get}'",
-          ctx)
-      case Some(db) => TableIdentifier(table.identifier, Some(db.getText))
-    }
-    ShowColumnsCommand(lookupTable)
+    ShowColumnsCommand(Option(ctx.db).map(_.getText), visitTableIdentifier(ctx.tableIdentifier))
--- End diff --

It seems Hive doesn't allow specifying duplicate databases, whether or not they are the same. https://github.com/apache/hive/blob/21a0142f333fba231f2648db53a48dc41384ad72/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java#L2215
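For reference, the conflict check that the diff removes from the parser can be modelled outside Spark as below. This is only a sketch of that logic: `TableId` is a hypothetical stand-in for `TableIdentifier`, and the error is returned as a `Left` rather than raised through `operationNotAllowed`.

```scala
object ShowColumnsSketch {
  // Hypothetical stand-in for Spark's TableIdentifier.
  final case class TableId(table: String, database: Option[String] = None)

  // Mirrors the removed check: an explicit database must agree with the
  // table's own qualifier, otherwise the statement is rejected.
  def resolve(db: Option[String], table: TableId): Either[String, TableId] =
    db match {
      case None => Right(table)
      case Some(d) if table.database.exists(_ != d) =>
        Left(s"SHOW COLUMNS with conflicting databases: '$d' != '${table.database.get}'")
      case Some(d) => Right(table.copy(database = Some(d)))
    }
}
```

Under this policy a repeated but identical database name is accepted, which is the point where it differs from the Hive behavior described in the comment above.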
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82750409

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * }}}
    */
   override def visitShowColumns(ctx: ShowColumnsContext): LogicalPlan = withOrigin(ctx) {
-    val table = visitTableIdentifier(ctx.tableIdentifier)
-
-    val lookupTable = Option(ctx.db) match {
-      case None => table
-      case Some(db) if table.database.exists(_ != db) =>
-        operationNotAllowed(
-          s"SHOW COLUMNS with conflicting databases: '$db' != '${table.database.get}'",
-          ctx)
-      case Some(db) => TableIdentifier(table.identifier, Some(db.getText))
-    }
-    ShowColumnsCommand(lookupTable)
+    ShowColumnsCommand(Option(ctx.db).map(_.getText), visitTableIdentifier(ctx.tableIdentifier))
--- End diff --

BTW, what @dilipbiswal does here follows the previous behavior. Do we want to break it?
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Merged build finished. Test PASSed.
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66733/ Test PASSed.
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66733/consoleFull)** for PR 15428 at commit [`cd8113c`](https://github.com/apache/spark/commit/cd8113c456870dd89754d0d48bfe6d0931ad05bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Merged build finished. Test PASSed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66725/ Test PASSed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66725/consoleFull)** for PR 15285 at commit [`7cc6935`](https://github.com/apache/spark/commit/7cc6935cfade55a54d866bf2431bb28ef2f2544a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python API for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15430 Merged build finished. Test PASSed.
[GitHub] spark issue #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python API for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15430 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66734/ Test PASSed.
[GitHub] spark issue #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python API for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15430 **[Test build #66734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66734/consoleFull)** for PR 15430 at commit [`e34610c`](https://github.com/apache/spark/commit/e34610c216c45d9500423e8c49425ca5a327c52e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15342: [SPARK-11560] [MLLIB] Optimize KMeans implementation / r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15342 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66729/ Test PASSed.
[GitHub] spark issue #15342: [SPARK-11560] [MLLIB] Optimize KMeans implementation / r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15342 Merged build finished. Test PASSed.
[GitHub] spark issue #15342: [SPARK-11560] [MLLIB] Optimize KMeans implementation / r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15342 **[Test build #66729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66729/consoleFull)** for PR 15342 at commit [`ba52582`](https://github.com/apache/spark/commit/ba52582a1313be9d9febe215fe6f21b2d9be239f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15388 Thanks! @rxin @cloud-fan @hvanhovell @gatorsmile
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/9

I have a few high-level questions on this:

Params

Why are we only setting `k` based on the `initialModel`? I had thought from the previous discussion above (it was a while ago now) that it would set all parameters. Of course some will be ignored, like the init settings, but I think the default expectation of using `setInitialModel` would be that all params are set.

For example, let's say I train a model with various `maxIter` and `tol` params. Then I want to use that model later for warm-starting. I want the same settings, just starting from the existing centroids. I have to remember to do `new KMeans().setInitialModel(model).setMaxIter(model.getMaxIter())...`.

If there is a good argument against this I'm happy to hear it, but then we must document this behavior clearly.

Saving initial model on Model

What is the reasoning behind saving the `initialModel` on the `Model`? It makes sense for the `Estimator`: I may want to save my estimator, and when loading it I'd of course need the initial model to be loaded, if it was set, so that I can correctly fit my estimator. But once I've fit a model, why would I save two models?
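The "inherit all params from the initial model" expectation can be sketched as follows. The types are hypothetical (`Settings`, `FittedModel` and `warmStartSettings` do not exist in Spark), and the rule that explicit overrides are applied after inheriting is an assumption made for this illustration.

```scala
object WarmStartSketch {
  // Hypothetical parameter bundle; field names echo maxIter and tol from the comment.
  final case class Settings(k: Int, maxIter: Int, tol: Double)

  // Hypothetical fitted model: its centers plus the settings it was trained with.
  final case class FittedModel(centers: Seq[Array[Double]], settings: Settings)

  // Warm start: inherit every setting from the initial model (k follows the
  // number of centers), then apply any explicit overrides on top.
  def warmStartSettings(
      initial: FittedModel,
      overrides: Settings => Settings = s => s): Settings =
    overrides(initial.settings.copy(k = initial.centers.length))
}
```

With something of this shape the caller would not need to repeat `setMaxIter(model.getMaxIter())` and similar calls by hand; only the settings that should differ from the initial model are spelled out.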
[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15295 Merged build finished. Test PASSed.
[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66722/ Test PASSed.
[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15295 **[Test build #66722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66722/consoleFull)** for PR 15295 at commit [`0ad8815`](https://github.com/apache/spark/commit/0ad8815b9042aadefa506e1e106822aa4bee810f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15427 cc @cloud-fan @hvanhovell
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15427 Merged build finished. Test PASSed.
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66735/consoleFull)** for PR 15428 at commit [`5cd58b7`](https://github.com/apache/spark/commit/5cd58b776883d5185f2b050317d260b924ebef5e).
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66724/ Test PASSed.
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15427 **[Test build #66724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66724/consoleFull)** for PR 15427 at commit [`dd6405c`](https://github.com/apache/spark/commit/dd6405c003ea082b1c614f2efed4d1bcb2d6f5b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82745403
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * }}}
    */
   override def visitShowColumns(ctx: ShowColumnsContext): LogicalPlan = withOrigin(ctx) {
-    val table = visitTableIdentifier(ctx.tableIdentifier)
-
-    val lookupTable = Option(ctx.db) match {
-      case None => table
-      case Some(db) if table.database.exists(_ != db) =>
-        operationNotAllowed(
-          s"SHOW COLUMNS with conflicting databases: '$db' != '${table.database.get}'",
-          ctx)
-      case Some(db) => TableIdentifier(table.identifier, Some(db.getText))
-    }
-    ShowColumnsCommand(lookupTable)
+    ShowColumnsCommand(Option(ctx.db).map(_.getText), visitTableIdentifier(ctx.tableIdentifier))
--- End diff --
Good point!
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/9 I misread DB's meaning in my previous comment. I agree that the parameter settings of `initialModel`, if set, should take precedence. If it conflicts with an existing `k` then log a warning.
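The precedence rule described in this comment can be sketched in plain Scala. This is only an illustration under stated assumptions: `InitialModel`, `KResolution` and `resolveK` are hypothetical names, not Spark's KMeans API, and the warning is returned as a value rather than written to a logger.

```scala
// Hypothetical stand-in for a fitted model; not Spark's KMeansModel.
final case class InitialModel(centers: Vector[Vector[Double]]) {
  def k: Int = centers.length
}

object KResolution {
  /** Returns the effective k plus an optional warning when the two settings disagree. */
  def resolveK(configuredK: Int, initialModel: Option[InitialModel]): (Int, Option[String]) =
    initialModel match {
      // The initial model wins; the caller is expected to log the warning.
      case Some(m) if m.k != configuredK =>
        (m.k, Some(s"k=$configuredK conflicts with initialModel (k=${m.k}); using k=${m.k}."))
      case Some(m) => (m.k, None)
      case None => (configuredK, None)
    }
}
```

Returning the warning instead of logging it keeps the sketch free of any logging dependency; a real estimator would presumably route it through its own logger.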
[GitHub] spark issue #13194: [SPARK-15402] [ML] [PySpark] PySpark ml.evaluation shoul...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/13194 Resolved merge conflicts. Ping @holdenk @MLnick @srowen @jkbradley to take a look when you are available. This just adds the Python API and should be very straightforward to move ahead. Thanks.
[GitHub] spark issue #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python API for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15430 **[Test build #66734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66734/consoleFull)** for PR 15430 at commit [`e34610c`](https://github.com/apache/spark/commit/e34610c216c45d9500423e8c49425ca5a327c52e).
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82744230
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * }}}
    */
   override def visitShowColumns(ctx: ShowColumnsContext): LogicalPlan = withOrigin(ctx) {
-    val table = visitTableIdentifier(ctx.tableIdentifier)
-
-    val lookupTable = Option(ctx.db) match {
-      case None => table
-      case Some(db) if table.database.exists(_ != db) =>
-        operationNotAllowed(
-          s"SHOW COLUMNS with conflicting databases: '$db' != '${table.database.get}'",
-          ctx)
-      case Some(db) => TableIdentifier(table.identifier, Some(db.getText))
-    }
-    ShowColumnsCommand(lookupTable)
+    ShowColumnsCommand(Option(ctx.db).map(_.getText), visitTableIdentifier(ctx.tableIdentifier))
--- End diff --
FYI, MySQL will treat `SHOW COLUMNS FROM db1.tbl1 FROM db2` as `SHOW COLUMNS FROM tbl1 FROM db2`, i.e. if `FROM database` is specified, it will just ignore the database specified in the table name instead of reporting an error.
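The two behaviours being compared here (MySQL ignoring the qualifier versus rejecting a conflict) can be contrasted in a small standalone sketch. `TableName`, `ShowColumnsResolution`, `mysqlStyle` and `strict` are hypothetical names for illustration only; they are not Spark's `TableIdentifier` or parser API, and the sketch compares database names case-sensitively for simplicity.

```scala
// Hypothetical model of a parsed table name; not Spark's TableIdentifier.
final case class TableName(table: String, database: Option[String] = None)

object ShowColumnsResolution {
  /** MySQL-style: an explicit `FROM database` clause overrides the qualifier on the table name. */
  def mysqlStyle(fromDb: Option[String], name: TableName): TableName =
    fromDb match {
      case Some(db) => name.copy(database = Some(db))
      case None => name
    }

  /** Strict: two different databases are reported as an error instead of being resolved. */
  def strict(fromDb: Option[String], name: TableName): Either[String, TableName] =
    (fromDb, name.database) match {
      case (Some(db), Some(qualifier)) if db != qualifier =>
        Left(s"SHOW COLUMNS with conflicting databases: '$db' != '$qualifier'")
      case (Some(db), _) => Right(name.copy(database = Some(db)))
      case (None, _) => Right(name)
    }
}
```

For `SHOW COLUMNS FROM db1.tbl1 FROM db2`, `mysqlStyle` resolves to `db2.tbl1`, while `strict` returns a `Left` carrying the conflict message.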
[GitHub] spark issue #15405: [SPARK-15917][CORE] Added support for number of executor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15405 **[Test build #3323 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3323/consoleFull)** for PR 15405 at commit [`bffedac`](https://github.com/apache/spark/commit/bffedac0756c98861f44dfa0967ec8477c63c4cc).
[GitHub] spark pull request #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python ...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15430

[SPARK-15957][Follow-up][ML][PySpark] Add Python API for RFormula forceIndexLabel.

## What changes were proposed in this pull request?
Follow-up work of #13675, add Python API for `RFormula forceIndexLabel`.

## How was this patch tested?
Unit test.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yanboliang/spark spark-15957-python

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15430.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15430

commit e34610c216c45d9500423e8c49425ca5a327c52e
Author: Yanbo Liang
Date: 2016-10-11T08:17:11Z
Add Python API for RFormula forceIndexLabel.
[GitHub] spark pull request #15072: [SPARK-17123][SQL] Use type-widened encoder for D...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/15072#discussion_r82743571
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -53,7 +53,15 @@ import org.apache.spark.util.Utils
 private[sql] object Dataset {
   def apply[T: Encoder](sparkSession: SparkSession, logicalPlan: LogicalPlan): Dataset[T] = {
-    new Dataset(sparkSession, logicalPlan, implicitly[Encoder[T]])
+    val encoder = implicitly[Encoder[T]]
+    if (encoder.clsTag.runtimeClass == classOf[Row]) {
+      // We should use the encoder generated from the executed plan rather than the existing
+      // encoder for DataFrame because the types of columns can be varied due to widening types.
+      // See SPARK-17123. This is a bit hacky. Maybe we should find a better way to do this.
+      ofRows(sparkSession, logicalPlan).asInstanceOf[Dataset[T]]
+    } else {
+      new Dataset(sparkSession, logicalPlan, encoder)
+    }
--- End diff --
Yeah, I forgot about type widening.
[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82743168
--- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala ---
@@ -48,7 +48,8 @@ private[spark] trait ShuffleManager {
       handle: ShuffleHandle,
       startPartition: Int,
       endPartition: Int,
-      context: TaskContext): ShuffleReader[K, C]
+      context: TaskContext,
+      mapid: Int = -1): ShuffleReader[K, C]
--- End diff --
`mapid` => `mapId`
[GitHub] spark pull request #15428: [SPARK-17219][ML] enchanced NaN value handling in...
Github user VinceShieh commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r82743072
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
@@ -73,15 +78,27 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
   @Since("1.4.0")
   def setOutputCol(value: String): this.type = set(outputCol, value)

+  /** @group setParam */
+  @Since("2.1.0")
+  def setHandleInvalid(value: String): this.type = set(handleInvalid, value)
+  setDefault(handleInvalid, "error")
+
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
     transformSchema(dataset.schema)
-    val bucketizer = udf { feature: Double =>
-      Bucketizer.binarySearchForBuckets($(splits), feature)
+    val bucketizer: UserDefinedFunction = udf { (feature: Double, flag: String) =>
+      Bucketizer.binarySearchForBuckets($(splits), feature, flag)
+    }
+    val filteredDataset = {
--- End diff --
Nope, actually, NaN will trigger an error later in `binarySearchForBuckets` as an invalid feature value if no special handling is made.
[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82742902
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -687,18 +691,21 @@ private[spark] object MapOutputTracker extends Logging {
       shuffleId: Int,
       startPartition: Int,
       endPartition: Int,
-      statuses: Array[MapStatus]): Seq[(BlockManagerId, Seq[(BlockId, Long)])] = {
+      statuses: Array[MapStatus],
+      mapIdx: Int = -1): Seq[(BlockManagerId, Seq[(BlockId, Long)])] = {
--- End diff --
How about changing `mapIdx` to `mapId`?
[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14847 @viirya can you try to create a new operator for this optimization and make it work with whole-stage-codegen? thanks!
[GitHub] spark pull request #15388: [SPARK-17821][SQL] Support And and Or in Expressi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15388
[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82742056
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -138,13 +138,16 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
    * and the second item is a sequence of (shuffle block id, shuffle block size) tuples
    * describing the shuffle blocks that are stored at that block manager.
    */
-  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int)
+  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int,
+      mapid: Int = -1)
--- End diff --
`mapid` => `mapId`
[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15388 thanks, merging to master!
[GitHub] spark pull request #15428: [SPARK-17219][ML] enchanced NaN value handling in...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r82741203
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
@@ -270,10 +270,10 @@ private[ml] trait HasFitIntercept extends Params {
 private[ml] trait HasHandleInvalid extends Params {

   /**
-   * Param for how to handle invalid entries. Options are skip (which will filter out rows with bad values), or error (which will throw an error). More options may be added later.
+   * Param for how to handle invalid entries. Options are skip (which will filter out rows with bad values), or error (which will throw an error), or keep (which will keep the bad values in certain way). More options may be added later.
--- End diff --
I'm neutral on the complexity that this adds, but not against it. It gets a little funny to say "keep invalid data" but I think we discussed that on the JIRA.
[GitHub] spark pull request #15428: [SPARK-17219][ML] enchanced NaN value handling in...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r82741770
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
@@ -128,8 +145,9 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] {
    * Binary searching in several buckets to place each data point.
    * @throws SparkException if a feature is < splits.head or > splits.last
    */
-  private[feature] def binarySearchForBuckets(splits: Array[Double], feature: Double): Double = {
-    if (feature.isNaN) {
+  private[feature] def binarySearchForBuckets
+  (splits: Array[Double], feature: Double, flag: String): Double = {
--- End diff --
Nit: I think the convention is to leave the open paren on the previous line.

Doesn't this need to handle "skip" and "error"? Throw an exception on NaN if "error", or ignore it if "skip"?
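The per-value handling asked about in this comment might look like the following standalone sketch. It is not the PR's code: `BucketSketch` and `bucketFor` are hypothetical names, the `Option` return type is a choice made for this illustration, and placing NaN in an extra bucket after the regular ones for "keep" is an assumption, since the thread only says the bad values are kept "in certain way".

```scala
import java.util.{Arrays => JArrays}

object BucketSketch {
  /**
   * Returns Some(bucketIndex), or None when a NaN value should be skipped.
   * `flag` stands in for the handleInvalid setting: "error", "skip" or "keep".
   */
  def bucketFor(splits: Array[Double], feature: Double, flag: String): Option[Double] =
    if (feature.isNaN) {
      flag match {
        // Assumption: NaN goes into one extra bucket after the regular ones.
        case "keep" => Some((splits.length - 1).toDouble)
        case "skip" => None
        case "error" => throw new IllegalArgumentException("Bucketizer encountered a NaN value.")
        case other => throw new IllegalArgumentException(s"Unknown handleInvalid value: $other")
      }
    } else if (feature == splits.last) {
      // The last bucket includes its upper bound.
      Some((splits.length - 2).toDouble)
    } else {
      val idx = JArrays.binarySearch(splits, feature)
      if (idx >= 0) {
        Some(idx.toDouble)
      } else {
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        val insertPos = -idx - 1
        if (insertPos == 0 || insertPos == splits.length) {
          throw new IllegalArgumentException(s"Feature value $feature is out of the splits range.")
        }
        Some((insertPos - 1).toDouble)
      }
    }
}
```

With splits `Array(0.0, 1.0, 2.0)` the regular buckets are `[0, 1)` and `[1, 2]`, so a kept NaN lands in bucket 2 under the assumption above.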
[GitHub] spark pull request #15428: [SPARK-17219][ML] enchanced NaN value handling in...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r82741459
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
@@ -73,15 +78,27 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
   @Since("1.4.0")
   def setOutputCol(value: String): this.type = set(outputCol, value)

+  /** @group setParam */
+  @Since("2.1.0")
+  def setHandleInvalid(value: String): this.type = set(handleInvalid, value)
+  setDefault(handleInvalid, "error")
+
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
     transformSchema(dataset.schema)
-    val bucketizer = udf { feature: Double =>
-      Bucketizer.binarySearchForBuckets($(splits), feature)
+    val bucketizer: UserDefinedFunction = udf { (feature: Double, flag: String) =>
+      Bucketizer.binarySearchForBuckets($(splits), feature, flag)
+    }
+    val filteredDataset = {
--- End diff --
Doesn't this need to try to handle "error"?
```
val filteredDataSet = getHandleInvalid match {
  case "skip" => dataset.na.drop
  case "keep" => dataset
  case "error" =>
    if (...dataset contains NaN...) {
      throw new IllegalArgumentException(...)
    } else {
      dataset
    }
}
```
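The shape of the suggestion above can be tried out without Spark by modelling the column as a plain `Seq[Double]`. This is a sketch under that assumption only: `HandleInvalidSketch` and `filterInvalid` are hypothetical names, and a real implementation would operate on a `Dataset` rather than an in-memory sequence.

```scala
object HandleInvalidSketch {
  /** Applies a "skip" / "keep" / "error" policy to NaN entries in a sequence of feature values. */
  def filterInvalid(values: Seq[Double], handleInvalid: String): Seq[Double] =
    handleInvalid match {
      case "skip" => values.filterNot(_.isNaN) // drop the invalid rows
      case "keep" => values                    // pass NaN through for later handling
      case "error" =>
        if (values.exists(_.isNaN)) {
          throw new IllegalArgumentException("Input contains NaN values.")
        } else {
          values
        }
      case other =>
        throw new IllegalArgumentException(s"Unknown handleInvalid value: $other")
    }
}
```

Checking for NaN up front in the "error" case makes the failure explicit at the filtering step instead of surfacing later inside the bucket lookup.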
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66733/consoleFull)** for PR 15428 at commit [`cd8113c`](https://github.com/apache/spark/commit/cd8113c456870dd89754d0d48bfe6d0931ad05bb).
[GitHub] spark issue #15429: [SPARK-17840] [DOCS] Add some pointers for wiki/CONTRIBU...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15429 **[Test build #66732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66732/consoleFull)** for PR 15429 at commit [`29941b9`](https://github.com/apache/spark/commit/29941b9880719d410ba826f173739abd2091b463).
[GitHub] spark issue #15425: [SPARK-17816] [Core] [Branch-2.0] Fix ConcurrentModifica...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15425 **[Test build #3321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3321/consoleFull)** for PR 15425 at commit [`678ee6b`](https://github.com/apache/spark/commit/678ee6b1d6308a81a5c2d83a196144f29c80434b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14847 This would only be interesting if it works with whole-stage codegen. In addition, it'd make sense to have an explicit operator for this, e.g. StopAfter, so it is obvious in the explain plan.
[GitHub] spark pull request #15429: [SPARK-17840] [DOCS] Add some pointers for wiki/C...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/15429

[SPARK-17840] [DOCS] Add some pointers for wiki/CONTRIBUTING.md in README.md and some warnings in PULL_REQUEST_TEMPLATE

## What changes were proposed in this pull request?
Link to contributing wiki in PR template, README.md

## How was this patch tested?
Doc-only change, tested by Jekyll

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srowen/spark SPARK-17840

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15429.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15429

commit 29941b9880719d410ba826f173739abd2091b463
Author: Sean Owen
Date: 2016-10-11T07:51:16Z
Link to contributing wiki in PR template, README.md
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66731/ Test FAILed.
[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Merged build finished. Test FAILed.