[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10257 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164081063 Merging with master and branch-1.6
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164070919 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47600/ Test PASSed.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164070917 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164070895
**[Test build #2209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2209/consoleFull)** for PR 10257 at commit [`0fb5e2b`](https://github.com/apache/spark/commit/0fb5e2b9880477501dc959f503fb10d142350ee9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164070814
**[Test build #47600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47600/consoleFull)** for PR 10257 at commit [`0fb5e2b`](https://github.com/apache/spark/commit/0fb5e2b9880477501dc959f503fb10d142350ee9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164068223 **[Test build #47600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47600/consoleFull)** for PR 10257 at commit [`0fb5e2b`](https://github.com/apache/spark/commit/0fb5e2b9880477501dc959f503fb10d142350ee9).
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164067852 **[Test build #2209 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2209/consoleFull)** for PR 10257 at commit [`0fb5e2b`](https://github.com/apache/spark/commit/0fb5e2b9880477501dc959f503fb10d142350ee9).
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164067304 LGTM pending tests Thanks!
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164066490 @jkbradley thanks for the comments.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-164004821 @BenFradet Thanks! I agree you didn't have to write a full example, but it's nice that it explains it very clearly, so I'd keep it. I just had small phrasing comments.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10257#discussion_r47384289
--- Diff: docs/ml-features.md ---
@@ -459,6 +459,42 @@ column, we should get the following:
 "a" gets index `0` because it is the most frequent, followed by "c" with index `1` and "b" with index `2`.
+Additionaly, there are two strategies regarding how `StringIndexer` will handle
+unseen labels when you have set up a `StringIndexer` on a dataset which you want
--- End diff --
"set up" --> "fit"
"on a dataset which you want to reuse on another" --> "on one dataset and then use it to transform another dataset"
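For readers following the thread, the behaviour described in the quoted diff can be sketched without Spark. The following is a minimal plain-Python illustration, not Spark's implementation: `fit_labels` and `transform` are hypothetical helper names, the alphabetical tie-break between equally frequent labels is an assumption, and only the two `handleInvalid` strategies available at the time of this PR (`"error"`, the default, and `"skip"`) are modelled.

```python
from collections import Counter


def fit_labels(values):
    """Map each label to an index, most frequent label first.

    Ties are broken alphabetically here; that tie-break is an assumption
    made for this sketch, not something stated in the thread.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(position) for position, label in enumerate(ordered)}


def transform(index, values, handle_invalid="error"):
    """Apply a fitted label index to new values.

    handle_invalid="error": raise on a label that was not seen when fitting.
    handle_invalid="skip":  silently drop rows whose label was not seen.
    """
    if handle_invalid not in ("error", "skip"):
        raise ValueError("handle_invalid must be 'error' or 'skip'")
    rows = []
    for value in values:
        if value in index:
            rows.append((value, index[value]))
        elif handle_invalid == "error":
            raise ValueError("Unseen label: " + repr(value))
        # With "skip", the row carrying the unseen label is left out.
    return rows


# Fit on one dataset, then transform another that contains the unseen label "d".
index = fit_labels(["a", "b", "c", "a", "a", "c"])
kept = transform(index, ["a", "b", "c", "d"], handle_invalid="skip")
print(index)  # {'a': 0.0, 'c': 1.0, 'b': 2.0}
print(kept)   # [('a', 0.0), ('b', 2.0), ('c', 1.0)]
```

With the default `"error"` strategy, the same `transform` call raises instead of dropping the `"d"` row, which is the distinction the added documentation paragraph is meant to explain.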
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-163864443 That looks good to me, I don't think a full code example is necessary.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-163859120 Pinging @holdenk and @jkbradley
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-163765447 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47545/ Test PASSed.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-163765444 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-163765316
**[Test build #47545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47545/consoleFull)** for PR 10257 at commit [`8c293a5`](https://github.com/apache/spark/commit/8c293a5c93efc1bb196dcf3ac5b42d0827141caa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10257#issuecomment-163761482 **[Test build #47545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47545/consoleFull)** for PR 10257 at commit [`8c293a5`](https://github.com/apache/spark/commit/8c293a5c93efc1bb196dcf3ac5b42d0827141caa).
[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...
GitHub user BenFradet opened a pull request: https://github.com/apache/spark/pull/10257
[SPARK-12217] [ML] Document invalid handling for StringIndexer
Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation. I wonder if I should also add a snippet to the code example, input welcome.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BenFradet/spark SPARK-12217
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10257.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #10257
commit 8c293a5c93efc1bb196dcf3ac5b42d0827141caa
Author: BenFradet
Date: 2015-12-10T21:40:06Z
added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features doc