[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10257


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164081063
  
Merging with master and branch-1.6





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164070919
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47600/
Test PASSed.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164070917
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164070895
  
**[Test build #2209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2209/consoleFull)** for PR 10257 at commit [`0fb5e2b`](https://github.com/apache/spark/commit/0fb5e2b9880477501dc959f503fb10d142350ee9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164070814
  
**[Test build #47600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47600/consoleFull)** for PR 10257 at commit [`0fb5e2b`](https://github.com/apache/spark/commit/0fb5e2b9880477501dc959f503fb10d142350ee9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164068223
  
**[Test build #47600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47600/consoleFull)** for PR 10257 at commit [`0fb5e2b`](https://github.com/apache/spark/commit/0fb5e2b9880477501dc959f503fb10d142350ee9).





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164067852
  
**[Test build #2209 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2209/consoleFull)** for PR 10257 at commit [`0fb5e2b`](https://github.com/apache/spark/commit/0fb5e2b9880477501dc959f503fb10d142350ee9).





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164067304
  
LGTM pending tests
Thanks!





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164066490
  
@jkbradley thanks for the comments.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-164004821
  
@BenFradet  Thanks!  I agree you didn't have to write a full example, but 
it's nice that it explains it very clearly, so I'd keep it.

I just had small phrasing comments.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/10257#discussion_r47384289
  
--- Diff: docs/ml-features.md ---
@@ -459,6 +459,42 @@ column, we should get the following:
 "a" gets index `0` because it is the most frequent, followed by "c" with index `1` and "b" with
 index `2`.
 
+Additionaly, there are two strategies regarding how `StringIndexer` will handle
+unseen labels when you have set up a `StringIndexer` on a dataset which you want
--- End diff --

"set up" --> "fit"
"on a dataset which you want to reuse on another" --> "on one dataset and 
then use it to transform another dataset"
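
For illustration, the behaviour under discussion (fit on one dataset, then transform another that contains unseen labels) can be modeled in plain Python. This is only a toy sketch, not Spark's implementation. It assumes the two strategies are called `error` (raise on an unseen label) and `skip` (drop the row containing it); the helper names `fit_string_indexer` and `transform` are hypothetical, and breaking frequency ties alphabetically is an assumption made here for determinism.

```python
from collections import Counter


def fit_string_indexer(labels):
    """Assign indices by descending frequency, as in the "a", "c", "b" example.

    Ties are broken alphabetically; that tie-break is an assumption of this sketch.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(index, labels, handle_invalid="error"):
    """Index `labels` using a fitted mapping.

    handle_invalid="error": raise on a label not seen during fitting.
    handle_invalid="skip":  silently drop rows with an unseen label.
    """
    if handle_invalid not in ("error", "skip"):
        raise ValueError("handle_invalid must be 'error' or 'skip'")
    rows = []
    for label in labels:
        if label in index:
            rows.append((label, index[label]))
        elif handle_invalid == "error":
            raise ValueError("Unseen label: " + label)
        # handle_invalid == "skip": the row is dropped
    return rows


# Fit on one dataset, then transform another that contains the unseen label "d".
index = fit_string_indexer(["a", "b", "c", "a", "a", "c"])
print(index)                                            # {'a': 0.0, 'c': 1.0, 'b': 2.0}
print(transform(index, ["a", "b", "c", "d"], "skip"))   # [('a', 0.0), ('b', 2.0), ('c', 1.0)]
```

With `handle_invalid="error"` the same `transform` call raises a `ValueError` on "d" instead of dropping the row.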





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-10 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-163864443
  
That looks good to me, I don't think a full code example is necessary.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-10 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-163859120
  
Pinging @holdenk and @jkbradley 





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-163765447
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47545/
Test PASSed.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-163765444
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-163765316
  
**[Test build #47545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47545/consoleFull)** for PR 10257 at commit [`8c293a5`](https://github.com/apache/spark/commit/8c293a5c93efc1bb196dcf3ac5b42d0827141caa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10257#issuecomment-163761482
  
**[Test build #47545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47545/consoleFull)** for PR 10257 at commit [`8c293a5`](https://github.com/apache/spark/commit/8c293a5c93efc1bb196dcf3ac5b42d0827141caa).





[GitHub] spark pull request: [SPARK-12217] [ML] Document invalid handling f...

2015-12-10 Thread BenFradet
GitHub user BenFradet opened a pull request:

https://github.com/apache/spark/pull/10257

[SPARK-12217] [ML] Document invalid handling for StringIndexer

Added a paragraph regarding StringIndexer#setHandleInvalid to the 
ml-features documentation.

I wonder if I should also add a snippet to the code example, input welcome.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BenFradet/spark SPARK-12217

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10257


commit 8c293a5c93efc1bb196dcf3ac5b42d0827141caa
Author: BenFradet 
Date:   2015-12-10T21:40:06Z

added a paragraph regarding StringIndexer#setHandleInvalid to the 
ml-features doc



