spark git commit: [SPARK-9977] [DOCS] Update documentation for StringIndexer

2015-08-19 srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 e56bcc638 -> 5553f02be


[SPARK-9977] [DOCS] Update documentation for StringIndexer

By using `StringIndexer`, we obtain the indexed label in a new column, so a
following estimator in the pipeline must read from this new column if it
wants to use the string-indexed label.
I think it is better to make this explicit in the documentation.

Author: lewuathe 

Closes #8205 from Lewuathe/SPARK-9977.

(cherry picked from commit ba2a07e2b6c5a39597b64041cd5bf342ef9631f5)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5553f02b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5553f02b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5553f02b

Branch: refs/heads/branch-1.5
Commit: 5553f02beb04c29d685049b460196b295ab4587b
Parents: e56bcc6
Author: lewuathe 
Authored: Wed Aug 19 09:54:03 2015 +0100
Committer: Sean Owen 
Committed: Wed Aug 19 09:54:11 2015 +0100

--
 docs/ml-features.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5553f02b/docs/ml-features.md
--
diff --git a/docs/ml-features.md b/docs/ml-features.md
index d82c85e..8d56dc3 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -725,7 +725,11 @@ dctDf.select("featuresDCT").show(3);
 `StringIndexer` encodes a string column of labels to a column of label indices.
 The indices are in `[0, numLabels)`, ordered by label frequencies.
 So the most frequent label gets index `0`.
-If the input column is numeric, we cast it to string and index the string values.
+If the input column is numeric, we cast it to string and index the string 
+values. When downstream pipeline components such as `Estimator` or 
+`Transformer` make use of this string-indexed label, you must set the input 
+column of the component to this string-indexed column name. In many cases, 
+you can set the input column with `setInputCol`.
 
 **Examples**
 

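The indexing rule described in the diff (indices in `[0, numLabels)`, ordered by label frequency, numeric input cast to string) and the point about wiring downstream stages to the new column can be illustrated with a small plain-Python sketch. This does not use Spark: `fit_string_indexer`, `transform`, `category` and `categoryIndex` are hypothetical names chosen for illustration, and breaking frequency ties alphabetically is an assumption made here for determinism, not documented Spark 1.5 behaviour.

```python
from collections import Counter


def fit_string_indexer(values):
    """Return a label -> index mapping ordered by descending frequency.

    Numeric inputs are cast to str first, mirroring the documented behaviour.
    Ties are broken alphabetically; that is an assumption of this sketch.
    Indices are floats because Spark's indexed column holds doubles.
    """
    counts = Counter(str(v) for v in values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(rows, input_col, output_col, mapping):
    """Add output_col to each row (a dict), leaving input_col untouched."""
    return [{**row, output_col: mapping[str(row[input_col])]} for row in rows]


rows = [{"category": c} for c in ["a", "b", "c", "a", "a", "c"]]
mapping = fit_string_indexer(row["category"] for row in rows)
indexed = transform(rows, "category", "categoryIndex", mapping)

# A downstream stage must be pointed at the *output* column name,
# not at the original string column.
label_col = "categoryIndex"
labels = [row[label_col] for row in indexed]

print(mapping)  # {'a': 0.0, 'c': 1.0, 'b': 2.0}
print(labels)   # [0.0, 2.0, 1.0, 0.0, 0.0, 1.0]
```

The most frequent label, `a`, receives index `0.0`, and the original `category` column is still present after the transform, which is why a downstream stage has to be configured explicitly to read `categoryIndex`. In Spark itself that wiring is done through the downstream stage's column setters; for a classifier consuming the indexed label this is typically `setLabelCol`.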

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-9977] [DOCS] Update documentation for StringIndexer

2015-08-19 srowen
Repository: spark
Updated Branches:
  refs/heads/master 865a3df3d -> ba2a07e2b


[SPARK-9977] [DOCS] Update documentation for StringIndexer

By using `StringIndexer`, we obtain the indexed label in a new column, so a
following estimator in the pipeline must read from this new column if it
wants to use the string-indexed label.
I think it is better to make this explicit in the documentation.

Author: lewuathe 

Closes #8205 from Lewuathe/SPARK-9977.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ba2a07e2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ba2a07e2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ba2a07e2

Branch: refs/heads/master
Commit: ba2a07e2b6c5a39597b64041cd5bf342ef9631f5
Parents: 865a3df
Author: lewuathe 
Authored: Wed Aug 19 09:54:03 2015 +0100
Committer: Sean Owen 
Committed: Wed Aug 19 09:54:03 2015 +0100

--
 docs/ml-features.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ba2a07e2/docs/ml-features.md
--
diff --git a/docs/ml-features.md b/docs/ml-features.md
index d82c85e..8d56dc3 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -725,7 +725,11 @@ dctDf.select("featuresDCT").show(3);
 `StringIndexer` encodes a string column of labels to a column of label indices.
 The indices are in `[0, numLabels)`, ordered by label frequencies.
 So the most frequent label gets index `0`.
-If the input column is numeric, we cast it to string and index the string values.
+If the input column is numeric, we cast it to string and index the string 
+values. When downstream pipeline components such as `Estimator` or 
+`Transformer` make use of this string-indexed label, you must set the input 
+column of the component to this string-indexed column name. In many cases, 
+you can set the input column with `setInputCol`.
 
 **Examples**
 

