spark git commit: [SPARK-9977] [DOCS] Update documentation for StringIndexer
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 e56bcc638 -> 5553f02be

[SPARK-9977] [DOCS] Update documentation for StringIndexer

By using `StringIndexer`, we can obtain indexed label on new column. So a following estimator should use this new column through pipeline if it wants to use string indexed label.
I think it is better to make it explicit on documentation.

Author: lewuathe

Closes #8205 from Lewuathe/SPARK-9977.

(cherry picked from commit ba2a07e2b6c5a39597b64041cd5bf342ef9631f5)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5553f02b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5553f02b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5553f02b

Branch: refs/heads/branch-1.5
Commit: 5553f02beb04c29d685049b460196b295ab4587b
Parents: e56bcc6
Author: lewuathe
Authored: Wed Aug 19 09:54:03 2015 +0100
Committer: Sean Owen
Committed: Wed Aug 19 09:54:11 2015 +0100

--
 docs/ml-features.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/5553f02b/docs/ml-features.md
--
diff --git a/docs/ml-features.md b/docs/ml-features.md
index d82c85e..8d56dc3 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -725,7 +725,11 @@ dctDf.select("featuresDCT").show(3);
 `StringIndexer` encodes a string column of labels to a column of label indices.
 The indices are in `[0, numLabels)`, ordered by label frequencies.
 So the most frequent label gets index `0`.
-If the input column is numeric, we cast it to string and index the string values.
+If the input column is numeric, we cast it to string and index the string
+values. When downstream pipeline components such as `Estimator` or
+`Transformer` make use of this string-indexed label, you must set the input
+column of the component to this string-indexed column name. In many cases,
+you can set the input column with `setInputCol`.
**Examples**

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-9977] [DOCS] Update documentation for StringIndexer
Repository: spark
Updated Branches:
  refs/heads/master 865a3df3d -> ba2a07e2b

[SPARK-9977] [DOCS] Update documentation for StringIndexer

By using `StringIndexer`, we can obtain indexed label on new column. So a following estimator should use this new column through pipeline if it wants to use string indexed label.
I think it is better to make it explicit on documentation.

Author: lewuathe

Closes #8205 from Lewuathe/SPARK-9977.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ba2a07e2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ba2a07e2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ba2a07e2

Branch: refs/heads/master
Commit: ba2a07e2b6c5a39597b64041cd5bf342ef9631f5
Parents: 865a3df
Author: lewuathe
Authored: Wed Aug 19 09:54:03 2015 +0100
Committer: Sean Owen
Committed: Wed Aug 19 09:54:03 2015 +0100

--
 docs/ml-features.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/ba2a07e2/docs/ml-features.md
--
diff --git a/docs/ml-features.md b/docs/ml-features.md
index d82c85e..8d56dc3 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -725,7 +725,11 @@ dctDf.select("featuresDCT").show(3);
 `StringIndexer` encodes a string column of labels to a column of label indices.
 The indices are in `[0, numLabels)`, ordered by label frequencies.
 So the most frequent label gets index `0`.
-If the input column is numeric, we cast it to string and index the string values.
+If the input column is numeric, we cast it to string and index the string
+values. When downstream pipeline components such as `Estimator` or
+`Transformer` make use of this string-indexed label, you must set the input
+column of the component to this string-indexed column name. In many cases,
+you can set the input column with `setInputCol`.
**Examples**
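The indexing rule described in the patched paragraph can be illustrated without Spark. What follows is a minimal pure-Python sketch, not the Spark implementation: the helper names `fit_string_indexer` and `transform`, the column names `category` and `categoryIndex`, and the alphabetical tie-break are all assumptions made for illustration. It shows labels being indexed by descending frequency, numeric inputs being cast to string first, and a downstream step reading the new indexed column rather than the original one.

```python
from collections import Counter


def fit_string_indexer(values):
    """Map each distinct label to an index in [0, numLabels), most frequent first."""
    # Numeric inputs are cast to string before indexing, as the text describes.
    counts = Counter(str(v) for v in values)
    # Assumption of this sketch: ties in frequency are broken alphabetically.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(rows, mapping, input_col, output_col):
    """Return copies of the rows with the indexed label stored in a new column."""
    return [dict(row, **{output_col: mapping[str(row[input_col])]}) for row in rows]


# Hypothetical input: an id column and a string label column.
rows = [{"id": i, "category": c} for i, c in enumerate(["a", "b", "c", "a", "a", "c"])]

mapping = fit_string_indexer(row["category"] for row in rows)
indexed = transform(rows, mapping, "category", "categoryIndex")

# A downstream step must read the indexed column, not the original one.
labels_for_estimator = [row["categoryIndex"] for row in indexed]
```

Here `a` occurs three times, `c` twice and `b` once, so `a` receives index `0`. The original `category` column is left in place, which is why a downstream component has to be pointed at `categoryIndex` explicitly.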