Repository: spark
Updated Branches:
  refs/heads/branch-2.3 54c1fae12 -> e58223171


[SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols parameter

Update user guide entry for `FeatureHasher` to match the Scala / Python doc, to describe the `categoricalCols` parameter.

## How was this patch tested?

Doc only

Author: Nick Pentreath <ni...@za.ibm.com>

Closes #20293 from MLnick/SPARK-23127-catCol-userguide.

(cherry picked from commit 60203fca6a605ad158184e1e0ce5187e144a3ea7)
Signed-off-by: Nick Pentreath <ni...@za.ibm.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e5822317
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e5822317
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e5822317

Branch: refs/heads/branch-2.3
Commit: e58223171ecae6450482aadf4e7994c3b8d8a58d
Parents: 54c1fae
Author: Nick Pentreath <ni...@za.ibm.com>
Authored: Fri Jan 19 12:43:23 2018 +0200
Committer: Nick Pentreath <ni...@za.ibm.com>
Committed: Fri Jan 19 12:43:35 2018 +0200

----------------------------------------------------------------------
 docs/ml-features.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/e5822317/docs/ml-features.md
----------------------------------------------------------------------
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 7264313..10183c3 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -222,9 +222,9 @@ The `FeatureHasher` transformer operates on multiple columns. Each column may co
 numeric or categorical features. Behavior and handling of column data types is as follows:
 
 - Numeric columns: For numeric features, the hash value of the column name is used to map the
-feature value to its index in the feature vector. Numeric features are never treated as
-categorical, even when they are integers. You must explicitly convert numeric columns containing
-categorical features to strings first.
+feature value to its index in the feature vector. By default, numeric features are not treated
+as categorical (even when they are integers). To treat them as categorical, specify the relevant
+columns using the `categoricalCols` parameter.
 - String columns: For categorical features, the hash value of the string "column_name=value"
 is used to map to the vector index, with an indicator value of `1.0`. Thus, categorical features
 are "one-hot" encoded (similarly to using [OneHotEncoder](ml-features.html#onehotencoder) with
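
The rule described in the hunk above can be illustrated with a minimal pure-Python sketch. It is not Spark's implementation: the helper names `hash_index` and `hash_features`, the use of CRC32 as the hash function, and the small default vector size are all assumptions made for illustration. Only the mapping rule (column name for numeric values, "column_name=value" with an indicator of `1.0` for categorical values) comes from the guide text.

```python
import zlib


def hash_index(key, num_features):
    """Map a string key to a vector index.

    CRC32 is a stand-in chosen for this sketch; it is an assumption,
    not the hash function Spark uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_features


def hash_features(row, num_features=16, categorical_cols=()):
    """Hash one row (a dict of column name -> value) into a dense vector.

    Hypothetical helper: string values, and any column listed in
    categorical_cols, are treated as categorical. Everything else is
    treated as numeric.
    """
    vec = [0.0] * num_features
    for name, value in row.items():
        if isinstance(value, str) or name in categorical_cols:
            # Categorical: hash "column_name=value", add an indicator of 1.0.
            idx = hash_index("{}={}".format(name, value), num_features)
            vec[idx] += 1.0
        else:
            # Numeric: hash the column name, add the feature value itself.
            idx = hash_index(name, num_features)
            vec[idx] += float(value)
    return vec


row = {"real": 2.2, "num": 5, "label": "foo"}

# Default behavior: the integer column "num" contributes its value.
default_vec = hash_features(row)

# Listing "num" as categorical: it contributes an indicator of 1.0
# at the index of "num=5" instead.
categorical_vec = hash_features(row, categorical_cols=["num"])
```

In this sketch, the same integer column lands at a different index and with a different magnitude depending on whether it is listed in `categorical_cols`, which is the distinction the updated guide text draws for the `categoricalCols` parameter.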

