[ https://issues.apache.org/jira/browse/SPARK-22801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Pentreath resolved SPARK-22801.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0

Issue resolved by pull request 19991
[https://github.com/apache/spark/pull/19991]

> Allow FeatureHasher to specify numeric columns to treat as categorical
> ----------------------------------------------------------------------
>
>                 Key: SPARK-22801
>                 URL: https://issues.apache.org/jira/browse/SPARK-22801
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Nick Pentreath
>            Assignee: Nick Pentreath
>             Fix For: 2.3.0
>
>
> {{FeatureHasher}} added in SPARK-13964 always treats numeric type columns as
> numbers and never as categorical features. It is quite common to have
> categorical features represented as numbers or codes (often say {{Int}}) in
> data sources.
> In order to hash these features as categorical, users must first explicitly
> convert them to strings which is cumbersome.
> Add a new param {{categoricalCols}} which specifies the numeric columns that
> should be treated as categorical features.
> *Note* while the reverse case is certainly possible (i.e. numeric features
> that are encoded as strings and a user would like to treat them as numeric),
> this is probably less likely and this case won't be supported at this time.
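The difference between the two treatments can be illustrated outside Spark. What follows is a minimal pure-Python sketch of the hashing trick, assuming the behaviour described in Spark's FeatureHasher documentation: a numeric column contributes its value at the index hashed from the column name, while a categorical column contributes 1.0 at the index hashed from "column=value". The names hash_row, bucket and NUM_FEATURES, the sample row, and the use of CRC32 are all illustrative assumptions (Spark uses MurmurHash3), so the indices will not match Spark's output. Only the categorical_cols argument is meant to mirror the new categoricalCols param.

```python
import zlib
from collections import defaultdict

# Assumed vector size for this sketch.
NUM_FEATURES = 1 << 18


def bucket(term: str, num_features: int = NUM_FEATURES) -> int:
    """Map a string to a vector index. CRC32 is a stand-in hash, not Spark's."""
    return zlib.crc32(term.encode("utf-8")) % num_features


def hash_row(row, categorical_cols=(), num_features=NUM_FEATURES):
    """Hash one row (a dict of column -> value) into a sparse {index: value} map."""
    vec = defaultdict(float)
    for col, value in row.items():
        if value is None:
            continue  # this sketch simply skips missing values
        is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_numeric and col not in categorical_cols:
            # Numeric treatment: index from the column name, value is the number.
            vec[bucket(col, num_features)] += float(value)
        else:
            # Categorical treatment: index from "column=value", indicator 1.0.
            vec[bucket(f"{col}={value}", num_features)] += 1.0
    return dict(vec)


# Hypothetical row where zip_code is an integer code, not a quantity.
row = {"zip_code": 94105, "price": 12.5, "city": "SF"}

as_numeric = hash_row(row)
as_categorical = hash_row(row, categorical_cols={"zip_code"})

print(sum(as_numeric.values()))      # zip_code contributes its magnitude
print(sum(as_categorical.values()))  # zip_code contributes a 1.0 indicator
```

Without categorical_cols, the only way to get the second behaviour in this sketch would be to convert zip_code to a string before hashing, which is the workaround the issue calls cumbersome.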