Wayne Zhang created SPARK-20619:
-----------------------------------

Summary: StringIndexer supports multiple ways of label ordering
Key: SPARK-20619
URL: https://issues.apache.org/jira/browse/SPARK-20619
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.1.0
Reporter: Wayne Zhang
StringIndexer maps labels to numbers according to the descending order of label frequency. Other orderings (e.g., alphabetical) may be needed in feature ETL, for example in one-hot encoding, where the index order determines which category is treated as the reference (dropped) category. I propose supporting alphabetical order and ascending order of label frequency as well. For example, add a parameter stringOrderType that controls how labels are ordered, with four options:

- 'freq_desc': descending order by label frequency (the most frequent label is assigned 0)
- 'freq_asc': ascending order by label frequency (the least frequent label is assigned 0)
- 'alphabet_desc': descending alphabetical order
- 'alphabet_asc': ascending alphabetical order
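The four proposed orderings can be illustrated without Spark. Below is a minimal pure-Python sketch, assuming the option names from this proposal; `fit_label_index` and `transform` are hypothetical helpers, not part of Spark's API. Breaking frequency ties alphabetically is an assumption, since the proposal does not say how ties are handled.

```python
from collections import Counter

# Option names follow this proposal. Standalone illustration only,
# not Spark's StringIndexer implementation.
ORDER_TYPES = ("freq_desc", "freq_asc", "alphabet_desc", "alphabet_asc")


def fit_label_index(labels, string_order_type="freq_desc"):
    """Hypothetical helper: map each distinct label to an index 0..n-1."""
    if string_order_type not in ORDER_TYPES:
        raise ValueError(f"unknown string_order_type: {string_order_type!r}")
    counts = Counter(labels)
    if string_order_type == "freq_desc":
        # Most frequent label gets 0; ties broken alphabetically (assumption).
        ordered = sorted(counts, key=lambda s: (-counts[s], s))
    elif string_order_type == "freq_asc":
        # Least frequent label gets 0; ties broken alphabetically (assumption).
        ordered = sorted(counts, key=lambda s: (counts[s], s))
    elif string_order_type == "alphabet_desc":
        ordered = sorted(counts, reverse=True)
    else:  # "alphabet_asc"
        ordered = sorted(counts)
    return {label: i for i, label in enumerate(ordered)}


def transform(labels, index):
    """Hypothetical helper: replace each label by its index as a float,
    mirroring StringIndexer's double-typed output. Unseen labels raise KeyError."""
    return [float(index[s]) for s in labels]


if __name__ == "__main__":
    # Counts: c=3, a=2, b=1, so all four orderings differ.
    labels = ["c", "a", "c", "b", "c", "a"]
    for order in ORDER_TYPES:
        print(order, fit_label_index(labels, order))
    # freq_desc {'c': 0, 'a': 1, 'b': 2}
    # freq_asc {'b': 0, 'a': 1, 'c': 2}
    # alphabet_desc {'c': 0, 'b': 1, 'a': 2}
    # alphabet_asc {'a': 0, 'b': 1, 'c': 2}
```

With 'freq_desc' as the default, the sketch keeps the current behavior, and the other three options only change the sort key used before indices are assigned.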