[ https://issues.apache.org/jira/browse/SPARK-20619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wayne Zhang updated SPARK-20619:
--------------------------------
    Description:
StringIndexer maps labels to numbers according to the descending order of label frequency. Other types of ordering (e.g., alphabetical) may be needed in feature ETL. For example, the ordering will affect the result in one-hot encoding and RFormula.

We propose supporting other ordering methods by adding a parameter stringOrderType that supports the following four options:
- 'freq_desc': descending order by label frequency (most frequent label assigned 0)
- 'freq_asc': ascending order by label frequency (least frequent label assigned 0)
- 'alphabet_desc': descending alphabetical order
- 'alphabet_asc': ascending alphabetical order

  was:
StringIndexer maps labels to numbers according to the descending order of label frequency. Other types of ordering (e.g., alphabetical) may be needed in feature ETL, for example, in one-hot encoding.

Propose to support alphabetic order, and ascending order of label frequency. For example, add a parameter stringOrderType to control how string is ordered which supports four options:
- 'freq_desc': descending order by label frequency (most frequent label assigned 0)
- 'freq_asc': ascending order by label frequency (least frequent label assigned 0)
- 'alphabet_desc': descending alphabetical order
- 'alphabet_asc': ascending alphabetical order

> StringIndexer supports multiple ways of label ordering
> ------------------------------------------------------
>
>                 Key: SPARK-20619
>                 URL: https://issues.apache.org/jira/browse/SPARK-20619
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Wayne Zhang
>
> StringIndexer maps labels to numbers according to the descending order of
> label frequency. Other types of ordering (e.g., alphabetical) may be needed
> in feature ETL. For example, the ordering will affect the result in one-hot
> encoding and RFormula.
> We propose supporting other ordering methods by adding a parameter
> stringOrderType that supports the following four options:
> - 'freq_desc': descending order by label frequency (most frequent label
> assigned 0)
> - 'freq_asc': ascending order by label frequency (least frequent label
> assigned 0)
> - 'alphabet_desc': descending alphabetical order
> - 'alphabet_asc': ascending alphabetical order
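
For illustration only, the four proposed orderings could be sketched in plain Python as follows. This is not Spark code and not the StringIndexer implementation: the function name index_labels and its string_order_type argument are hypothetical, and only the four option strings are taken from the proposal. The proposal does not say how ties in frequency are resolved, so the sketch assumes they are broken alphabetically to keep the output deterministic.

```python
from collections import Counter


def index_labels(labels, string_order_type="freq_desc"):
    """Map each distinct label to an index under one of the four proposed orderings.

    Hypothetical helper for illustration; not part of Spark.
    Assumption: frequency ties are broken by ascending alphabetical order.
    """
    counts = Counter(labels)
    if string_order_type == "freq_desc":
        # Most frequent label gets index 0.
        ordered = sorted(counts, key=lambda s: (-counts[s], s))
    elif string_order_type == "freq_asc":
        # Least frequent label gets index 0.
        ordered = sorted(counts, key=lambda s: (counts[s], s))
    elif string_order_type == "alphabet_desc":
        ordered = sorted(counts, reverse=True)
    elif string_order_type == "alphabet_asc":
        ordered = sorted(counts)
    else:
        raise ValueError(f"unsupported string_order_type: {string_order_type!r}")
    return {label: index for index, label in enumerate(ordered)}


# "b" occurs 3 times, "a" twice, "c" once.
data = ["b", "a", "b", "c", "b", "a"]
for order in ("freq_desc", "freq_asc", "alphabet_desc", "alphabet_asc"):
    print(order, index_labels(data, order))
```

With this sample data, 'freq_desc' assigns 0 to "b" while 'alphabet_asc' assigns 0 to "a", which is why the choice of ordering changes which category a downstream one-hot encoding or RFormula would treat as the first level.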