Wayne Zhang created SPARK-20619:
-----------------------------------

             Summary: StringIndexer supports multiple ways of label ordering
                 Key: SPARK-20619
                 URL: https://issues.apache.org/jira/browse/SPARK-20619
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.1.0
            Reporter: Wayne Zhang


StringIndexer maps labels to numbers according to the descending order of label 
frequency. Other orderings (e.g., alphabetical) may be needed in feature ETL, 
for example in one-hot encoding, where the index order determines which 
category becomes the reference level. We propose supporting alphabetical order 
and ascending order of label frequency, for example by adding a parameter 
stringOrderType that controls how strings are ordered and supports four options:

   - 'freq_desc': descending order by label frequency (most frequent label 
     assigned 0)
   - 'freq_asc': ascending order by label frequency (least frequent label 
     assigned 0)
   - 'alphabet_desc': descending alphabetical order (alphabetically last label 
     assigned 0)
   - 'alphabet_asc': ascending alphabetical order (alphabetically first label 
     assigned 0)
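To make the four proposed options concrete, here is a minimal plain-Python sketch of how each one would assign indices. The function name fit_string_indexer is hypothetical; this is not Spark's StringIndexer API or implementation. Breaking frequency ties alphabetically is also an assumption, because the proposal does not specify tie handling.

```python
from collections import Counter


def fit_string_indexer(labels, string_order_type="freq_desc"):
    """Hypothetical sketch: map each distinct label to an index 0, 1, ...

    string_order_type uses the option names from the proposal; this is
    not Spark's StringIndexer API.
    """
    counts = Counter(labels)
    if string_order_type == "freq_desc":
        # Most frequent label gets 0; ties broken alphabetically (assumption).
        ordered = sorted(counts, key=lambda s: (-counts[s], s))
    elif string_order_type == "freq_asc":
        # Least frequent label gets 0; ties broken alphabetically (assumption).
        ordered = sorted(counts, key=lambda s: (counts[s], s))
    elif string_order_type == "alphabet_desc":
        # Alphabetically last label gets 0.
        ordered = sorted(counts, reverse=True)
    elif string_order_type == "alphabet_asc":
        # Alphabetically first label gets 0.
        ordered = sorted(counts)
    else:
        raise ValueError(f"unsupported stringOrderType: {string_order_type}")
    return {label: index for index, label in enumerate(ordered)}


# Frequencies: 'c' x3, 'a' x2, 'b' x1, so all four orderings differ.
labels = ["c", "a", "b", "c", "c", "a"]
for order in ("freq_desc", "freq_asc", "alphabet_desc", "alphabet_asc"):
    print(order, fit_string_indexer(labels, order))
# freq_desc {'c': 0, 'a': 1, 'b': 2}
# freq_asc {'b': 0, 'a': 1, 'c': 2}
# alphabet_desc {'c': 0, 'b': 1, 'a': 2}
# alphabet_asc {'a': 0, 'b': 1, 'c': 2}
```

With this sample, 'freq_desc' and 'alphabet_desc' both put 'c' first but disagree on 'a' and 'b', which shows why the choice of ordering matters to any downstream step, such as one-hot encoding, that depends on the index order.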



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)