[jira] [Commented] (SPARK-14659) OneHotEncoder support drop first category alphabetically in the encoded vector

Wayne Zhang (JIRA) Wed, 18 Jan 2017 11:25:56 -0800

    [ https://issues.apache.org/jira/browse/SPARK-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828618#comment-15828618 ]


Wayne Zhang commented on SPARK-14659:
-------------------------------------

[~yanboliang] [~josephkb]
Has anyone been working on this ticket? It would also be helpful to support 
'dropFirst', since in practice there is often a need to set the most frequent 
category as the baseline for interpretability. I'd be happy to work on this 
(and already have a preliminary fix).


> OneHotEncoder support drop first category alphabetically in the encoded 
> vector 
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-14659
>                 URL: https://issues.apache.org/jira/browse/SPARK-14659
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yanbo Liang
>
> R formula drops the first category alphabetically when encoding a 
> string/category feature. Spark RFormula uses OneHotEncoder to encode 
> string/category features into vectors, but it only supports "dropLast", with 
> categories ordered by string/category frequency. This causes SparkR to 
> produce different models from native R.
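
To make the mismatch described in the issue concrete, here is a minimal pure-Python sketch. It does not use Spark or R; the helper names (index_by_frequency, index_alphabetically, one_hot) and the alphabetical tie-breaking rule are assumptions made only for illustration. It shows how ordering by frequency and dropping the last category picks a different reference level than ordering alphabetically and dropping the first.

```python
from collections import Counter


def index_by_frequency(values):
    """Order categories by descending frequency.

    Ties are broken alphabetically here; that rule is an assumption
    of this sketch, not a statement about Spark's behaviour.
    """
    counts = Counter(values)
    return sorted(counts, key=lambda c: (-counts[c], c))


def index_alphabetically(values):
    """Order categories alphabetically, as the issue says R formula does."""
    return sorted(set(values))


def one_hot(value, categories, drop):
    """Encode `value` against `categories`, omitting one reference level.

    drop="last" omits the final category, drop="first" omits the first.
    The omitted category is encoded as an all-zero vector.
    """
    kept = categories[1:] if drop == "first" else categories[:-1]
    return [1.0 if value == c else 0.0 for c in kept]


values = ["b", "a", "b", "c", "b", "a"]

freq_order = index_by_frequency(values)     # 'b' is most frequent, 'c' least
alpha_order = index_alphabetically(values)  # 'a' sorts first

# Frequency order + drop last: the least frequent category is the reference.
drop_last = {v: one_hot(v, freq_order, "last") for v in alpha_order}

# Alphabetical order + drop first: the alphabetically first category is the reference.
drop_first = {v: one_hot(v, alpha_order, "first") for v in alpha_order}

print("drop last (by frequency):   ", drop_last)
print("drop first (alphabetically):", drop_first)
```

In this toy data the reference level is "c" under the first scheme and "a" under the second, so a linear model fitted on the two encodings would report different coefficients even though the fitted values could agree.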



