[ https://issues.apache.org/jira/browse/SPARK-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264814#comment-15264814 ]
Joseph K. Bradley commented on SPARK-14659:
-------------------------------------------

Changing target to 2.1 since the code freeze is upon us.

> OneHotEncoder support drop first category alphabetically in the encoded vector
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-14659
>                 URL: https://issues.apache.org/jira/browse/SPARK-14659
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yanbo Liang
>
> R formula drops the first category alphabetically when encoding a
> string/category feature. Spark RFormula uses OneHotEncoder to encode
> string/category features into vectors, but it only supports "dropLast",
> which drops the last category in string/category frequency order.
> This causes SparkR to produce different models than native R.
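
To illustrate the mismatch described in the issue, here is a minimal sketch in plain Python. It is not Spark's or R's implementation. The helper names (`levels_by_frequency`, `levels_alphabetical`, `one_hot`) and the sample data are hypothetical, and breaking frequency ties alphabetically is an assumption made only to keep the example deterministic.

```python
from collections import Counter


def levels_by_frequency(values):
    """Order categories by descending frequency.

    Ties are broken alphabetically; that tie-break is an assumption
    made for this sketch only.
    """
    counts = Counter(values)
    return sorted(counts, key=lambda c: (-counts[c], c))


def levels_alphabetical(values):
    """Order categories alphabetically, as R orders factor levels by default."""
    return sorted(set(values))


def one_hot(values, levels, drop):
    """One-hot encode `values` against `levels`, omitting one reference level.

    drop="last" omits the final level, drop="first" omits the initial one.
    The omitted level is encoded as an all-zero row.
    """
    if drop == "first":
        kept = levels[1:]
    elif drop == "last":
        kept = levels[:-1]
    else:
        raise ValueError("drop must be 'first' or 'last'")
    return [[1.0 if v == level else 0.0 for level in kept] for v in values]


# Hypothetical sample column: "b" appears 3 times, "a" twice, "c" once.
values = ["b", "a", "b", "c", "b", "a"]

# Frequency order is [b, a, c]; dropping the last level makes "c" the reference.
spark_like = one_hot(values, levels_by_frequency(values), drop="last")

# Alphabetical order is [a, b, c]; dropping the first level makes "a" the reference.
r_like = one_hot(values, levels_alphabetical(values), drop="first")
```

With this sample, the frequency-ordered encoding treats "c" as the all-zero reference category while the alphabetical encoding treats "a" as the reference, so a model fitted on each encoding would report coefficients relative to a different baseline.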