[ https://issues.apache.org/jira/browse/SPARK-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yanbo Liang resolved SPARK-14659. --------------------------------- Resolution: Fixed Target Version/s: 2.3.0 > OneHotEncoder support drop first category alphabetically in the encoded > vector > ------------------------------------------------------------------------------- > > Key: SPARK-14659 > URL: https://issues.apache.org/jira/browse/SPARK-14659 > Project: Spark > Issue Type: Improvement > Components: ML > Affects Versions: 2.2.0 > Reporter: Yanbo Liang > Assignee: Wayne Zhang > Fix For: 2.3.0 > > > R formula drop the first category alphabetically when encode string/category > feature. Spark RFormula use OneHotEncoder to encode string/category feature > into vector, but only supporting "dropLast" by string/category frequencies. > This will cause SparkR produce different models compared with native R. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org