[ https://issues.apache.org/jira/browse/SPARK-23469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen reassigned SPARK-23469:
---------------------------------

    Assignee: Huaxin Gao

> HashingTF should use corrected MurmurHash3 implementation
> ---------------------------------------------------------
>
>                 Key: SPARK-23469
>                 URL: https://issues.apache.org/jira/browse/SPARK-23469
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Huaxin Gao
>            Priority: Major
>              Labels: release-notes
>
> [SPARK-23381] added a corrected MurmurHash3 implementation but left the old
> implementation alone. In Spark 2.3 and earlier, HashingTF will use the old
> implementation. (We should not backport a fix for HashingTF since it would
> be a major change of behavior.) But we should correct HashingTF in Spark
> 2.4; this JIRA is for tracking this fix.
>
> * Update HashingTF to use the new implementation of MurmurHash3.
> * Ensure backwards compatibility for ML persistence by having HashingTF use
>   the old MurmurHash3 when a model from Spark 2.3 or earlier is loaded. We can
>   add a Param to allow this.
>
> Also, HashingTF still calls into the old spark.mllib.feature.HashingTF, so I
> recommend we first migrate the code to spark.ml: [SPARK-21748]. We can leave
> spark.mllib alone and just fix MurmurHash3 in spark.ml.
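For illustration only, here is a minimal Python sketch of why the hash change matters for HashingTF; it is not Spark's code. `murmur3_x86_32` follows the reference MurmurHash3_x86_32 tail handling, where the 1 to 3 trailing bytes are packed into one word. `murmur3_per_byte_tail` is an assumption about the old behavior, based on our reading of Spark's `Murmur3_x86_32.hashUnsafeBytes`: each trailing byte is sign-extended and mixed as if it were a full 4-byte block. The `hashing_tf` helper and its `legacy_hash` flag are hypothetical stand-ins for HashingTF and the backward-compatibility Param proposed in the issue. The seed of 42, the UTF-8 encoding of terms, and the default of 2^18 features are assumed to match Spark's defaults.

```python
from collections import Counter
from typing import Dict, Iterable

MASK = 0xFFFFFFFF
C1, C2 = 0xCC9E2D51, 0x1B873593


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1: int) -> int:
    k1 = (k1 * C1) & MASK
    k1 = _rotl(k1, 15)
    return (k1 * C2) & MASK


def _mix_h1(h1: int, k1: int) -> int:
    h1 = _rotl(h1 ^ k1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1: int, length: int) -> int:
    # Final avalanche step; the total byte length is folded in first.
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def _hash_blocks(data: bytes, seed: int) -> int:
    # Both variants process whole little-endian 4-byte blocks identically.
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    for i in range(0, aligned, 4):
        h1 = _mix_h1(h1, _mix_k1(int.from_bytes(data[i:i + 4], "little")))
    return h1


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Reference tail handling: pack the 1-3 leftover bytes into one word."""
    h1 = _hash_blocks(data, seed)
    k1 = 0
    for shift, b in enumerate(data[len(data) - len(data) % 4:]):
        k1 |= b << (8 * shift)
    h1 ^= _mix_k1(k1)  # no-op when there is no tail, since _mix_k1(0) == 0
    return _fmix(h1, len(data))


def murmur3_per_byte_tail(data: bytes, seed: int = 0) -> int:
    """ASSUMED legacy behavior: each tail byte is sign-extended and mixed as a block."""
    h1 = _hash_blocks(data, seed)
    for b in data[len(data) - len(data) % 4:]:
        signed = b - 256 if b > 127 else b  # JVM bytes are signed
        h1 = _mix_h1(h1, _mix_k1(signed & MASK))
    return _fmix(h1, len(data))


def hashing_tf(terms: Iterable[str], num_features: int = 1 << 18,
               legacy_hash: bool = False, seed: int = 42) -> Dict[int, float]:
    """Hypothetical HashingTF: term -> index -> frequency (a sparse vector as a dict)."""
    hash_fn = murmur3_per_byte_tail if legacy_hash else murmur3_x86_32
    counts: Counter = Counter()
    for term in terms:
        h = hash_fn(term.encode("utf-8"), seed)
        signed = h - (1 << 32) if h >= (1 << 31) else h  # mimic a Java int
        counts[signed % num_features] += 1.0  # Python % is already non-negative
    return dict(counts)
```

In this sketch, terms whose UTF-8 length is a multiple of 4 hash identically under both variants, while terms with 1 to 3 trailing bytes generally land on a different index. That is why silently switching the hash would change the feature vectors produced by a model saved with Spark 2.3 or earlier, and why the issue asks for a Param that keeps the old hash for loaded models.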