[ 
https://issues.apache.org/jira/browse/SPARK-23469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-23469.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 25303
[https://github.com/apache/spark/pull/25303]

> HashingTF should use corrected MurmurHash3 implementation
> ---------------------------------------------------------
>
>                 Key: SPARK-23469
>                 URL: https://issues.apache.org/jira/browse/SPARK-23469
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Huaxin Gao
>            Priority: Major
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> [SPARK-23381] added a corrected MurmurHash3 implementation but left the old
> implementation in place. In Spark 2.3 and earlier, HashingTF uses the old
> implementation. (We should not backport a fix for HashingTF, since it would
> be a major change of behavior.) We should, however, correct HashingTF in
> Spark 2.4; this JIRA tracks that fix.
> * Update HashingTF to use the new implementation of MurmurHash3.
> * Ensure backwards compatibility for ML persistence by having HashingTF use
> the old MurmurHash3 when a model from Spark 2.3 or earlier is loaded. We can
> add a Param to select the implementation.
> Also, HashingTF still calls into the old spark.mllib.feature.HashingTF, so I
> recommend that we first migrate the code to spark.ml: [SPARK-21748]. We can
> then leave spark.mllib alone and fix MurmurHash3 only in spark.ml.
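
For readers who want to see the idea concretely, here is a minimal Python sketch of the two steps above: picking a hash implementation based on the Spark version that saved the model, then mapping a term to a feature index. It is not Spark's code. The names murmur3_standard, murmur3_legacy, select_hash and term_index are hypothetical. The legacy variant reflects my understanding of the SPARK-23381 discrepancy (each trailing byte is mixed as its own block), and the 3.0 cutoff and seed of 42 are assumptions taken from the Fix Version above and from Spark's HashingTF defaults as I recall them.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k1(k1):
    k1 = (k1 * 0xCC9E2D51) & MASK32
    k1 = _rotl32(k1, 15)
    return (k1 * 0x1B873593) & MASK32


def _mix_h1(h1, k1):
    h1 = _rotl32(h1 ^ k1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK32


def _fmix(h1, length):
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    return h1 ^ (h1 >> 16)


def _hash_aligned_blocks(data, aligned_len, seed):
    # Mix every complete 4-byte little-endian block into the running state.
    h1 = seed & MASK32
    for i in range(0, aligned_len, 4):
        h1 = _mix_h1(h1, _mix_k1(int.from_bytes(data[i:i + 4], "little")))
    return h1


def murmur3_standard(data, seed=0):
    """Reference MurmurHash3 x86_32: tail bytes are packed into one word."""
    aligned = len(data) - len(data) % 4
    h1 = _hash_aligned_blocks(data, aligned, seed)
    k1 = 0
    for shift, b in enumerate(data[aligned:]):
        k1 ^= b << (8 * shift)
    if len(data) % 4:
        h1 ^= _mix_k1(k1)  # tail is XORed in without the _mix_h1 step
    return _fmix(h1, len(data))


def murmur3_legacy(data, seed=0):
    """Assumed legacy behavior: each tail byte is mixed as a full block."""
    aligned = len(data) - len(data) % 4
    h1 = _hash_aligned_blocks(data, aligned, seed)
    for b in data[aligned:]:
        signed = b - 256 if b >= 128 else b  # JVM bytes are signed
        h1 = _mix_h1(h1, _mix_k1(signed & MASK32))
    return _fmix(h1, len(data))


def select_hash(saved_version):
    # Assumption: models saved before 3.0 keep the legacy hash on load.
    return murmur3_legacy if saved_version < (3, 0) else murmur3_standard


def term_index(term, num_features, saved_version, seed=42):
    h = select_hash(saved_version)(term.encode("utf-8"), seed)
    signed = h - (1 << 32) if h >= (1 << 31) else h  # view as a signed int
    return signed % num_features  # non-negative for positive num_features
```

The two variants agree whenever the input length is a multiple of 4 bytes and diverge otherwise, which is why switching implementations changes feature indices for most terms and why persisted models need to remember which one they were built with.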


