[ https://issues.apache.org/jira/browse/SPARK-23643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758770#comment-17758770 ]
Mitesh commented on SPARK-23643:
--------------------------------

+1, can we please document this in the ML migration guide? We use `Dataset.randomSplit()` to produce test/train sets, so after this hash change all of our deterministic ML tests fail: the same input data, the same seed and configs, and the same model now produce different results.

> XORShiftRandom.hashSeed allocates unnecessary memory
> ----------------------------------------------------
>
>                 Key: SPARK-23643
>                 URL: https://issues.apache.org/jira/browse/SPARK-23643
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The hashSeed method allocates a 64-byte buffer and puts only the 8 bytes of
> the seed parameter into it. The other bytes are always zero and could easily
> be excluded from the hash calculation.
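
For readers wondering why a memory fix would change seeded results, here is a minimal Python sketch of the effect described in the issue. It is an illustration only, not Spark's code: SHA-256 stands in for whatever hash `hashSeed` actually uses, the big-endian signed 64-bit seed layout is an assumption, and all function names are hypothetical.

```python
import hashlib
import struct


def seed_bytes_padded(seed: int) -> bytes:
    """Layout described in the issue: a 64-byte buffer whose first 8 bytes
    hold the seed and whose remaining 56 bytes are always zero."""
    # Assumption: the seed is a signed 64-bit integer written big-endian.
    return struct.pack(">q", seed) + bytes(56)


def seed_bytes_compact(seed: int) -> bytes:
    """Layout the issue proposes: only the 8 bytes of the seed."""
    return struct.pack(">q", seed)


def stand_in_hash(data: bytes) -> str:
    # Assumption: SHA-256 is only a placeholder for the real hash function.
    return hashlib.sha256(data).hexdigest()


seed = 42
padded = seed_bytes_padded(seed)
compact = seed_bytes_compact(seed)

# Same seed, different number of bytes fed to the hash.
print(len(padded), len(compact))

# The zero padding carries no information, yet it still changes the hash.
print(stand_in_hash(padded) != stand_in_hash(compact))
```

Dropping the zero bytes changes the input to the hash, so the hashed seed differs even though the user-supplied seed is the same. Anything derived from the hashed seed, such as the split reported in the comment above, can then differ between versions.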