[ https://issues.apache.org/jira/browse/SPARK-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332587#comment-15332587 ]

Brett Stime commented on SPARK-3847:
------------------------------------

Another option: make the default behavior 'safe' and not share hashCodes
between JVMs. If passing hashCodes really does improve performance
significantly (for key types other than arrays and enums), a special
configuration setting could enable inter-JVM hashCodes, e.g. something like
spark.shuffle.i_solemnly_swear_my_keys_have_consistent_hashes, which could be
set to true to enable the faster behavior. This would provide discoverable
documentation of the issue and make it relatively easy to compare and test
results between the two modes.
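A minimal sketch of the proposed switch, in plain Scala with no Spark dependency. Everything in it is hypothetical: the object SafeKeyHashing, the trustKeyHashCodes parameter (standing in for the proposed configuration flag, which does not exist in Spark), and the choice of stable hashes (class and constant name for enums, content hash via java.util.Arrays for arrays) are assumptions for illustration, not Spark's partitioner.

```scala
// Hypothetical sketch of the proposal above; none of these names exist in Spark.
object SafeKeyHashing {

  // Map a possibly negative hash into [0, numPartitions).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // A hash that does not depend on object identity for the key types known to be
  // unsafe: Java enums (hash of class name and constant name) and arrays (content hash).
  def stableHash(key: Any): Int = key match {
    case null                 => 0
    case e: java.lang.Enum[_] => e.getDeclaringClass.getName.hashCode * 31 + e.name().hashCode
    case a: Array[Byte]       => java.util.Arrays.hashCode(a)
    case a: Array[Int]        => java.util.Arrays.hashCode(a)
    case a: Array[Long]       => java.util.Arrays.hashCode(a)
    case a: Array[AnyRef]     => java.util.Arrays.deepHashCode(a)
    case other                => other.hashCode
  }

  // trustKeyHashCodes plays the role of the proposed opt-in flag: false is the "safe"
  // default; true uses key.hashCode directly (the "performant" mode of the proposal),
  // which is only correct if the key's hashCode is the same on every JVM.
  def partitionFor(key: Any, numPartitions: Int, trustKeyHashCodes: Boolean): Int = {
    val h =
      if (key == null) 0
      else if (trustKeyHashCodes) key.hashCode
      else stableHash(key)
    nonNegativeMod(h, numPartitions)
  }
}
```

In the safe mode, two separately allocated Array[Byte](1, 2, 3) keys land in the same partition; with trustKeyHashCodes = true they are hashed by identity and need not.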

> Enum.hashCode is only consistent within the same JVM
> ----------------------------------------------------
>
>                 Key: SPARK-3847
>                 URL: https://issues.apache.org/jira/browse/SPARK-3847
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>         Environment: Oracle JDK 7u51 64bit on Ubuntu 12.04
>            Reporter: Nathan Bijnens
>              Labels: enum
>
> When Java enums are used as keys in some operations, the results can be very
> unexpected. The issue is that Java's Enum.hashCode returns the identity hash
> code inherited from Object (often described as the memory position), which
> differs from one JVM to the next.
> {code}
> messages.filter(_.getHeader.getKind == Kind.EVENT).count
> >> 503650
> val tmp = messages.filter(_.getHeader.getKind == Kind.EVENT)
> tmp.map(_.getHeader.getKind).countByValue
> >> Map(EVENT -> 1389)
> {code}
> Because this is really a JVM issue, we should either reject enums as keys
> with an error or implement a workaround.
> A good write-up of the issue (including a workaround) can be found here:
> http://dev.bizo.com/2014/02/beware-enums-in-spark.html
> More on enum hash codes:
> https://stackoverflow.com/questions/4885095/what-is-the-reason-behind-enum-hashcode
> And some related reports (most of them rejected) in the Oracle Java bug database:
> - http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8050217
> - http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7190798
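One way to see how the two counts in the report can disagree: the filter-and-count path never hashes the key, while counting by value has to combine partial counts from several JVMs by key. The plain-Scala simulation below is a sketch under a simplified, assumed shuffle model, not Spark's actual implementation: each JVM pre-counts its records and routes the partial count to a reduce partition chosen from its own hashCode for the key, and the driver collects the reduced pairs into a Map. The object name, per-JVM hash values and record counts are made up. It also shows the workaround of keying by the enum's name(): String.hashCode is specified in the String Javadoc, so every JVM computes the same value.

```scala
// Simulation only: no Spark involved. Three "JVMs" hold records for the same enum
// constant (called EVENT here) but, as with identity-based hash codes, each reports
// a different hashCode for it. The shuffle model is deliberately simplified.
object CountByValueSimulation {

  // Map a possibly negative hash into [0, numPartitions).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Made-up values: the hashCode each JVM computes for EVENT, and how many
  // EVENT records each JVM holds.
  val hashOnJvm: Seq[Int]   = Seq(17, 42, 7)
  val countOnJvm: Seq[Long] = Seq(100L, 200L, 300L)

  // Workaround: key by name(). String.hashCode is specified, so all JVMs agree.
  val hashByName: Seq[Int] = Seq.fill(3)("EVENT".hashCode)

  def countByValue(hashes: Seq[Int], counts: Seq[Long], numPartitions: Int): Map[String, Long] = {
    // Map side: each JVM sends its partial count to the reduce partition picked
    // from the hashCode that JVM computed for the key.
    val shuffled: Seq[(Int, (String, Long))] =
      hashes.zip(counts).map { case (h, c) => (nonNegativeMod(h, numPartitions), ("EVENT", c)) }
    // Reduce side: within each partition (in partition order), sum counts per key.
    val reduced: Seq[(String, Long)] =
      shuffled.groupBy(_._1).toSeq.sortBy(_._1).flatMap { case (_, records) =>
        records.map(_._2).groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).sum) }
      }
    // Driver: collect the (key, count) pairs into a Map; for duplicate keys
    // only the last pair survives, silently dropping the other partial counts.
    reduced.toMap
  }
}
```

With the made-up hash codes the simulated result is Map(EVENT -> 300) although 600 records were counted; with name-based hashes all partial counts meet in one partition and the result is Map(EVENT -> 600).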



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
