Nathan Bijnens created SPARK-3847:
-------------------------------------

             Summary: Enum.hashCode is only consistent within the same JVM
                 Key: SPARK-3847
                 URL: https://issues.apache.org/jira/browse/SPARK-3847
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.1.0
         Environment: Oracle JDK 7u51 64bit on Ubuntu 12.04
            Reporter: Nathan Bijnens


When Java enums are used as keys in some operations (for example
countByValue), the results are very unexpected. The cause is that the Java
Enum.hashCode returns the identity hash code (typically derived from the
object's memory position), which is different on each JVM.
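
To make the mechanism concrete, here is a minimal sketch in plain Java (no
Spark needed). The Kind enum and the EnumHashDemo class are hypothetical
stand-ins, not the classes from the job above. It only illustrates that
Enum.hashCode is the identity hash, while a hash taken from the constant's
name depends only on the string contents:

```java
class EnumHashDemo {
    // Hypothetical stand-in for the Kind enum used in the report.
    enum Kind { EVENT, COMMAND }

    // Enum.hashCode is final and delegates to Object.hashCode, so it should
    // match the identity hash of the constant inside this JVM.
    static boolean usesIdentityHash(Enum<?> e) {
        return e.hashCode() == System.identityHashCode(e);
    }

    // String.hashCode is specified in terms of the characters only, so this
    // value is the same on every JVM.
    static int stableHash(Enum<?> e) {
        return e.name().hashCode();
    }

    public static void main(String[] args) {
        // The first two lines can change from run to run; the last should not.
        System.out.println("identity based: " + usesIdentityHash(Kind.EVENT));
        System.out.println("Enum.hashCode:  " + Kind.EVENT.hashCode());
        System.out.println("name hash:      " + stableHash(Kind.EVENT));
    }
}
```

Running this in two separate JVMs will usually print different Enum.hashCode
values but the same name hash, which is why records with the same enum key can
end up in different partitions depending on which executor hashed them.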

{code}
messages.filter(_.getHeader.getKind == Kind.EVENT).count
>> 503650

val tmp = messages.filter(_.getHeader.getKind == Kind.EVENT)
tmp.map(_.getHeader.getKind).countByValue
>> Map(EVENT -> 1389)
{code}

Because this is actually JVM behavior, we should either reject enums as keys
with an error or implement a workaround.
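
One possible shape for a workaround, sketched under assumptions: derive the
partition index from the enum constant's name instead of from Enum.hashCode.
StableEnumKey, its Kind enum, and partitionFor are hypothetical names for
illustration and are not part of Spark. The non-negative modulo is only meant
to resemble how hash-based partitioning picks a bucket:

```java
class StableEnumKey {
    // Hypothetical stand-in for the Kind enum used in the report.
    enum Kind { EVENT, COMMAND }

    // Picks a partition from the constant's name, so every JVM agrees on
    // where a given enum key belongs.
    static int partitionFor(Enum<?> key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int h = key.name().hashCode();
        // Non-negative modulo, because String.hashCode may be negative.
        return ((h % numPartitions) + numPartitions) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println("EVENT   -> partition " + partitionFor(Kind.EVENT, 8));
        System.out.println("COMMAND -> partition " + partitionFor(Kind.COMMAND, 8));
    }
}
```

On the user side the same idea amounts to mapping the enum to its name (or
ordinal) before using it as a key, at the cost of converting back afterwards.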

A good writeup of the issue, including a workaround, can be found here:
http://dev.bizo.com/2014/02/beware-enums-in-spark.html

Some more background on hash codes and enums:
https://stackoverflow.com/questions/4885095/what-is-the-reason-behind-enum-hashcode

Related issues (most of them rejected) in the Oracle Java Bug Database:
- http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8050217
- http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7190798



