Alexander created SPARK-22330:
---------------------------------

             Summary: Linear containsKey operation for serialized maps.
                 Key: SPARK-22330
                 URL: https://issues.apache.org/jira/browse/SPARK-22330
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0, 1.2.1
            Reporter: Alexander


One of our production applications, which aggressively uses cached Spark RDDs, 
degraded as data volumes increased, though it shouldn't have. A quick profiling 
session showed that the slowest part was SerializableMapWrapper#containsKey: the 
wrapper delegates get and remove to the actual implementation, but containsKey 
is inherited from AbstractMap, which implements it in linear time by iterating 
over the whole entry set. The workaround was simple: replacing all containsKey 
calls with get(key) != null solved the issue.
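
To illustrate the behaviour described above, here is a minimal Java sketch. It 
is not Spark's code: CountingWrapper and its entriesVisited counter are 
hypothetical names introduced only for this demonstration. The wrapper 
delegates get but leaves containsKey to AbstractMap, and counts how many 
entries get walked during a lookup.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

class LinearContainsKeyDemo {
    // Hypothetical stand-in for a wrapper that delegates get but not containsKey.
    static class CountingWrapper<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;
        long entriesVisited = 0; // incremented on every step of entry-set iteration

        CountingWrapper(Map<K, V> underlying) {
            this.underlying = underlying;
        }

        // Delegated: costs whatever the underlying map's get costs.
        @Override
        public V get(Object key) {
            return underlying.get(key);
        }

        // containsKey is deliberately NOT overridden, so AbstractMap's
        // implementation is used, and that one iterates over entrySet().
        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            return new AbstractSet<Map.Entry<K, V>>() {
                @Override
                public int size() {
                    return underlying.size();
                }

                @Override
                public Iterator<Map.Entry<K, V>> iterator() {
                    Iterator<Map.Entry<K, V>> it = underlying.entrySet().iterator();
                    return new Iterator<Map.Entry<K, V>>() {
                        @Override
                        public boolean hasNext() {
                            return it.hasNext();
                        }

                        @Override
                        public Map.Entry<K, V> next() {
                            entriesVisited++;
                            return it.next();
                        }
                    };
                }
            };
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> backing = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            backing.put(i, "v" + i);
        }
        CountingWrapper<Integer, String> wrapped = new CountingWrapper<>(backing);

        wrapped.containsKey(-1); // absent key: the inherited method scans every entry
        System.out.println("entries visited by containsKey: " + wrapped.entriesVisited);

        wrapped.entriesVisited = 0;
        wrapped.get(-1); // delegated lookup: no iteration at all
        System.out.println("entries visited by get: " + wrapped.entriesVisited);
    }
}
```

For a key that is absent, the inherited containsKey visits every entry, while 
the delegated get visits none.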

Nevertheless, it would be much simpler for everyone if the issue were fixed 
once and for all.
The fix is straightforward: delegate containsKey to the actual implementation.
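
A rough Java sketch of what such a delegation could look like follows. 
Spark's SerializableMapWrapper is written in Scala, so this is only an 
illustration of the idea under that assumption; DelegatingWrapper is a 
hypothetical name, not the actual patch.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class DelegatingWrapperDemo {
    // Hypothetical wrapper that forwards containsKey to the wrapped map.
    static class DelegatingWrapper<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;

        DelegatingWrapper(Map<K, V> underlying) {
            this.underlying = underlying;
        }

        @Override
        public V get(Object key) {
            return underlying.get(key);
        }

        @Override
        public V remove(Object key) {
            return underlying.remove(key);
        }

        // The fix: use the wrapped map's own lookup instead of
        // AbstractMap's scan over the entry set.
        @Override
        public boolean containsKey(Object key) {
            return underlying.containsKey(key);
        }

        @Override
        public int size() {
            return underlying.size();
        }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            return underlying.entrySet();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> backing = new HashMap<>();
        backing.put("present", 1);
        backing.put("nullValued", null);
        Map<String, Integer> wrapped = new DelegatingWrapper<>(backing);

        System.out.println(wrapped.containsKey("present"));    // true
        System.out.println(wrapped.containsKey("missing"));    // false
        // A key mapped to null is still reported as present ...
        System.out.println(wrapped.containsKey("nullValued")); // true
        // ... whereas the get(key) != null workaround would miss it.
        System.out.println(wrapped.get("nullValued") != null); // false
    }
}
```

Delegation also keeps containsKey correct for maps that hold null values, 
which the get(key) != null workaround cannot distinguish from a missing key.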



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

