Alexander created SPARK-22330:
---------------------------------

Summary: Linear containsKey operation for serialized maps.
Key: SPARK-22330
URL: https://issues.apache.org/jira/browse/SPARK-22330
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.2.0, 1.2.1
Reporter: Alexander
One of our production applications, which makes heavy use of cached Spark RDDs, slowed down as data volumes grew, even though it shouldn't have. A quick profiling session showed that the slowest part was SerializableMapWrapper#containsKey. The wrapper delegates get and remove to the underlying implementation, but containsKey is inherited from AbstractMap, whose implementation runs in linear time by iterating over every entry of the map.

The workaround was simple: replacing every containsKey(key) call with get(key) != null solved the issue. Nevertheless, it would be much simpler for everyone if this were fixed once and for all. The fix is straightforward: delegate containsKey to the underlying implementation as well.
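The effect can be reproduced outside Spark with a minimal Java sketch. This is not Spark's code: Spark's SerializableMapWrapper is written in Scala and wraps a Scala collection.Map, while here a java.util.Map stands in for the underlying map, and the class names NaiveWrapper, FixedWrapper and the counter entriesVisited are hypothetical. NaiveWrapper delegates get but inherits containsKey from java.util.AbstractMap, which walks entrySet(). FixedWrapper adds the proposed delegation.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class ContainsKeyDemo {

    // Hypothetical stand-in for the wrapper: delegates size() and get(),
    // but NOT containsKey(), so AbstractMap's linear default is used.
    static class NaiveWrapper<K, V> extends AbstractMap<K, V> {
        final Map<K, V> underlying;
        int entriesVisited = 0; // entries handed out by entrySet().iterator()

        NaiveWrapper(Map<K, V> underlying) { this.underlying = underlying; }

        @Override public int size() { return underlying.size(); }

        @Override public V get(Object key) { return underlying.get(key); }

        // AbstractMap.containsKey iterates this set entry by entry;
        // the counting iterator makes that scan observable.
        @Override public Set<Map.Entry<K, V>> entrySet() {
            return new AbstractSet<Map.Entry<K, V>>() {
                @Override public int size() { return underlying.size(); }

                @Override public Iterator<Map.Entry<K, V>> iterator() {
                    Iterator<Map.Entry<K, V>> it = underlying.entrySet().iterator();
                    return new Iterator<Map.Entry<K, V>>() {
                        @Override public boolean hasNext() { return it.hasNext(); }

                        @Override public Map.Entry<K, V> next() {
                            entriesVisited++;
                            return it.next();
                        }
                    };
                }
            };
        }
    }

    // The proposed fix: delegate containsKey to the underlying map too.
    static class FixedWrapper<K, V> extends NaiveWrapper<K, V> {
        FixedWrapper(Map<K, V> underlying) { super(underlying); }

        @Override public boolean containsKey(Object key) {
            return underlying.containsKey(key);
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new HashMap<>();
        for (int i = 0; i < 100_000; i++) data.put(i, "v" + i);

        NaiveWrapper<Integer, String> naive = new NaiveWrapper<>(data);
        FixedWrapper<Integer, String> fixed = new FixedWrapper<>(data);

        // A missing key is the worst case: the naive wrapper scans every entry.
        System.out.println(naive.containsKey(-1) + " after visiting " + naive.entriesVisited + " entries");
        // prints "false after visiting 100000 entries"
        System.out.println(fixed.containsKey(-1) + " after visiting " + fixed.entriesVisited + " entries");
        // prints "false after visiting 0 entries"
    }
}
```

Note that the get(key) != null workaround is only equivalent to containsKey(key) when the map never stores null values; delegating containsKey avoids that caveat.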