[
https://issues.apache.org/jira/browse/HIVE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942069#comment-13942069
]
Prasanth J commented on HIVE-6707:
----------------------------------
The getMapSize() API is broken as well: it does not report the number of
distinct keys. I will fix that too and upload a new patch.
> Lazy maps are broken (LazyMap and LazyBinaryMap)
> ------------------------------------------------
>
> Key: HIVE-6707
> URL: https://issues.apache.org/jira/browse/HIVE-6707
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 0.5.0, 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0,
> 0.12.0, 0.13.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Priority: Critical
> Labels: serde
> Fix For: 0.13.0, 0.14.0
>
> Attachments: HIVE-6707.1.patch
>
>
> LazyPrimitive and LazyBinaryPrimitive override the hashCode() method (added in
> HIVE-949) but fail to override equals(). As a result, LazyMap and
> LazyBinaryMap can end up with multiple entries for the same key. Both
> LazyMap and LazyBinaryMap use LinkedHashMap, so the expected behaviour is to
> have a single value per unique key.
> In the following code from LazyMap (LazyBinaryMap has the same code segment):
> {code}
> LazyPrimitive<?, ?> lazyKey = uncheckedGetKey(i);
> if (lazyKey == null) {
>   continue;
> }
> Object key = lazyKey.getObject();
> if (key != null && !cachedMap.containsKey(key)) {
> {code}
> lazyKey.hashCode() returns the writable object's hash code. The containsKey()
> method of HashMap
> (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java#366)
> first checks whether the hash codes match and, if so, uses equals() to
> verify that the key already exists. Since LazyPrimitive does not override
> equals(), it falls back to Object.equals(), which returns true only if both
> references point to exactly the same object (this == obj).
> So in the above code segment, even if an equal key already exists, the new
> value is inserted as a separate entry in the same hash bucket, so the map
> ends up with more entries than distinct keys.
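
The behaviour described above can be reproduced outside Hive with a minimal sketch. The classes below (LazyKeyDemo, HashOnlyKey, FixedKey) are hypothetical stand-ins, not Hive code: HashOnlyKey mimics a key type that overrides hashCode() but not equals(), and FixedKey shows one possible shape of the fix, assuming value-based equality is what is wanted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LazyKeyDemo {

    // Hypothetical stand-in for a lazy primitive: overrides hashCode() only.
    static class HashOnlyKey {
        final String value;
        HashOnlyKey(String value) { this.value = value; }

        @Override
        public int hashCode() { return value.hashCode(); }
        // equals() is inherited from Object, i.e. reference identity.
    }

    // Same key type with a value-based equals() consistent with hashCode().
    static class FixedKey {
        final String value;
        FixedKey(String value) { this.value = value; }

        @Override
        public int hashCode() { return value.hashCode(); }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof FixedKey)) return false;
            return value.equals(((FixedKey) obj).value);
        }
    }

    // Inserts two distinct key objects wrapping the same value, guarded by
    // containsKey() in the same way as the quoted LazyMap segment.
    static int sizeWithHashOnlyKeys() {
        Map<Object, Integer> cachedMap = new LinkedHashMap<>();
        for (int i = 0; i < 2; i++) {
            Object key = new HashOnlyKey("k");
            if (!cachedMap.containsKey(key)) {
                cachedMap.put(key, i);
            }
        }
        return cachedMap.size();
    }

    static int sizeWithFixedKeys() {
        Map<Object, Integer> cachedMap = new LinkedHashMap<>();
        for (int i = 0; i < 2; i++) {
            Object key = new FixedKey("k");
            if (!cachedMap.containsKey(key)) {
                cachedMap.put(key, i);
            }
        }
        return cachedMap.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithHashOnlyKeys()); // prints 2
        System.out.println(sizeWithFixedKeys());    // prints 1
    }
}
```

With HashOnlyKey the containsKey() guard never matches, because each new key object is compared by reference, so the map grows to two entries for one logical key. With FixedKey the second insert is skipped and the map keeps a single entry.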