Grigory Domozhirov created IGNITE-20610: -------------------------------------------
Summary: DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys Key: IGNITE-20610 URL: https://issues.apache.org/jira/browse/IGNITE-20610 Project: Ignite Issue Type: Task Components: streaming Affects Versions: 2.15 Reporter: Grigory Domozhirov While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear it seems to work not as expected if `allowOverwrite == true` and same keys are added to `DataStreamer`. With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is created for the key object ( [https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316] ) and is added to `GridConcurrentHashSet` wrapped in a `DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with identity check it ends up with `activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 1) Is that OK in general? 2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, the more often keys are repeated the lower is performance due to hash collisions of non-equal objects. Here is an example: {code:java} try (Ignite ignite = Ignition.start(new IgniteConfiguration())) { try (IgniteCache<Integer, Long> cache = ignite.createCache("test"); IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName()) ) { dataStreamer.allowOverwrite(true); // doesn't matter long start = System.currentTimeMillis(); for (int i = 0; i < 2_000_000; i++) { dataStreamer.addData(i, ""); //unique keys } long elapsed = System.currentTimeMillis() - start; System.out.println(elapsed); } } {code} runs in 3970 ms. {code:java} try (Ignite ignite = Ignition.start(new IgniteConfiguration())) { try (IgniteCache<Integer, Long> cache = ignite.createCache("test"); IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName()) ) { dataStreamer.allowOverwrite(true); // doesn't matter long start = System.currentTimeMillis(); for (int i = 0; i < 2_000_000; i++) { dataStreamer.addData(0, ""); //equal key } long elapsed = System.currentTimeMillis() - start; System.out.println(elapsed); } } {code} runs in 12736 -- This message was sent by Atlassian Jira (v8.20.10#820010)