[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grigory Domozhirov updated IGNITE-20610: ---------------------------------------- Description: While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear it seems to work not as expected if allowOverwrite == true and same keys are added to DataStreamer. With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for the key object ( [https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316] ) and is added to GridConcurrentHashSet wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with identity check it ends up with `activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 1) Is that OK in general? 2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, the more often keys are repeated the lower is performance due to hash collisions of non-equal objects. Here is an example: {code:java} try (Ignite ignite = Ignition.start(new IgniteConfiguration())) { try (IgniteCache<Integer, Long> cache = ignite.createCache("test"); IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName()) ) { dataStreamer.allowOverwrite(true); // doesn't matter long start = System.currentTimeMillis(); for (int i = 0; i < 2_000_000; i++) { dataStreamer.addData(i, ""); //unique keys } long elapsed = System.currentTimeMillis() - start; System.out.println(elapsed); } } {code} runs in 3970 ms. {code:java} try (Ignite ignite = Ignition.start(new IgniteConfiguration())) { try (IgniteCache<Integer, Long> cache = ignite.createCache("test"); IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName()) ) { dataStreamer.allowOverwrite(true); // doesn't matter long start = System.currentTimeMillis(); for (int i = 0; i < 2_000_000; i++) { dataStreamer.addData(0, ""); //equal key } long elapsed = System.currentTimeMillis() - start; System.out.println(elapsed); } } {code} runs in 12736 was: While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear it seems to work not as expected if {code:java}allowOverwrite == true{code} and same keys are added to `DataStreamer`. With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is created for the key object ( [https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316] ) and is added to `GridConcurrentHashSet` wrapped in a `DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with identity check it ends up with `activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 1) Is that OK in general? 2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, the more often keys are repeated the lower is performance due to hash collisions of non-equal objects. Here is an example: {code:java} try (Ignite ignite = Ignition.start(new IgniteConfiguration())) { try (IgniteCache<Integer, Long> cache = ignite.createCache("test"); IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName()) ) { dataStreamer.allowOverwrite(true); // doesn't matter long start = System.currentTimeMillis(); for (int i = 0; i < 2_000_000; i++) { dataStreamer.addData(i, ""); //unique keys } long elapsed = System.currentTimeMillis() - start; System.out.println(elapsed); } } {code} runs in 3970 ms. {code:java} try (Ignite ignite = Ignition.start(new IgniteConfiguration())) { try (IgniteCache<Integer, Long> cache = ignite.createCache("test"); IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName()) ) { dataStreamer.allowOverwrite(true); // doesn't matter long start = System.currentTimeMillis(); for (int i = 0; i < 2_000_000; i++) { dataStreamer.addData(0, ""); //equal key } long elapsed = System.currentTimeMillis() - start; System.out.println(elapsed); } } {code} runs in 12736 > DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys > -------------------------------------------------------------------------- > > Key: IGNITE-20610 > URL: https://issues.apache.org/jira/browse/IGNITE-20610 > Project: Ignite > Issue Type: Task > Components: streaming > Affects Versions: 2.15 > Reporter: Grigory Domozhirov > Priority: Minor > > While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data > streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 > method.) is clear it seems to work not as expected if allowOverwrite == true > and same keys are added to DataStreamer. > With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created > for the key object ( > [https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316] > ) and is added to GridConcurrentHashSet wrapped in a > DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with > identity check it ends up with `activeKeys` containing multiple objects with > equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. > > 1) Is that OK in general? > 2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's > hashCode, the more often keys are repeated the lower is performance due to > hash collisions of non-equal objects. Here is an example: > {code:java} > try (Ignite ignite = Ignition.start(new IgniteConfiguration())) { > try (IgniteCache<Integer, Long> cache = ignite.createCache("test"); > IgniteDataStreamer<Integer, String> dataStreamer = > ignite.dataStreamer(cache.getName()) > ) { > dataStreamer.allowOverwrite(true); // doesn't matter > long start = System.currentTimeMillis(); > for (int i = 0; i < 2_000_000; i++) { > dataStreamer.addData(i, ""); //unique keys > } > long elapsed = System.currentTimeMillis() - start; > System.out.println(elapsed); > } > } {code} > runs in 3970 ms. > {code:java} > try (Ignite ignite = Ignition.start(new IgniteConfiguration())) { > try (IgniteCache<Integer, Long> cache = ignite.createCache("test"); > IgniteDataStreamer<Integer, String> dataStreamer = > ignite.dataStreamer(cache.getName()) > ) { > dataStreamer.allowOverwrite(true); // doesn't matter > long start = System.currentTimeMillis(); > for (int i = 0; i < 2_000_000; i++) { > dataStreamer.addData(0, ""); //equal key > } > long elapsed = System.currentTimeMillis() - start; > System.out.println(elapsed); > } > } {code} > runs in 12736 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)