[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grigory Domozhirov updated IGNITE-20610:
Description:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer repeatedly.

Each DataStreamer.addData() call creates a new UserKeyCacheObjectImpl for the key object ([code|https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]), which is then wrapped in a DataStreamerImpl.KeyCacheObjectWrapper and added to the `activeKeys` GridConcurrentHashSet ([code|https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L729]). Since the wrapper's equals() is overridden with an identity check, `activeKeys` ends up containing multiple wrappers around equal `UserKeyCacheObjectImpl`s and thus barely acts as a set.

1) Is that OK in general?
2) If yes, does using GridConcurrentHashSet for `activeKeys` make any sense, given that no two of its entries are ever equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns the wrapped key object's hashCode, the more often a key is repeated, the worse the performance gets, due to hash collisions between non-equal objects.
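To make the set semantics from points 1 and 2 concrete, here is a minimal, self-contained Java model. It is not Ignite code: `UserKey` and `KeyWrapper` are hypothetical stand-ins for UserKeyCacheObjectImpl and KeyCacheObjectWrapper, and `ConcurrentHashMap.newKeySet()` is assumed to behave like GridConcurrentHashSet for this purpose. The model only mimics the equals/hashCode behaviour described above: identity equality on the wrapped key, combined with the key's own hash code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model only: all classes here are hypothetical stand-ins, not Ignite's actual code.
class ActiveKeysModel {
    // Stand-in for UserKeyCacheObjectImpl: a fresh but value-equal object per addData() call.
    static final class UserKey {
        final int val;

        UserKey(int val) { this.val = val; }

        @Override public boolean equals(Object o) {
            return o instanceof UserKey && ((UserKey)o).val == val;
        }

        @Override public int hashCode() { return Integer.hashCode(val); }
    }

    // Stand-in for KeyCacheObjectWrapper: identity equality, but the wrapped key's own hash code.
    static final class KeyWrapper {
        final UserKey key;

        KeyWrapper(UserKey key) { this.key = key; }

        @Override public boolean equals(Object o) {
            return o instanceof KeyWrapper && ((KeyWrapper)o).key == key;
        }

        @Override public int hashCode() { return key.hashCode(); }
    }

    static final class Result {
        int entries;
        int distinctHashes;
        boolean findsSameInstance;
        boolean findsEqualKey;
    }

    /** Simulates n calls like addData(0, ...), each wrapping a new but equal key (n >= 1). */
    static Result simulate(int n) {
        if (n < 1)
            throw new IllegalArgumentException("n must be >= 1");

        // Assumption: ConcurrentHashMap.newKeySet() stands in for GridConcurrentHashSet.
        Set<KeyWrapper> activeKeys = ConcurrentHashMap.newKeySet();
        UserKey first = null;

        for (int i = 0; i < n; i++) {
            UserKey key = new UserKey(0); // same logical key, new instance every time
            if (first == null)
                first = key;
            activeKeys.add(new KeyWrapper(key));
        }

        // Collect the hash codes of all stored wrappers.
        Set<Integer> hashes = new HashSet<>();
        for (KeyWrapper w : activeKeys)
            hashes.add(w.hashCode());

        Result r = new Result();
        r.entries = activeKeys.size();
        r.distinctHashes = hashes.size();
        r.findsSameInstance = activeKeys.contains(new KeyWrapper(first));
        r.findsEqualKey = activeKeys.contains(new KeyWrapper(new UserKey(0)));
        return r;
    }

    public static void main(String[] args) {
        Result r = simulate(1_000);
        System.out.println("entries=" + r.entries + ", distinct hash codes=" + r.distinctHashes
            + ", same instance found=" + r.findsSameInstance + ", equal key found=" + r.findsEqualKey);
    }
}
```

Running main() prints `entries=1000, distinct hash codes=1, same instance found=true, equal key found=false`: the set keeps every wrapper, finds a key only through the exact instance that was added, and files all 1000 entries under a single hash code.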
Here is a corner case. Adding 5,000,000 unique keys:
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

try (Ignite ignite = Ignition.start(new IgniteConfiguration());
     IgniteCache<Integer, String> cache = ignite.createCache("test");
     IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())) {
    dataStreamer.allowOverwrite(true); // doesn't matter

    long start = System.currentTimeMillis();

    for (int i = 0; i < 5_000_000; i++) {
        dataStreamer.addData(i, ""); // unique keys
    }

    // Measured before the streamer is closed, so the final flush is not included.
    System.out.println(System.currentTimeMillis() - start);
}
{code}
runs in 6029 ms.

Adding the same key 5,000,000 times:
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

try (Ignite ignite = Ignition.start(new IgniteConfiguration());
     IgniteCache<Integer, String> cache = ignite.createCache("test");
     IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())) {
    dataStreamer.allowOverwrite(true); // doesn't matter

    long start = System.currentTimeMillis();

    for (int i = 0; i < 5_000_000; i++) {
        dataStreamer.addData(0, ""); // equal key
    }

    // Measured before the streamer is closed, so the final flush is not included.
    System.out.println(System.currentTimeMillis() - start);
}
{code}
runs in 29025 ms, i.e. about 4.8 times slower.
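The collision cost from point 3 can be illustrated without Ignite by counting how often the set calls equals() while inserting. This is again a hypothetical model: `UserKey` and `KeyWrapper` are stand-ins, `ConcurrentHashMap.newKeySet()` stands in for GridConcurrentHashSet, and the identity-hash variant (based on `System.identityHashCode`) is included purely for contrast, not as a proposed or existing fix.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified, hypothetical model; not Ignite code.
class CollisionCost {
    // Counts how often the set calls equals() on a wrapper during insertion.
    static final AtomicLong EQUALS_CALLS = new AtomicLong();

    // Stand-in for UserKeyCacheObjectImpl: value-based equals/hashCode.
    static final class UserKey {
        final int val;

        UserKey(int val) { this.val = val; }

        @Override public boolean equals(Object o) {
            return o instanceof UserKey && ((UserKey)o).val == val;
        }

        @Override public int hashCode() { return Integer.hashCode(val); }
    }

    // Identity equality on the wrapped key. The hash code is either the key's own
    // (as described in the issue) or identity-based (hypothetical contrast).
    static final class KeyWrapper {
        final UserKey key;
        final boolean identityHash;

        KeyWrapper(UserKey key, boolean identityHash) {
            this.key = key;
            this.identityHash = identityHash;
        }

        @Override public boolean equals(Object o) {
            EQUALS_CALLS.incrementAndGet();
            return o instanceof KeyWrapper && ((KeyWrapper)o).key == key;
        }

        @Override public int hashCode() {
            return identityHash ? System.identityHashCode(key) : key.hashCode();
        }
    }

    /** Inserts n wrappers, each around a new key instance, and returns the number of equals() calls. */
    static long countEqualsCalls(int n, boolean sameKey, boolean identityHash) {
        EQUALS_CALLS.set(0);
        Set<KeyWrapper> activeKeys = ConcurrentHashMap.newKeySet();

        for (int i = 0; i < n; i++)
            activeKeys.add(new KeyWrapper(new UserKey(sameKey ? 0 : i), identityHash));

        return EQUALS_CALLS.get();
    }

    public static void main(String[] args) {
        int n = 2_000;
        System.out.println("unique keys, key hash:      " + countEqualsCalls(n, false, false));
        System.out.println("equal keys, key hash:       " + countEqualsCalls(n, true, false));
        System.out.println("equal keys, identity hash:  " + countEqualsCalls(n, true, true));
    }
}
```

With unique keys no equals() calls are made, because no two entries share a hash code. With one repeated key, every insert has to compare against the entries already sharing its hash code, so the count grows much faster than n; the identity-hash variant keeps the same identity semantics while spreading the entries, and makes few or no calls. The model never removes entries, so its absolute numbers do not predict Ignite's behaviour.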
[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grigory Domozhirov updated IGNITE-20610:

Description:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
     IgniteCache<Integer, String> cache = ignite.createCache("test");
     IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
) {
    dataStreamer.allowOverwrite(true); // doesn't matter
    long start = System.currentTimeMillis();
    for (int i = 0; i < 5_000_000; i++) {
        dataStreamer.addData(i, ""); // unique keys
    }
    System.out.println(System.currentTimeMillis() - start);
}
{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
     IgniteCache<Integer, String> cache = ignite.createCache("test");
     IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
) {
    dataStreamer.allowOverwrite(true); // doesn't matter
    long start = System.currentTimeMillis();
    for (int i = 0; i < 5_000_000; i++) {
        dataStreamer.addData(0, ""); // equal key
    }
    System.out.println(System.currentTimeMillis() - start);
}
{code}
runs in 29025 ms.

was:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 5_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 5_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 29025 ms.

> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys > -- > > Key:
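To illustrate point 3 outside of Ignite, here is a minimal plain-Java sketch. `KeyWrapper` is a hypothetical stand-in that only mimics the behavior described in the issue (equals compares the wrapped key objects by identity, hashCode delegates to the key); it is not Ignite's actual class, and `ConcurrentHashMap.newKeySet()` stands in for GridConcurrentHashSet. Every insert uses a fresh key object, as a new UserKeyCacheObjectImpl is created per addData() call, so no two wrappers are ever equal. When all keys share one value, every wrapper also lands in the same hash bin.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ActiveKeysCollisionDemo {
    /**
     * Hypothetical stand-in for DataStreamerImpl.KeyCacheObjectWrapper as described in the issue:
     * equality is identity of the wrapped key object, but the hash code is the key's value hash.
     */
    static final class KeyWrapper {
        final Object key;

        KeyWrapper(Object key) {
            this.key = key;
        }

        @Override public boolean equals(Object o) {
            return o instanceof KeyWrapper && ((KeyWrapper) o).key == key;
        }

        @Override public int hashCode() {
            return key.hashCode(); // equal keys -> equal hash codes -> same bin
        }
    }

    /** Adds n wrappers whose key values cycle through distinctKeys values. */
    static Set<KeyWrapper> fill(int n, int distinctKeys) {
        Set<KeyWrapper> activeKeys = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < n; i++) {
            // A fresh key object per call, mimicking a new key object per addData().
            Object key = new String(Integer.toString(i % distinctKeys));
            activeKeys.add(new KeyWrapper(key));
        }
        return activeKeys;
    }

    public static void main(String[] args) {
        int n = 10_000;
        for (int distinct : new int[] {n, 100, 1}) {
            long start = System.nanoTime();
            int size = fill(n, distinct).size();
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("distinct keys=%d: %d entries, %d ms%n", distinct, size, ms);
        }
    }
}
```

The set size is always n, so it never deduplicates anything. With a single distinct value, equal hashes combined with keys that have no Comparable ordering force a search of the whole bin on every insert, so total work grows roughly quadratically. Absolute timings depend on the JVM and machine.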
[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grigory Domozhirov updated IGNITE-20610:

Description:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 5_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 5_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 29025 ms.

was:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 5_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 5_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 29025 ms.
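One possible direction for questions 2 and 3 is sketched below under stated assumptions; it is not taken from the Ignite code base. If equality is by identity of the key object anyway, the hash code can be identity-based too (`System.identityHashCode`). That keeps the equals/hashCode contract intact and spreads equal-but-distinct key objects across bins instead of piling them into one. `IdentityKeyWrapper` and `fillEqualKeys` are hypothetical names, and whether this fits Ignite depends on how `activeKeys` is used elsewhere, which the issue does not describe.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdentityHashedKeysDemo {
    /**
     * Hypothetical wrapper: both equality and hashing are based on the identity of the
     * key object, so hashCode is consistent with the identity-based equals.
     */
    static final class IdentityKeyWrapper {
        final Object key;

        IdentityKeyWrapper(Object key) {
            this.key = key;
        }

        @Override public boolean equals(Object o) {
            return o instanceof IdentityKeyWrapper && ((IdentityKeyWrapper) o).key == key;
        }

        @Override public int hashCode() {
            // Identity hash: equal-but-distinct key objects usually land in different bins.
            return System.identityHashCode(key);
        }
    }

    /** Adds n wrappers around distinct key objects that are all equal to "0". */
    static Set<IdentityKeyWrapper> fillEqualKeys(int n) {
        Set<IdentityKeyWrapper> activeKeys = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < n; i++)
            activeKeys.add(new IdentityKeyWrapper(new String("0")));
        return activeKeys;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Set<IdentityKeyWrapper> set = fillEqualKeys(10_000);
        System.out.printf("%d entries, %d ms%n", set.size(), (System.nanoTime() - start) / 1_000_000);
    }
}
```

The set still keeps one entry per key object, as the identity comparison from IGNITE-3828 intends. Only the bin distribution changes, which is what question 3 is about.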
[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grigory Domozhirov updated IGNITE-20610:

Description:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 5_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 5_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 29025 ms.

was:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.
[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grigory Domozhirov updated IGNITE-20610:

Description:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.

was:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.
[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grigory Domozhirov updated IGNITE-20610:

Description:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.

was:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.
[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grigory Domozhirov updated IGNITE-20610:

Description:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.

was:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.
[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grigory Domozhirov updated IGNITE-20610:

Description:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense, given that its entries are never equal to each other?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.

was:
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear, it does not seem to work as expected when allowOverwrite == true and the same keys are added to a DataStreamer. With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl() is created for the key object ([https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]) and added to a GridConcurrentHashSet, wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since the wrapper's equals is overridden with an identity check, `activeKeys` ends up containing multiple entries that wrap equal `UserKeyCacheObjectImpl`s, and thus it barely acts as a set.

1) Is that OK in general?
2) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's hashCode, the more often keys are repeated, the worse the performance gets due to hash collisions between non-equal objects.

Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); // unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.
[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
[ https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grigory Domozhirov updated IGNITE-20610: Description: While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 method.) is clear it seems to work not as expected if allowOverwrite == true and same keys are added to DataStreamer. With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for the key object ( [https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316] ) and is added to GridConcurrentHashSet wrapped in a DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with identity check it ends up with `activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 1) Is that OK in general? 2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, the more often keys are repeated the lower is performance due to hash collisions of non-equal objects. Here is an example: {code:java} try (Ignite ignite = Ignition.start(new IgniteConfiguration())) { try (IgniteCache cache = ignite.createCache("test"); IgniteDataStreamer dataStreamer = ignite.dataStreamer(cache.getName()) ) { dataStreamer.allowOverwrite(true); // doesn't matter long start = System.currentTimeMillis(); for (int i = 0; i < 2_000_000; i++) { dataStreamer.addData(i, ""); //unique keys } long elapsed = System.currentTimeMillis() - start; System.out.println(elapsed); } } {code} runs in 3970 ms. 
The same test with a single repeated key:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter

        long start = System.currentTimeMillis();

        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); // equal key
        }

        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
}
{code}
runs in 12736 ms.

> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> --
>
> Key: IGNITE-20610
> URL: https://issues.apache.org/jira/browse/IGNITE-20610
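The collision effect from point 2) can be reproduced without Ignite. The sketch below is a simplified model, not Ignite's code: `KeyObject` is a hypothetical stand-in for `UserKeyCacheObjectImpl` (a fresh, value-equal object per `addData()` call), `IdentityKeyWrapper` is a hypothetical stand-in for `DataStreamerImpl.KeyCacheObjectWrapper` (identity `equals`, key-delegating `hashCode`, as described above), and a `ConcurrentHashMap`-backed set is assumed to behave like `GridConcurrentHashSet` for this purpose. It mirrors the two measurements: unique keys versus one repeated key.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdentityWrapperDemo {
    /** Hypothetical stand-in for UserKeyCacheObjectImpl: a new, value-equal object per addData() call. */
    static final class KeyObject {
        final int val;

        KeyObject(int val) { this.val = val; }

        @Override public boolean equals(Object o) {
            return o instanceof KeyObject && ((KeyObject) o).val == val;
        }

        @Override public int hashCode() { return Integer.hashCode(val); }
    }

    /** Hypothetical stand-in for DataStreamerImpl.KeyCacheObjectWrapper. */
    static final class IdentityKeyWrapper {
        final KeyObject key;

        IdentityKeyWrapper(KeyObject key) { this.key = key; }

        // Identity comparison of the wrapped key objects.
        @Override public boolean equals(Object o) {
            return o instanceof IdentityKeyWrapper && ((IdentityKeyWrapper) o).key == key;
        }

        // Delegates to the key's value-based hash, so equal keys always share one hash.
        @Override public int hashCode() { return key.hashCode(); }
    }

    /** Simulates n addData() calls with either n unique keys or one repeated key (0). */
    static Set<IdentityKeyWrapper> fill(int n, boolean repeatKey) {
        // Assumption: a ConcurrentHashMap-backed set models GridConcurrentHashSet well enough here.
        Set<IdentityKeyWrapper> activeKeys = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < n; i++)
            activeKeys.add(new IdentityKeyWrapper(new KeyObject(repeatKey ? 0 : i)));
        return activeKeys;
    }

    /** Number of distinct hash codes among the set's entries. */
    static int distinctHashes(Set<IdentityKeyWrapper> set) {
        Set<Integer> hashes = new HashSet<>();
        for (IdentityKeyWrapper w : set)
            hashes.add(w.hashCode());
        return hashes.size();
    }

    public static void main(String[] args) {
        Set<IdentityKeyWrapper> unique = fill(1_000, false);
        Set<IdentityKeyWrapper> repeated = fill(1_000, true);

        // Both sets keep every wrapper: identity equals never deduplicates.
        System.out.println(unique.size() + " " + repeated.size()); // prints "1000 1000"

        // Unique keys spread over 1000 hashes; the repeated key puts every entry on one hash.
        System.out.println(distinctHashes(unique) + " " + distinctHashes(repeated)); // prints "1000 1"
    }
}
```

With the repeated key, every wrapper shares a single hash value, so each insertion has to be compared against a growing group of entries that can never match it. That is consistent with the slowdown measured for the equal-key case, although the sketch itself does not measure time.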