[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created 
([code|[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316])
 for the key object and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper 
([code|https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L729]).
 Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created 
([code|[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]])
 for the key object and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper 
([code|https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L729]).
 Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created 
([code|https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316])
 for the key object and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper 
([code|https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L729]).
 Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created 
([code|[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316])
 for the key object and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper 
([code|https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L729]).
 Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created 
([code|[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]])
 for the key object and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper 
([code|https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L729]).
 Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() [is 
created|[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]]
 for the key object and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper 
([code|https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L729]).
 Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() [is 
created|[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]]
 for the key object and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper 
([code|https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L729]).
 Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper (code) ( 
[https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L728C12-L728C12]
 ). Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper (code) ( 
[https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L728C12-L728C12]
 ). Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper ( 
https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L728C12-L728C12
 ). Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper ( 
https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L728C12-L728C12
 ). Since its equals is overridden with identity check it ends up with 
`activeKeys` containing multiple objects with equal `UserKeyCacheObjectImpl`s 
and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> 

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
 IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 29025 ms.

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> --
>
> Key: 

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is a corner case:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 29025 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 29025 ms.

 

 


> 

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 6029 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 5_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 29025 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736 ms.

 

 


> DataStreamerImpl.KeyCacheObjectWrapper 

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736 ms.

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low 

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low 

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower performance is due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal?
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low 

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to a DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 
[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general? 
2) If yes, then does using GridConcurrentHashSet for activeKeys make any sense 
as all its entries are always non-equal
3) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 

[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general?

2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> --
>
>

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to DataStreamer.

With each DataStreamer.addData() a new UserKeyCacheObjectImpl() is created for 
the key object ( 

[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to GridConcurrentHashSet wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general?

2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if {code:java}allowOverwrite 
== true{code} and same keys are added to `DataStreamer`.

With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is created 
for the key object ( 

[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to `GridConcurrentHashSet` wrapped in a 
`DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general?

2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> --
>
> Key: IGNITE-20610
> URL: https://issues.apache.org/jira/browse/IGNITE-20610

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to `DataStreamer`.

With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is created 
for the key object ( 

[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to `GridConcurrentHashSet` wrapped in a 
`DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general?

2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if `allowOverwrite == true` 
and same keys are added to `DataStreamer`.

With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is created 
for the key object ( 

[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to `GridConcurrentHashSet` wrapped in a 
`DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general?

2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> --
>
> Key: IGNITE-20610
> URL: https://issues.apache.org/jira/browse/IGNITE-20610
> 

[jira] [Updated] (IGNITE-20610) DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys

2023-10-10 Thread Grigory Domozhirov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:

Description: 
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if {code:java}allowOverwrite 
== true{code} and same keys are added to `DataStreamer`.

With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is created 
for the key object ( 

[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to `GridConcurrentHashSet` wrapped in a 
`DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general?

2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 

  was:
While intention for https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear it seems to work not as expected if allowOverwrite == true 
and same keys are added to `DataStreamer`.

With each `DataStreamer.addData()` a `new UserKeyCacheObjectImpl()` is created 
for the key object ( 

[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and is added to `GridConcurrentHashSet` wrapped in a 
`DataStreamerImpl.KeyCacheObjectWrapper`. Since its equals is overridden with 
identity check it ends up with `activeKeys` containing multiple objects with 
equal `UserKeyCacheObjectImpl`s and thus barely acts is a set. 

 

1) Is that OK in general?

2) Since `KeyCacheObjectWrapper.hashCode` returns actual key object's hashCode, 
the more often keys are repeated the lower is performance due to hash 
collisions of non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(i, ""); //unique keys
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
try (IgniteCache cache = ignite.createCache("test");
 IgniteDataStreamer dataStreamer = 
ignite.dataStreamer(cache.getName())
) {
dataStreamer.allowOverwrite(true); // doesn't matter
long start = System.currentTimeMillis();
for (int i = 0; i < 2_000_000; i++) {
dataStreamer.addData(0, ""); //equal key
}
long elapsed = System.currentTimeMillis() - start;
System.out.println(elapsed);
}
} {code}
runs in 12736

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> --
>
> Key: IGNITE-20610
> URL: