[ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:
----------------------------------------
    Description: 
While the intention of 
[IGNITE-3828|https://issues.apache.org/jira/browse/IGNITE-3828] (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method) is clear, it does not seem to work as expected when 
allowOverwrite == true and the same key is added to a DataStreamer repeatedly.

With each DataStreamer.addData() call, a new UserKeyCacheObjectImpl is created 
([code|https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316])
 for the key object and added to a GridConcurrentHashSet, wrapped in a 
DataStreamerImpl.KeyCacheObjectWrapper 
([code|https://github.com/apache/ignite/blob/fd504159bf5bc1603dfd5eb149ab5d998d3bffb4/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerImpl.java#L729]).
 Since the wrapper's equals is overridden with an identity check, the 
`activeKeys` set ends up containing multiple wrappers that hold equal 
`UserKeyCacheObjectImpl`s. 
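
To make the effect concrete, here is a minimal sketch in plain Java. It is not Ignite code: `IdentityWrapper` is a hypothetical stand-in modeled on the behavior described above (identity-based equals, hashCode delegated to the key), `ConcurrentHashMap.newKeySet()` is assumed as a stand-in for GridConcurrentHashSet, and a freshly allocated String plays the role of the new UserKeyCacheObjectImpl created per addData() call.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for DataStreamerImpl.KeyCacheObjectWrapper; not Ignite code. */
final class IdentityWrapper {
    private final Object key;

    IdentityWrapper(Object key) {
        this.key = key;
    }

    /** Delegates to the wrapped key, so wrappers around equal keys share one hash. */
    @Override public int hashCode() {
        return key.hashCode();
    }

    /** Identity check: equal only if both wrappers hold the very same key instance. */
    @Override public boolean equals(Object o) {
        return o instanceof IdentityWrapper && ((IdentityWrapper) o).key == key;
    }
}

class ActiveKeysDemo {
    /** Adds n wrappers around distinct-but-equal key instances and returns the set size. */
    static int sizeAfterAdding(int n) {
        // Assumption: a plain concurrent key set stands in for GridConcurrentHashSet.
        Set<IdentityWrapper> activeKeys = ConcurrentHashMap.newKeySet();

        for (int i = 0; i < n; i++) {
            // new String(...) forces a fresh instance, like a new key object per addData().
            activeKeys.add(new IdentityWrapper(new String("same-key")));
        }

        return activeKeys.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdding(3)); // prints 3: nothing is deduplicated
    }
}
```

In this sketch the set never rejects an element, because no two wrappers ever hold the same key instance, while all of them report the same hash.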

 

1) Is that OK in general? 
2) If yes, does using a GridConcurrentHashSet for activeKeys make any sense, 
given that its entries are never equal to one another?
3) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's 
hashCode, the more often keys are repeated, the lower the performance, because 
non-equal objects collide on the same hash. Here is a corner case:
{code:java}
// Baseline: every addData() call uses a different key.
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
     IgniteCache<Integer, String> cache = ignite.createCache("test");
     IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
) {
    dataStreamer.allowOverwrite(true); // doesn't matter
    long start = System.currentTimeMillis();
    for (int i = 0; i < 5_000_000; i++) {
        dataStreamer.addData(i, ""); // unique keys
    }
    System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 6029 ms.
{code:java}
// Corner case: every addData() call uses the same key.
try (Ignite ignite = Ignition.start(new IgniteConfiguration());
     IgniteCache<Integer, String> cache = ignite.createCache("test");
     IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
) {
    dataStreamer.allowOverwrite(true); // doesn't matter
    long start = System.currentTimeMillis();
    for (int i = 0; i < 5_000_000; i++) {
        dataStreamer.addData(0, ""); // equal key
    }
    System.out.println(System.currentTimeMillis() - start);
}{code}
runs in 29025 ms.
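
The collision effect can be explored without Ignite. The following is only a sketch under stated assumptions: `Key` is a hypothetical class with identity equality and a caller-chosen hash, and `ConcurrentHashMap.newKeySet()` stands in for GridConcurrentHashSet. It is not a benchmark of the data streamer, and the absolute timings depend on the JVM and hardware.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class CollisionTimingSketch {
    /** Hypothetical key: identity equality (inherited from Object), caller-chosen hash. */
    static final class Key {
        private final int hash;

        Key(int hash) {
            this.hash = hash;
        }

        @Override public int hashCode() {
            return hash;
        }
        // equals is deliberately not overridden, so identity comparison applies.
    }

    /** Inserts n keys that either all share hash 0 or each get their own hash. */
    static int fill(int n, boolean sameHash) {
        // Assumption: a plain concurrent key set stands in for GridConcurrentHashSet.
        Set<Key> set = ConcurrentHashMap.newKeySet();

        for (int i = 0; i < n; i++) {
            set.add(new Key(sameHash ? 0 : i));
        }

        return set.size(); // always n, since no two keys are equal
    }

    /** Times one fill() run in milliseconds. */
    static long timeMillis(int n, boolean sameHash) {
        long start = System.nanoTime();
        fill(n, sameHash);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 20_000; // kept small so the colliding run finishes quickly

        System.out.println("distinct hashes: " + timeMillis(n, false) + " ms");
        System.out.println("single hash:     " + timeMillis(n, true) + " ms");
    }
}
```

Both runs store the same number of entries; only the hash distribution differs, which isolates the cost of non-equal objects landing in one bucket.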

 

 


> DataStreamer/KeyCacheObjectWrapper inefficiency for non-unique keys
> -------------------------------------------------------------------
>
>                 Key: IGNITE-20610
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20610
>             Project: Ignite
>          Issue Type: Task
>          Components: streaming
>    Affects Versions: 2.15
>            Reporter: Grigory Domozhirov
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
