[ 
https://issues.apache.org/jira/browse/IGNITE-20610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grigory Domozhirov updated IGNITE-20610:
----------------------------------------
    Description: 
While the intention of https://issues.apache.org/jira/browse/IGNITE-3828 (Data 
streamer: use identity comparison for "activeKeys" in DataStreamerImpl.load0 
method.) is clear, it does not seem to work as expected if `allowOverwrite == true` 
and the same keys are added to `DataStreamer`.

With each `DataStreamer.addData()` call, a `new UserKeyCacheObjectImpl()` is 
created for the key object ( 

[https://github.com/apache/ignite/blob/ceb22d20cab407b038570c81be022d7233a6e12d/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/binary/CacheObjectBinaryProcessorImpl.java#L1316]
 ) and added to a `GridConcurrentHashSet`, wrapped in a 
`DataStreamerImpl.KeyCacheObjectWrapper`. Since the wrapper's `equals` is 
overridden with an identity check, `activeKeys` ends up containing multiple 
wrappers whose `UserKeyCacheObjectImpl`s are equal, and thus barely acts as a set. 
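
To make the effect concrete without starting Ignite, here is a minimal sketch in plain Java. It is not the Ignite code: `Key`, `KeyWrapper`, `addSameKey` and `distinctHashCodes` are hypothetical names, `Key` only imitates a key object with value-based `hashCode`, and `ConcurrentHashMap.newKeySet()` is assumed to be a reasonable stand-in for `GridConcurrentHashSet`. It only shows that an identity-based `equals` combined with a delegated `hashCode` lets a set hold many entries for one logical key, all with the same hash code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdentityWrapperSketch {
    /** Hypothetical stand-in for the key cache object: value-based equals/hashCode. */
    static final class Key {
        final int val;
        Key(int val) { this.val = val; }
        @Override public boolean equals(Object o) { return o instanceof Key && ((Key) o).val == val; }
        @Override public int hashCode() { return Integer.hashCode(val); }
    }

    /** Hypothetical stand-in for the wrapper: identity comparison of the key, delegated hashCode. */
    static final class KeyWrapper {
        final Key key;
        KeyWrapper(Key key) { this.key = key; }
        @Override public boolean equals(Object o) { return o instanceof KeyWrapper && ((KeyWrapper) o).key == key; }
        @Override public int hashCode() { return key.hashCode(); }
    }

    /** Adds the same logical key {@code times} times, creating a new Key object on every call. */
    static Set<KeyWrapper> addSameKey(int times) {
        Set<KeyWrapper> activeKeys = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < times; i++)
            activeKeys.add(new KeyWrapper(new Key(0)));
        return activeKeys;
    }

    /** Counts how many different hash codes the wrappers in the set produce. */
    static long distinctHashCodes(Set<KeyWrapper> set) {
        return set.stream().mapToInt(KeyWrapper::hashCode).distinct().count();
    }

    public static void main(String[] args) {
        Set<KeyWrapper> activeKeys = addSameKey(1_000);
        System.out.println("size = " + activeKeys.size());                            // size = 1000
        System.out.println("distinct hash codes = " + distinctHashCodes(activeKeys)); // distinct hash codes = 1
    }
}
```

In this sketch every wrapper is treated as a distinct element, yet all of them share one hash code, so they all collide in the same bucket of the hash table.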

 

1) Is that OK in general?

2) Since `KeyCacheObjectWrapper.hashCode` returns the actual key object's 
hashCode, the more often keys are repeated, the lower the performance, due to 
hash collisions between non-equal objects. Here is an example:
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(i, ""); //unique keys
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
} {code}
runs in 3970 ms.
{code:java}
try (Ignite ignite = Ignition.start(new IgniteConfiguration())) {
    try (IgniteCache<Integer, String> cache = ignite.createCache("test");
         IgniteDataStreamer<Integer, String> dataStreamer = ignite.dataStreamer(cache.getName())
    ) {
        dataStreamer.allowOverwrite(true); // doesn't matter
        long start = System.currentTimeMillis();
        for (int i = 0; i < 2_000_000; i++) {
            dataStreamer.addData(0, ""); //equal key
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(elapsed);
    }
} {code}
runs in 12736 ms.

 

 


> DataStreamerImpl.KeyCacheObjectWrapper low performance for non-unique keys
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-20610
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20610
>             Project: Ignite
>          Issue Type: Task
>          Components: streaming
>    Affects Versions: 2.15
>            Reporter: Grigory Domozhirov
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
