[ https://issues.apache.org/jira/browse/IGNITE-17735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607891#comment-17607891 ]

Vladimir Steshin edited comment on IGNITE-17735 at 10/28/22 5:18 AM:
---------------------------------------------------------------------

DataStreamer with 'allowOverwrite==true' and a PRIMARY_SYNC persistent cache may 
cause heap issues or noticeably increased heap consumption.
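
For illustration, a minimal sketch of the kind of setup that hits this. It assumes 
extra server nodes run with the same configuration so the backup copies actually 
exist; the cache name, key/value types and payload sizes are made up here and are 
not the attached `HeapConsumptionDataStreamerTest.src`:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StreamerHeapSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            // Persistent data region.
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                    .setPersistenceEnabled(true)))
            // PRIMARY_SYNC cache with a backup: primaries reply before backups are written.
            .setCacheConfiguration(new CacheConfiguration<Long, byte[]>("test-cache")
                .setBackups(1)
                .setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC));

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().state(ClusterState.ACTIVE);

            try (IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer("test-cache")) {
                // allowOverwrite == true routes entries through the regular cache update path.
                streamer.allowOverwrite(true);

                // Unthrottled load: with the default perNodeParallelOperations, batches
                // and backup-update futures pile up on the receiving nodes.
                for (long i = 0; i < 10_000_000; i++)
                    streamer.addData(i, new byte[1024]);
            }
        }
    }
}
```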

There is the related 'perNodeParallelOperations()' setting that can mitigate this. 
But I hit the issue with a trivial setup like `HeapConsumptionDataStreamerTest.src`.
The default number of parallel batches looks too high for persistent caches; the 
setting was historically sized for in-memory caches. A hedged sketch of that 
workaround follows.
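
The sketch just caps the in-flight batches per node explicitly instead of relying 
on the default. The `CPUs x 2` value mirrors the benchmark note below, it is not 
an official recommendation:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class StreamerThrottleSketch {
    /** Loads data with an explicit cap on parallel streamer batches per node. */
    static void loadThrottled(Ignite ignite, String cacheName) {
        try (IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer(cacheName)) {
            streamer.allowOverwrite(true);

            // Cap taken from the 'CPUs x 2' observation below; tune for the actual cluster.
            streamer.perNodeParallelOperations(Runtime.getRuntime().availableProcessors() * 2);

            for (long i = 0; i < 1_000_000; i++)
                streamer.addData(i, new byte[1024]); // payload size is illustrative
        }
    }
}
```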

What happens: the streamer keeps sending more and more batches to process while 
the receiving node accumulates futures and requests for the backup updates. 
Similarly, the backup node accumulates incoming update requests that get stuck at 
disk writes. See 'DS_heap_consumption.png' for an example. One way the loading 
side could bound what is in flight is sketched below.
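
For illustration only (not part of the proposed fix), the loader itself could 
bound the number of unacknowledged entries, a crude form of backpressure; the cap 
and payload here are made up:

```java
import java.util.concurrent.Semaphore;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class ManualBackpressureSketch {
    /** Bounds the number of unacknowledged addData() futures on the loading side. */
    static void loadWithBackpressure(Ignite ignite, String cacheName, int maxInFlight)
        throws InterruptedException {
        Semaphore inFlight = new Semaphore(maxInFlight);

        try (IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer(cacheName)) {
            streamer.allowOverwrite(true);

            for (long i = 0; i < 1_000_000; i++) {
                // Block the producer while too many entries are still being processed.
                inFlight.acquire();

                streamer.addData(i, new byte[1024]).listen(fut -> inFlight.release());
            }

            // Wait for the remaining buffered entries.
            streamer.flush();
        }
    }
}
```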

Suggestion.
We might introduce a reduced default number of parallel batches for persistent 
caches, `IgniteDataStreamer#DFLT_PARALLEL_OPS_PERSISTENT_MULTIPLIER` (PR #10343).
Why send more? It helps a lot and reduces heap utilization even when there is no 
OOME. A better solution would be backpressure, but I am not sure it is worth it 
for this case.

Did some benchmarks. For persistent caches `CPUs x 2` seems enough; a rough 
comparison with the current default is sketched below.
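
Purely illustrative arithmetic comparing the two defaults. 
`DFLT_PARALLEL_OPS_MULTIPLIER` is the existing per-CPU multiplier on 
`IgniteDataStreamer`; the persistent multiplier value of 2 is only assumed from 
the `CPUs x 2` observation and may differ in PR #10343:

```java
import org.apache.ignite.IgniteDataStreamer;

public class ParallelOpsDefaultsSketch {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();

        // Current default: a per-CPU multiplier historically sized for in-memory caches.
        int currentDefault = IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER * cpus;

        // Assumed reduced default for persistent caches: CPUs x 2 (see benchmark note above).
        int persistentDefault = 2 * cpus;

        System.out.println("in-memory default: " + currentDefault
            + ", persistent proposal: " + persistentDefault);
    }
}
```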


was (Author: vladsz83):
DataStreamer with 'allowOverwrite==true' and a PRIMARY_SYNC persistent cache may 
cause heap issues or increased heap consumption.

There is the related 'perNodeParallelOperations()' setting. What discouraged me is 
that I hit this issue with a trivial setup: a few servers, a simple cache, and 
just trying data streaming with various persistence and loading settings (like 
`HeapConsumptionDataStreamerTest.src`). A user may well hit the same. The default 
value might be adjusted for persistent caches.

The streamer node may not wait for backup updates and keeps sending more and more 
streamer batches to process. The receiving node accumulates futures and requests 
related to the backup updates. 
The same happens on the backup node: incoming update requests pile up, stuck at 
disk writes. See 'DS_heap_consumption.png' for an example.

Suggestion: introduce a reduced default number of parallel batches for persistent 
caches, `IgniteDataStreamer#DFLT_PARALLEL_OPS_PERSISTENT_MULTIPLIER` (PR #10343).

Did estimation benchmarks. Even the in-memory benchmarks (like 
'bench_inmem_isolated_pc2.txt') show that 2 or maybe 4 batches per thread seem 
enough. 

For persistent caches, `CPUs x 2` seems enough. See 
`bench_persistent_results_Isolated_pc1.txt` and 
`bench_persistent_results_Individual_pc1.txt`.

> Datastreamer may consume heap with allowOverwrite=='true'.
> ----------------------------------------------------------
>
>                 Key: IGNITE-17735
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17735
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Major
>              Labels: ise
>         Attachments: DS_heap_consumption.png, DS_heap_consumption_2.png, 
> HeapConsumptionDataStreamerTest.src, 
> bench_persistent_results_Individual_pc1.txt, 
> bench_persistent_results_Isolated_pc1.txt
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>



