Hi, Stan.

    Thank you for the reply.

>>> "your data streamer queue size is something like"
You are right about the write queue on the primary node. It does have a fixed size, but one based on the number of CPUs (x8). Even for my laptop I get 16 x 8 = 128 batches. I wonder why the default is so large for persistence.
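If I read the defaults right, it is roughly this (a sketch; `availableProcessors()` is my stand-in for the streamer pool size the actual code looks up):

```java
import org.apache.ignite.IgniteDataStreamer;

// DFLT_PARALLEL_OPS_MULTIPLIER is 8, so a 16-CPU machine gets
// 8 * 16 = 128 in-flight batches per node by default.
int defaultParallelOps = IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER
    * Runtime.getRuntime().availableProcessors();
```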

>>> "Can you check the heap dump in your tests to see what actually occupies most of the heap?"
The backup nodes accumulate `GridDhtAtomicSingleUpdateRequest` with the key/data `byte[]`. Those are the updates we don't wait for in this case.

    I thought we might slightly adjust the default setting, at least to make a simple test more reliable. As a user, I wouldn't like to pick up a tool/product just to try or research it and have it fail quickly. But yes, the user still has the related setting `perNodeParallelOperations()`.
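For example, something like this caps the queue explicitly (a sketch; `ignite` is a started node and the cache name is arbitrary):

```java
try (IgniteDataStreamer<Integer, byte[]> streamer = ignite.dataStreamer("myCache")) {
    streamer.allowOverwrite(true);

    // Lower the cap on in-flight batches per node to limit heap
    // pressure on the receiving (especially backup) nodes.
    streamer.perNodeParallelOperations(16);

    // ... addData() calls ...
}
```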

WDYT?

On 30.10.2022 21:24, Stanislav Lukyanov wrote:
Hi Vladimir,

I think this is potentially an issue but I don't think this is about PDS at all.

The description is a bit vague, I have to say. AFAIU, what you see is that when 
the caches are persistent the streamer writes data faster than the nodes 
(especially the backup nodes) can process the writes.
As a result, the nodes accumulate the writes in queues, the queues grow, and 
then you might go OOM.

Simply having smaller queues when persistence is enabled (and therefore the 
queues are more likely to reach their max size) is not the best solution, in my 
opinion.
If the default max queue size is too large, it should be smaller in all cases, 
regardless of why the queues grow.

Furthermore, I have a feeling that what gives you the OOM isn't the data streamer 
queue... AFAIR, your data streamer queue size is something like (entrySize * 
bufferSize * perNodeParallelOperations),
which for 1 KB entries and 16 threads gives (1 KB * 512 * 16 * 8) = 64 MB, which 
is usually peanuts for server Java.
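Spelled out (a back-of-the-envelope sketch; 16 is the thread count, 8 the default multiplier):

```java
long entrySize   = 1024;   // ~1 KB per entry
long bufferSize  = 512;    // default per-node buffer size
long parallelOps = 16 * 8; // 16 threads * default multiplier of 8

long inFlightBytes = entrySize * bufferSize * parallelOps; // 67108864 bytes = 64 MB
```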

Can you check the heap dump in your tests to see what actually occupies most of 
the heap?

Thanks,
Stan

On 28 Oct 2022, at 11:54, Vladimir Steshin <vlads...@gmail.com> wrote:

     Hi Folks,

     I found that the Datastreamer may consume increased amounts of heap 
when loading into a persistent cache.
This may happen when the streamer's 'allowOverwrite' == true and the cache is in 
PRIMARY_SYNC mode.

     What I don't like here is that the case looks simple: nothing but the 
defaults, yet a user might hit the issue in a trivial test while trying out or 
researching the streamer.
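A minimal sketch of the kind of trivial test I mean (assumptions: at least two server nodes so the backups actually exist; names and sizes are arbitrary):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StreamerHeapTest {
    public static void main(String[] args) {
        // Persistence enabled for the default data region.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                    .setPersistenceEnabled(true)));

        Ignite ignite = Ignition.start(cfg);
        ignite.cluster().state(ClusterState.ACTIVE);

        // One backup + PRIMARY_SYNC: the primary acks before the backups finish.
        ignite.createCache(new CacheConfiguration<Integer, byte[]>("test")
            .setBackups(1)
            .setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC));

        try (IgniteDataStreamer<Integer, byte[]> streamer = ignite.dataStreamer("test")) {
            streamer.allowOverwrite(true);

            // The backups accumulate the un-awaited update requests on the heap.
            for (int i = 0; i < 10_000_000; i++)
                streamer.addData(i, new byte[1024]);
        }
    }
}
```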

     The streamer has the related 'perNodeParallelOperations()' setting, which 
helps. But an additional DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER might be 
introduced for PDS.

     My questions are:
1) Is it an issue at all? Does it need a fix? Is it a minor one?
2) Should we introduce an additional default DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER 
for PDS, since it reduces heap consumption?
3) The better solution would be backpressure, but is it worth it for this case?

Ticket: https://issues.apache.org/jira/browse/IGNITE-17735
PR: https://github.com/apache/ignite/pull/10343
