I checked our code that creates the primary data region, and it does set
the minimum and maximum sizes to 4 GB, meaning there will be roughly
1,000,000 pages in that region.
The secondary data region is much smaller, with min/max set to 128 MB of
memory.
The checkpoints with the "too many dirty pages" reason were reporting fewer
than 100,000 dirty pages, so this must have been triggered by the size of
the smaller data region.
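For reference, here is my back-of-the-envelope arithmetic (a sketch only,
assuming Ignite's default 4 KB page size and the 3/4 dirty-page ratio Ilya
mentions below):

```java
public class DirtyPageLimits {
    // Ignite's default page size is 4 KB.
    static final long PAGE_SIZE = 4 * 1024L;

    static long pages(long regionBytes) {
        return regionBytes / PAGE_SIZE;
    }

    public static void main(String[] args) {
        long primary = pages(4L * 1024 * 1024 * 1024);   // 4 GB region
        long secondary = pages(128L * 1024 * 1024);      // 128 MB region

        // The "too many dirty pages" checkpoint fires at 3/4 of the pool
        // when a throttling policy is in effect (which is the default).
        System.out.println("primary pages = " + primary
            + ", dirty limit = " + primary * 3L / 4);    // 1048576, 786432
        System.out.println("secondary pages = " + secondary
            + ", dirty limit = " + secondary * 3L / 4);  // 32768, 24576
    }
}
```

The secondary region's limit works out to 24,576 pages, which is well under
100,000 and consistent with the dirty page counts quoted in the checkpoint
logs.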
Both these data regions have persistence enabled, and I think this may have
been a sub-optimal way to set it up. My aim was to provide a dedicated
channel for inbound data arriving to be queued that was not impacted by
updates from processing of that data. I think it will be better to change
this arrangement to use a single data region, making the checkpointing
process simpler and reducing cases where it decides there are too many
dirty pages.
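A minimal sketch of what the consolidated setup might look like (the region
name and sizes here are illustrative, not our actual configuration):

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SingleRegionConfig {
    public static IgniteConfiguration configure() {
        // One persistent region sized to cover both workloads, so the
        // checkpointer sees a single large pool instead of a small
        // secondary region that hits its dirty-page limit early.
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("Default")
            .setInitialSize(512L * 1024 * 1024)  // start small, grow as needed
            .setMaxSize(4L * 1024 * 1024 * 1024 + 128L * 1024 * 1024)
            .setPersistenceEnabled(true);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
```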
On Mon, Jan 4, 2021 at 11:39 PM Ilya Kasnacheev
wrote:
> Hello!
>
> I guess it's pool.pages() * 3L / 4,
> since, counter-intuitively, the default ThrottlingPolicy is not
> ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.
>
> Regards,
>
> --
> Ilya Kasnacheev
>
>
> Thu, 31 Dec 2020 at 04:33, Raymond Wilson :
>
>> Regards this section of code:
>>
>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
>> ? pool.pages() * 3L / 4
>> : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>
>> I think the correct ratio will be 2/3 of pages, as we do not have a
>> throttling policy defined, correct?
>>
>> On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky
>> wrote:
>>
>>> The relevant code runs from here:
>>>
>>> if (checkpointReadWriteLock.getReadHoldCount() > 1 ||
>>>     safeToUpdatePageMemories() || checkpointer.runner() == null)
>>>     break;
>>> else {
>>>     CheckpointProgress pages = checkpointer.scheduleCheckpoint(0,
>>>         "too many dirty pages");
>>>
>>> and nearby you can see this:
>>>
>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
>>>     ? pool.pages() * 3L / 4
>>>     : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>
>>> Thus, if 3/4 of the whole DataRegion's pages are dirty, this checkpoint
>>> will be raised.
>>>
>>>
>>> In (
>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
>>> there is a mention of a dirty-pages limit as one factor that can trigger
>>> checkpoints.
>>>
>>> I also found this issue:
>>> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>>
>>> After reviewing our logs I found this: (one example)
>>>
>>> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
>>> started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>>> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>>> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>>> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>>> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>>> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
>>> pages']
>>>
>>> Which suggests we may have the issue where writes are frozen until the
>>> checkpoint is completed.
>>>
>>> Looking at the AI 2.8.1 source code, the dirty page limit fraction
>>> appears to be 0.1 (10%), via this entry
>>> in GridCacheDatabaseSharedManager.java:
>>>
>>> /**
>>>  * Threshold to calculate limit for pages list on-heap caches.
>>>  *
>>>  * Note: When a checkpoint is triggered, we need some amount of page
>>>  * memory to store pages list on-heap cache. If a checkpoint is
>>>  * triggered by "too many dirty pages" reason and pages list cache is
>>>  * rather big, we can get {@code IgniteOutOfMemoryException}. To prevent
>>>  * this, we can limit the total amount of cached page list buckets,
>>>  * assuming that checkpoint will be triggered if no more than 3/4 of
>>>  * pages will be marked as dirty (there will be at least 1/4 of clean
>>>  * pages) and each cached page list bucket can be stored to up to 2
>>>  * pages (this value is not static, but depends on PagesCache.MAX_SIZE,
>>>  * so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>>>  * more than 2 pages). Also some amount of page memory needed to store
>>>  * page list metadata.
>>>  */
>>> private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>>>
>>> This raises two questions:
>>>
>>> 1. The data region where most writes are occurring has 4 GB allocated to
>>> it, though it is permitted to start at a much lower level. 4 GB should be
>>> about 1,000,000 pages, 10% of which would be 100,000 dirty pages.
>>>
>>> The 'limit holder' is calculated like this:
>>>
>>> /**
>>> * @return Holder for page list cache limit for given data region.
>>> */
>>> public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>>> if