Re[4]: Ever increasing startup times as data grow in persistent storage

Zhenya Stanilovsky Wed, 13 Jan 2021 00:15:21 -0800


 
>Is there an API version of the cluster deactivation?
 
https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Cache/PersistentStoreTestObsolete.cs#L131
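The linked test uses the .NET API. Below is a minimal Java sketch of the same idea. It assumes Ignite 2.9 or later, where `IgniteCluster.state(ClusterState)` is available; earlier versions expose `IgniteCluster.active(boolean)` instead. The class name `GracefulShutdown` is made up for illustration. Keep in mind that deactivation applies to the whole cluster, not only to the node that issues it.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical example class; assumes Ignite 2.9+ on the classpath.
public class GracefulShutdown {
    public static void main(String[] args) {
        DataStorageConfiguration storage = new DataStorageConfiguration();
        // Native persistence, so the node has a WAL and performs checkpoints.
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storage);

        // Ignite is AutoCloseable, so the node is stopped when this block exits.
        try (Ignite ignite = Ignition.start(cfg)) {
            // A cluster with persistence starts inactive; activate it first.
            ignite.cluster().state(ClusterState.ACTIVE);

            // ... application work ...

            // Deactivate before stopping. This affects the WHOLE cluster,
            // not only this node.
            ignite.cluster().state(ClusterState.INACTIVE);
        }
    }
}
```

Because an inactive cluster rejects cache operations on every node, this pattern fits a planned full-cluster restart rather than a rolling restart of individual nodes.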
 
>On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky < arzamas...@mail.ru > 
>wrote:
>>
>>
>> 
>>>Hi Zhenya,
>>> 
>>>Thanks for confirming that performing checkpoints more often will help here.
>>Hi Raymond!
>>> 
>>>I have established this configuration, so I will experiment with the settings
>>>a little.
>>> 
>>>On a related note, is there any way to automatically trigger a checkpoint, 
>>>for instance as a pre-shutdown activity?
>> 
>>If you shut down your cluster gracefully, i.e. with deactivation [1], the
>>next start will not trigger WAL reads.
>> 
>>[1]  
>>https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster
>> 
>>>Checkpoints seem to be much faster than the process of applying WAL updates.
>>> 
>>>Raymond.  
>>>On Wed, Jan 13, 2021 at 8:07 PM Zhenya Stanilovsky < arzamas...@mail.ru > 
>>>wrote:
>>>>
>>>>
>>>>
>>>> 
>>>>>We have noticed that startup time for our server nodes has been slowly 
>>>>>increasing over time as the amount of data stored in the persistent store 
>>>>>grows.
>>>>> 
>>>>>This appears to be closely related to recovery of WAL changes that were 
>>>>>not checkpointed at the time the node was stopped.
>>>>> 
>>>>>After enabling debug logging we see that the WAL file is scanned, and for 
>>>>>every cache, all partitions in the cache are examined, and if there are 
>>>>>any un-checkpointed changes in the WAL file, then the partition is updated (I 
>>>>>assume this requires reading of the partition itself as a part of this 
>>>>>process).
>>>>> 
>>>>>We now have ~150 GB of data in our persistent store, and WAL recovery takes
>>>>>between 5 and 10 minutes to complete, during which the node is
>>>>>unavailable.
>>>>> 
>>>>>We use fairly large WAL segments (512 MB) and 10 segments, with WAL
>>>>>archiving enabled.
>>>>> 
>>>>>We anticipate the data in persistent storage to grow to terabytes, and if
>>>>>startup time continues to grow with storage, deployments and restarts will
>>>>>become difficult.
>>>>> 
>>>>>Until now we have been using the default checkpoint interval of 3 minutes,
>>>>>which may mean we have significant un-checkpointed data in the WAL files.
>>>>>We are moving to a 1-minute checkpoint interval but don't yet know whether
>>>>>this will improve startup times. We also use the default 1024 partitions
>>>>>per cache, though some partitions may be large.
>>>>> 
>>>>>Can anyone confirm whether this is expected behaviour, and offer
>>>>>recommendations for resolving it?
>>>>> 
>>>>>Will reducing checkpoint intervals help?
>>>> 
>>>>Yes, it will help. See
>>>>https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>>>>>Is the entire content of a partition read while applying WAL changes?
>>>> 
>>>>I don't think so; maybe someone else can comment here?
>>>>>Does anyone else have this issue?
>>>>> 
>>>>>Thanks,
>>>>>Raymond.
>>>>> 
>>>>>  --
>>>>>
>>>>>Raymond Wilson
>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>11 Birmingham Drive |  Christchurch, New Zealand
>>>>>raymond_wil...@trimble.com
>>>>>         
>>>>> 
>>>> 
>>>> 
>>>> 
>>>>  
>>> 
> 
>
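For the checkpoint-interval change discussed in the thread (from the 3-minute default down to 1 minute), here is a minimal configuration sketch. It uses the Java API, so names in other platform APIs may differ, and it mirrors the WAL settings Raymond describes (512 MB segments, 10 segments). The class `PersistentStorageConfig` and its `create()` method are hypothetical; only the Ignite setters are real API. Note that `checkpointFrequency` is given in milliseconds and `walSegmentSize` in bytes.

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical helper class; only the Ignite setters are real API.
public final class PersistentStorageConfig {
    private PersistentStorageConfig() {}

    public static IgniteConfiguration create() {
        DataStorageConfiguration storage = new DataStorageConfiguration()
            // Checkpoint every 60 s instead of the 180 s (3 min) default.
            .setCheckpointFrequency(60_000L)
            // Each WAL segment is 512 MB; the value is given in bytes.
            .setWalSegmentSize(512 * 1024 * 1024)
            // Number of segments kept in the WAL work directory.
            .setWalSegments(10);

        // Enable native persistence for the default data region.
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
```

The returned configuration would be passed to `Ignition.start(...)`. WAL archive settings are not touched here, so they stay at their defaults.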