Re: Adding metrics of using WAL archive

2021-03-18 Thread Nikolay Izhikov
LGTM.

Please, proceed with the merge.

> On 17 March 2021, at 12:20, ткаленко кирилл wrote:
> 
> Hi Nikolay! Can you do an additional review or can we merge?
> 
> 15.03.2021, 08:48, "ткаленко кирилл" :
>> Hi Nikolay! Can you do an additional review or can we merge?
>> 
>> 05.03.2021, 16:33, "Nikolay Izhikov" :
>>>  Yes, definitely.
>>> 
>>>> On 5 March 2021, at 16:31, ткаленко кирилл wrote:
>>>>
>>>> Hi Nikolay, can you do a review?
>>>>
>>>> 02.03.2021, 18:59, "Nikolay Izhikov" :
>>>>> +1 For this.
>>>>>
>>>>>> On 2 March 2021, at 18:32, ткаленко кирилл wrote:
>>>>>>
>>>>>> Hi, Nikolay!
>>>>>>
>>>>>> I thought about your proposal and offer the following two metrics:
>>>>>>
>>>>>> 1)The number of bytes logged to the WAL;
>>>>>> 2)The number of compressed bytes in the WAL.
>>>>>>
>>>>>> Monotonically increasing long.
>>>>>>
>>>>>> WDYT?



Re: Adding metrics of using WAL archive

2021-03-17 Thread ткаленко кирилл
Hi Nikolay! Can you do an additional review or can we merge?

15.03.2021, 08:48, "ткаленко кирилл" :
> Hi Nikolay! Can you do an additional review or can we merge?
>
> 05.03.2021, 16:33, "Nikolay Izhikov" :
>>  Yes, definitely.
>>
>>>   On 5 March 2021, at 16:31, ткаленко кирилл wrote:
>>>
>>>   Hi Nikolay, can you do a review?
>>>
>>>   02.03.2021, 18:59, "Nikolay Izhikov" :
>>>>   +1 For this.
>>>>
>>>>>    On 2 March 2021, at 18:32, ткаленко кирилл wrote:
>>>>>
>>>>>    Hi, Nikolay!
>>>>>
>>>>>    I thought about your proposal and offer the following two metrics:
>>>>>
>>>>>    1)The number of bytes logged to the WAL;
>>>>>    2)The number of compressed bytes in the WAL.
>>>>>
>>>>>    Monotonically increasing long.
>>>>>
>>>>>    WDYT?


Re: Adding metrics of using WAL archive

2021-03-14 Thread ткаленко кирилл
Hi Nikolay! Can you do an additional review or can we merge?

05.03.2021, 16:33, "Nikolay Izhikov" :
> Yes, definitely.
>
>>  On 5 March 2021, at 16:31, ткаленко кирилл wrote:
>>
>>  Hi Nikolay, can you do a review?
>>
>>  02.03.2021, 18:59, "Nikolay Izhikov" :
>>>  +1 For this.
>>>
>>>>  On 2 March 2021, at 18:32, ткаленко кирилл wrote:
>>>>
>>>>  Hi, Nikolay!
>>>>
>>>>  I thought about your proposal and offer the following two metrics:
>>>>
>>>>  1)The number of bytes logged to the WAL;
>>>>  2)The number of compressed bytes in the WAL.
>>>>
>>>>  Monotonically increasing long.
>>>>
>>>>  WDYT?


Re: Adding metrics of using WAL archive

2021-03-05 Thread Nikolay Izhikov
Yes, definitely.

> On 5 March 2021, at 16:31, ткаленко кирилл wrote:
> 
> Hi Nikolay, can you do a review?
> 
> 02.03.2021, 18:59, "Nikolay Izhikov" :
>> +1 For this.
>> 
>>>  On 2 March 2021, at 18:32, ткаленко кирилл wrote:
>>> 
>>>  Hi, Nikolay!
>>> 
>>>  I thought about your proposal and offer the following two metrics:
>>> 
>>>  1)The number of bytes logged to the WAL;
>>>  2)The number of compressed bytes in the WAL.
>>> 
>>>  Monotonically increasing long.
>>> 
>>>  WDYT?



Re: Adding metrics of using WAL archive

2021-03-05 Thread ткаленко кирилл
Hi Nikolay, can you do a review?

02.03.2021, 18:59, "Nikolay Izhikov" :
> +1 For this.
>
>>  On 2 March 2021, at 18:32, ткаленко кирилл wrote:
>>
>>  Hi, Nikolay!
>>
>>  I thought about your proposal and offer the following two metrics:
>>
>>  1)The number of bytes logged to the WAL;
>>  2)The number of compressed bytes in the WAL.
>>
>>  Monotonically increasing long.
>>
>>  WDYT?


Re: Adding metrics of using WAL archive

2021-03-02 Thread Nikolay Izhikov
+1 For this.

> On 2 March 2021, at 18:32, ткаленко кирилл wrote:
> 
> Hi, Nikolay!
> 
> I thought about your proposal and offer the following two metrics:
> 
> 1)The number of bytes logged to the WAL;
> 2)The number of compressed bytes in the WAL.
> 
> Monotonically increasing long.
> 
> WDYT?



Adding metrics of using WAL archive

2021-03-02 Thread ткаленко кирилл
Hi, Nikolay!

I thought about your proposal and suggest the following two metrics:

1) The number of bytes logged to the WAL;
2) The number of compressed bytes in the WAL.

Both are monotonically increasing long values.

WDYT?
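
For illustration, the two counters proposed above could be kept as monotonically increasing longs roughly as follows. This is only a sketch under assumptions: the class and method names (WalSizeCountersSketch, onRecordLogged, onSegmentCompressed) are hypothetical and are not Ignite's API, and reading "compressed bytes" as the total size of compressed segments is a guess at the intent.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical holder for the two proposed counters; not an Ignite class. */
final class WalSizeCountersSketch {
    /** Total bytes of records logged to the WAL since node start. */
    private final LongAdder loggedBytes = new LongAdder();

    /** Total bytes of compressed WAL segments since node start (assumed meaning). */
    private final LongAdder compressedBytes = new LongAdder();

    /** Called by the (hypothetical) WAL writer each time a record is logged. */
    void onRecordLogged(int recordSize) {
        loggedBytes.add(recordSize);
    }

    /** Called by the (hypothetical) compressor each time a segment is compressed. */
    void onSegmentCompressed(long compressedSize) {
        compressedBytes.add(compressedSize);
    }

    long walLoggedBytes() {
        return loggedBytes.sum();
    }

    long walCompressedBytes() {
        return compressedBytes.sum();
    }

    public static void main(String[] args) {
        WalSizeCountersSketch m = new WalSizeCountersSketch();

        m.onRecordLogged(128);
        m.onRecordLogged(512);
        m.onSegmentCompressed(300);

        System.out.println("logged=" + m.walLoggedBytes());
        System.out.println("compressed=" + m.walCompressedBytes());
    }
}
```

Because the values only grow, a monitoring system can derive a rate (bytes per hour or per day) by subtracting two samples, without the node having to track any time window itself.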


Re: Adding metrics of using WAL archive

2021-02-18 Thread ткаленко кирилл
Hello, Nikolay!

org.apache.ignite.mxbean.DataStorageMetricsMXBean#getLastArchivedSegmentIndex - 
returns the index of the last archived segment.

org.apache.ignite.mxbean.DataStorageMetricsMXBean#getMaxSizeCompressedArchivedSegment - 
returns the size of the largest compressed segment in the archive.

org.apache.ignite.mxbean.DataStorageMetricsMXBean#getWalLoggingSize - 
returns the total size, in bytes, of the records logged to the WAL.

18.02.2021, 15:34, "Nikolay Izhikov" :
> Hello, Kirill.
>
> Can you, please, write down your proposal?
> What metrics you want to add in the Ignite?
>
>>  On 18 Feb 2021, at 14:11, ткаленко кирилл wrote:
>>
>>  Hi, Nikolay!
>>
>>  Have we reached a consensus?
>>
>>  16.02.2021, 17:09, "ткаленко кирилл" :
>>>  Hi, Zhenya!
>>>
>>>  Users can also use it, I see nothing wrong with the presence of two 
>>> metrics.
>>>
>>>  16.02.2021, 16:50, "Zhenya Stanilovsky" :
   Kirill, is it good practice to have a metrics for internal use? Don`t 
 think so.
   +1 witk Nikolay size is more readable than abstract segments count.

>   Hi, Nikolay!
>
>   For internal use, leave the metric that I propose and also add the 
> metric: Count of bytes logged in WAL. Why not "written" because for the 
> mmap we cannot track when the physical writting will occur.
>
>   16.02.2021, 15:42, "Nikolay Izhikov" < nizhi...@apache.org >:
>>    Kirill.
>>
>>    «Count of segments» is a very internal thing for a regular user.
>>    Regular user don’t want to know about such things.
>>
>>    You suggest to calculate the number (space required to store WAL) 
>> with some kind of rough calculation, and with the «Count of bytes 
>> written in WAL» we can have exact number without any suggestions or 
>> calculations.
>>
>>    Moreover, «Count of bytes written in WAL» is independent on internal 
>> WAL implementation.
>>
>>    So, I think exact number is always better to have then some 
>> approximation.
>>
>>    What do you think?
>>
>>> On 15 Feb 2021, at 20:45, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>>>
>>> Hi, Nikolay!
>>>
>>> We set the number of segments in the working directory, we also 
>>> delete by segment, it seems that this is a matter of usability. I 
>>> prefer to dwell on my own version, this is a simple metric that does 
>>> not hurt and you can add more as needed.
>>>
>>> 15.02.2021, 17:10, "Nikolay Izhikov" < nizhi...@apache.org >:
 My suggestion that «count of files» is meaningless number.
 And «count of bytes written to the files» is useful number to know 
 and use for capacity planning..

>  On 15 Feb 2021, at 15:59, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>
>  Hi, Nikolay!
>
>  There may be a number (count of segments * segment size) or 
> there may be a count of segments, whichever is more convenient for 
> the user.
>
>  15.02.2021, 13:14, "Nikolay Izhikov" < nizhi...@apache.org >:
>>  Hello, Kirill.
>>
>>  Thanks for an answers.
>>  Now, I understand your intentions.
>>
>>>   It also seems that it will be more natural to operate not just 
>>> bytes but multiples of a segment.
>>
>>  Can’t agree here.
>>  From my point of view - it’s better to know exact number, not 
>> just «count of segments».
>>
>>>   On 15 Feb 2021, at 13:00, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>>>
>>>   Hello, Nikolay!
>>>
>>>   The period of one day (24h) seems more natural, you can take 
>>> more or less, I think that one day may not be enough, and it is 
>>> worth getting the metric for several days (collect statistics) for 
>>> example a week. Yes, the total size of the segments may not be 
>>> DataStorageConfiguration#getMaxWalArchiveSize, but for capacity 
>>> planning, accuracy is not so important to us, since the load can 
>>> always change, it will hurt users more if we overflow the archive 
>>> and it will not be able to start the node. So to say that more is 
>>> better than less, it also seems that it will be more natural to 
>>> operate not just bytes but multiples of a segment.
>>>
>>>   In separate threads, you can discuss the metric that you 
>>> propose about page memory and indexes estimates.
>>>
>>>   14.02.2021, 11:54, "Nikolay Izhikov" < nizhi...@apache.org >:
   Hello, Kirill

   Your conclusions still not clear for me.

> It is not possible for us to estimate 

Re: Adding metrics of using WAL archive

2021-02-18 Thread Nikolay Izhikov
Hello, Kirill.

Could you please write down your proposal?
Which metrics do you want to add to Ignite?

> On 18 Feb 2021, at 14:11, ткаленко кирилл wrote:
> 
> Hi, Nikolay!
> 
> Have we reached a consensus?
> 
> 16.02.2021, 17:09, "ткаленко кирилл" :
>> Hi, Zhenya!
>> 
>> Users can also use it, I see nothing wrong with the presence of two metrics.
>> 
>> 16.02.2021, 16:50, "Zhenya Stanilovsky" :
>>>  Kirill, is it good practice to have a metrics for internal use? Don`t 
>>> think so.
>>>  +1 witk Nikolay size is more readable than abstract segments count.
>>> 
  Hi, Nikolay!
 
  For internal use, leave the metric that I propose and also add the 
 metric: Count of bytes logged in WAL. Why not "written" because for the 
 mmap we cannot track when the physical writting will occur.
 
  16.02.2021, 15:42, "Nikolay Izhikov" < nizhi...@apache.org >:
>   Kirill.
> 
>   «Count of segments» is a very internal thing for a regular user.
>   Regular user don’t want to know about such things.
> 
>   You suggest to calculate the number (space required to store WAL) with 
> some kind of rough calculation, and with the «Count of bytes written in 
> WAL» we can have exact number without any suggestions or calculations.
> 
>   Moreover, «Count of bytes written in WAL» is independent on internal 
> WAL implementation.
> 
>   So, I think exact number is always better to have then some 
> approximation.
> 
>   What do you think?
> 
>> On 15 Feb 2021, at 20:45, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>> 
>>Hi, Nikolay!
>> 
>>We set the number of segments in the working directory, we also 
>> delete by segment, it seems that this is a matter of usability. I prefer 
>> to dwell on my own version, this is a simple metric that does not hurt 
>> and you can add more as needed.
>> 
>>15.02.2021, 17:10, "Nikolay Izhikov" < nizhi...@apache.org >:
>>>My suggestion that «count of files» is meaningless number.
>>>And «count of bytes written to the files» is useful number to know 
>>> and use for capacity planning..
>>> 
 On 15 Feb 2021, at 15:59, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
 
 Hi, Nikolay!
 
 There may be a number (count of segments * segment size) or there 
 may be a count of segments, whichever is more convenient for the user.
 
 15.02.2021, 13:14, "Nikolay Izhikov" < nizhi...@apache.org >:
> Hello, Kirill.
> 
> Thanks for an answers.
> Now, I understand your intentions.
> 
>>  It also seems that it will be more natural to operate not just 
>> bytes but multiples of a segment.
> 
> Can’t agree here.
> From my point of view - it’s better to know exact number, not 
> just «count of segments».
> 
>>  On 15 Feb 2021, at 13:00, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>> 
>>  Hello, Nikolay!
>> 
>>  The period of one day (24h) seems more natural, you can take 
>> more or less, I think that one day may not be enough, and it is 
>> worth getting the metric for several days (collect statistics) for 
>> example a week. Yes, the total size of the segments may not be 
>> DataStorageConfiguration#getMaxWalArchiveSize, but for capacity 
>> planning, accuracy is not so important to us, since the load can 
>> always change, it will hurt users more if we overflow the archive 
>> and it will not be able to start the node. So to say that more is 
>> better than less, it also seems that it will be more natural to 
>> operate not just bytes but multiples of a segment.
>> 
>>  In separate threads, you can discuss the metric that you 
>> propose about page memory and indexes estimates.
>> 
>>  14.02.2021, 11:54, "Nikolay Izhikov" < nizhi...@apache.org >:
>>>  Hello, Kirill
>>> 
>>>  Your conclusions still not clear for me.
>>> 
It is not possible for us to estimate how much space a user 
 will need in the archive so as not to overflow it under its load
We take the maximum 44 and multiply it by a 
 DataStorageConfiguration#getWalSegmentSize
>>> 
>>>  Why you take a single day (24h) for a standard period? Is 
>>> there any rationale behind this?
>>> 
>>>  1. We have `walAutoArchiveAfterInactivity` property. So WAL 
>>> segment can have a size less than the maximum.
>>>  2. For CDC feature I want to introduce «WAL force rollover 
>>> timeout» to make data 

Re: Adding metrics of using WAL archive

2021-02-18 Thread ткаленко кирилл
Hi, Nikolay!

Have we reached a consensus?

16.02.2021, 17:09, "ткаленко кирилл" :
> Hi, Zhenya!
>
> Users can also use it, I see nothing wrong with the presence of two metrics.
>
> 16.02.2021, 16:50, "Zhenya Stanilovsky" :
>>  Kirill, is it good practice to have a metrics for internal use? Don`t think 
>> so.
>>  +1 witk Nikolay size is more readable than abstract segments count.
>>
>>>  Hi, Nikolay!
>>>
>>>  For internal use, leave the metric that I propose and also add the metric: 
>>> Count of bytes logged in WAL. Why not "written" because for the mmap we 
>>> cannot track when the physical writting will occur.
>>>
>>>  16.02.2021, 15:42, "Nikolay Izhikov" < nizhi...@apache.org >:
   Kirill.

   «Count of segments» is a very internal thing for a regular user.
   Regular user don’t want to know about such things.

   You suggest to calculate the number (space required to store WAL) with 
 some kind of rough calculation, and with the «Count of bytes written in 
 WAL» we can have exact number without any suggestions or calculations.

   Moreover, «Count of bytes written in WAL» is independent on internal WAL 
 implementation.

   So, I think exact number is always better to have then some 
 approximation.

   What do you think?

>    On 15 Feb 2021, at 20:45, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>
>    Hi, Nikolay!
>
>    We set the number of segments in the working directory, we also delete 
> by segment, it seems that this is a matter of usability. I prefer to 
> dwell on my own version, this is a simple metric that does not hurt and 
> you can add more as needed.
>
>    15.02.2021, 17:10, "Nikolay Izhikov" < nizhi...@apache.org >:
>>    My suggestion that «count of files» is meaningless number.
>>    And «count of bytes written to the files» is useful number to know 
>> and use for capacity planning..
>>
>>> On 15 Feb 2021, at 15:59, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>>>
>>> Hi, Nikolay!
>>>
>>> There may be a number (count of segments * segment size) or there 
>>> may be a count of segments, whichever is more convenient for the user.
>>>
>>> 15.02.2021, 13:14, "Nikolay Izhikov" < nizhi...@apache.org >:
 Hello, Kirill.

 Thanks for an answers.
 Now, I understand your intentions.

>  It also seems that it will be more natural to operate not just 
> bytes but multiples of a segment.

 Can’t agree here.
 From my point of view - it’s better to know exact number, not just 
 «count of segments».

>  On 15 Feb 2021, at 13:00, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>
>  Hello, Nikolay!
>
>  The period of one day (24h) seems more natural, you can take 
> more or less, I think that one day may not be enough, and it is worth 
> getting the metric for several days (collect statistics) for example 
> a week. Yes, the total size of the segments may not be 
> DataStorageConfiguration#getMaxWalArchiveSize, but for capacity 
> planning, accuracy is not so important to us, since the load can 
> always change, it will hurt users more if we overflow the archive and 
> it will not be able to start the node. So to say that more is better 
> than less, it also seems that it will be more natural to operate not 
> just bytes but multiples of a segment.
>
>  In separate threads, you can discuss the metric that you propose 
> about page memory and indexes estimates.
>
>  14.02.2021, 11:54, "Nikolay Izhikov" < nizhi...@apache.org >:
>>  Hello, Kirill
>>
>>  Your conclusions still not clear for me.
>>
>>>    It is not possible for us to estimate how much space a user 
>>> will need in the archive so as not to overflow it under its load
>>>    We take the maximum 44 and multiply it by a 
>>> DataStorageConfiguration#getWalSegmentSize
>>
>>  Why you take a single day (24h) for a standard period? Is there 
>> any rationale behind this?
>>
>>  1. We have `walAutoArchiveAfterInactivity` property. So WAL 
>> segment can have a size less than the maximum.
>>  2. For CDC feature I want to introduce «WAL force rollover 
>> timeout» to make data available for a consumer in a guaranteed 
>> period [1].
>>
>>  Why does the user want to estimate those numbers in the first 
>> place?
>>  Are we talking about some kind of capacity planning?
>>
>>  If yes, then maybe it will be better to have a 

Re: Adding metrics of using WAL archive

2021-02-16 Thread ткаленко кирилл
Hi, Zhenya!

Users can use it too; I see nothing wrong with having two metrics.

16.02.2021, 16:50, "Zhenya Stanilovsky" :
> Kirill, is it good practice to have a metrics for internal use? Don`t think 
> so.
> +1 witk Nikolay size is more readable than abstract segments count.
>
>> Hi, Nikolay!
>>
>> For internal use, leave the metric that I propose and also add the metric: 
>> Count of bytes logged in WAL. Why not "written" because for the mmap we 
>> cannot track when the physical writting will occur.
>>
>> 16.02.2021, 15:42, "Nikolay Izhikov" < nizhi...@apache.org >:
>>>  Kirill.
>>>
>>>  «Count of segments» is a very internal thing for a regular user.
>>>  Regular user don’t want to know about such things.
>>>
>>>  You suggest to calculate the number (space required to store WAL) with 
>>> some kind of rough calculation, and with the «Count of bytes written in 
>>> WAL» we can have exact number without any suggestions or calculations.
>>>
>>>  Moreover, «Count of bytes written in WAL» is independent on internal WAL 
>>> implementation.
>>>
>>>  So, I think exact number is always better to have then some approximation.
>>>
>>>  What do you think?
>>>
   On 15 Feb 2021, at 20:45, ткаленко кирилл < tkalkir...@yandex.ru > wrote:

   Hi, Nikolay!

   We set the number of segments in the working directory, we also delete 
 by segment, it seems that this is a matter of usability. I prefer to dwell 
 on my own version, this is a simple metric that does not hurt and you can 
 add more as needed.

   15.02.2021, 17:10, "Nikolay Izhikov" < nizhi...@apache.org >:
>   My suggestion that «count of files» is meaningless number.
>   And «count of bytes written to the files» is useful number to know and 
> use for capacity planning..
>
>>    On 15 Feb 2021, at 15:59, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>>
>>    Hi, Nikolay!
>>
>>    There may be a number (count of segments * segment size) or there may 
>> be a count of segments, whichever is more convenient for the user.
>>
>>    15.02.2021, 13:14, "Nikolay Izhikov" < nizhi...@apache.org >:
>>>    Hello, Kirill.
>>>
>>>    Thanks for an answers.
>>>    Now, I understand your intentions.
>>>
 It also seems that it will be more natural to operate not just 
 bytes but multiples of a segment.
>>>
>>>    Can’t agree here.
>>>    From my point of view - it’s better to know exact number, not just 
>>> «count of segments».
>>>
 On 15 Feb 2021, at 13:00, ткаленко кирилл < tkalkir...@yandex.ru > wrote:

 Hello, Nikolay!

 The period of one day (24h) seems more natural, you can take more 
 or less, I think that one day may not be enough, and it is worth 
 getting the metric for several days (collect statistics) for example a 
 week. Yes, the total size of the segments may not be 
 DataStorageConfiguration#getMaxWalArchiveSize, but for capacity 
 planning, accuracy is not so important to us, since the load can 
 always change, it will hurt users more if we overflow the archive and 
 it will not be able to start the node. So to say that more is better 
 than less, it also seems that it will be more natural to operate not 
 just bytes but multiples of a segment.

 In separate threads, you can discuss the metric that you propose 
 about page memory and indexes estimates.

 14.02.2021, 11:54, "Nikolay Izhikov" < nizhi...@apache.org >:
> Hello, Kirill
>
> Your conclusions still not clear for me.
>
>>   It is not possible for us to estimate how much space a user 
>> will need in the archive so as not to overflow it under its load
>>   We take the maximum 44 and multiply it by a 
>> DataStorageConfiguration#getWalSegmentSize
>
> Why you take a single day (24h) for a standard period? Is there 
> any rationale behind this?
>
> 1. We have `walAutoArchiveAfterInactivity` property. So WAL 
> segment can have a size less than the maximum.
> 2. For CDC feature I want to introduce «WAL force rollover 
> timeout» to make data available for a consumer in a guaranteed period 
> [1].
>
> Why does the user want to estimate those numbers in the first 
> place?
> Are we talking about some kind of capacity planning?
>
> If yes, then maybe it will be better to have a metric for a count 
> of bytes written in the WAL?
> With it, we will have an exact number of space we need for WAL.
>
> How user should estimate capacity for a page memory and indexes?
>
> 

Re[2]: Adding metrics of using WAL archive

2021-02-16 Thread Zhenya Stanilovsky

Kirill, is it good practice to have metrics for internal use? I don't think so.
+1 with Nikolay: size is more readable than an abstract segment count.
 
>Hi, Nikolay!
>
>For internal use, leave the metric that I propose and also add the metric: 
>Count of bytes logged in WAL. Why not "written" because for the mmap we cannot 
>track when the physical writting will occur.
>
>16.02.2021, 15:42, "Nikolay Izhikov" < nizhi...@apache.org >:
>> Kirill.
>>
>> «Count of segments» is a very internal thing for a regular user.
>> Regular user don’t want to know about such things.
>>
>> You suggest to calculate the number (space required to store WAL) with some 
>> kind of rough calculation, and with the «Count of bytes written in WAL» we 
>> can have exact number without any suggestions or calculations.
>>
>> Moreover, «Count of bytes written in WAL» is independent on internal WAL 
>> implementation.
>>
>> So, I think exact number is always better to have then some approximation.
>>
>> What do you think?
>>
>>>  On 15 Feb 2021, at 20:45, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>>>
>>>  Hi, Nikolay!
>>>
>>>  We set the number of segments in the working directory, we also delete by 
>>> segment, it seems that this is a matter of usability. I prefer to dwell on 
>>> my own version, this is a simple metric that does not hurt and you can add 
>>> more as needed.
>>>
>>>  15.02.2021, 17:10, "Nikolay Izhikov" < nizhi...@apache.org >:
  My suggestion that «count of files» is meaningless number.
  And «count of bytes written to the files» is useful number to know and 
 use for capacity planning..

>   On 15 Feb 2021, at 15:59, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>
>   Hi, Nikolay!
>
>   There may be a number (count of segments * segment size) or there may 
> be a count of segments, whichever is more convenient for the user.
>
>   15.02.2021, 13:14, "Nikolay Izhikov" < nizhi...@apache.org >:
>>   Hello, Kirill.
>>
>>   Thanks for an answers.
>>   Now, I understand your intentions.
>>
>>>    It also seems that it will be more natural to operate not just bytes 
>>> but multiples of a segment.
>>
>>   Can’t agree here.
>>   From my point of view - it’s better to know exact number, not just 
>> «count of segments».
>>
>>>    On 15 Feb 2021, at 13:00, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>>>
>>>    Hello, Nikolay!
>>>
>>>    The period of one day (24h) seems more natural, you can take more or 
>>> less, I think that one day may not be enough, and it is worth getting 
>>> the metric for several days (collect statistics) for example a week. 
>>> Yes, the total size of the segments may not be 
>>> DataStorageConfiguration#getMaxWalArchiveSize, but for capacity 
>>> planning, accuracy is not so important to us, since the load can always 
>>> change, it will hurt users more if we overflow the archive and it will 
>>> not be able to start the node. So to say that more is better than less, 
>>> it also seems that it will be more natural to operate not just bytes 
>>> but multiples of a segment.
>>>
>>>    In separate threads, you can discuss the metric that you propose 
>>> about page memory and indexes estimates.
>>>
>>>    14.02.2021, 11:54, "Nikolay Izhikov" < nizhi...@apache.org >:
    Hello, Kirill

    Your conclusions still not clear for me.

>  It is not possible for us to estimate how much space a user will 
> need in the archive so as not to overflow it under its load
>  We take the maximum 44 and multiply it by a 
> DataStorageConfiguration#getWalSegmentSize

    Why you take a single day (24h) for a standard period? Is there any 
 rationale behind this?

    1. We have `walAutoArchiveAfterInactivity` property. So WAL segment 
 can have a size less than the maximum.
    2. For CDC feature I want to introduce «WAL force rollover timeout» 
 to make data available for a consumer in a guaranteed period [1].

    Why does the user want to estimate those numbers in the first place?
    Are we talking about some kind of capacity planning?

    If yes, then maybe it will be better to have a metric for a count 
 of bytes written in the WAL?
    With it, we will have an exact number of space we need for WAL.

    How user should estimate capacity for a page memory and indexes?

    [1]  https://issues.apache.org/jira/browse/IGNITE-13582

> On 14 Feb 2021, at 09:48, ткаленко кирилл < tkalkir...@yandex.ru > wrote:
>
> Hi, Nikolay!
>
> The user will be able to take the getLastArchivedSegmentIndex 
> 

Re: Adding metrics of using WAL archive

2021-02-16 Thread ткаленко кирилл
Hi, Nikolay!

For internal use, let's keep the metric that I proposed and also add another 
metric: count of bytes logged to the WAL. Why "logged" rather than "written"? 
Because with mmap we cannot track when the physical write will occur.
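
To make the "logged" versus "written" distinction concrete, here is a minimal sketch of a memory-mapped log in which the counter is bumped when a record is copied into the mapped region, not when the OS flushes the pages to disk. Everything here is an assumption made for illustration: the class MmapLoggedBytesSketch and its methods are hypothetical and do not reflect how Ignite's WAL is implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical mmap-backed log; illustrates "logged" vs "written", not Ignite code. */
final class MmapLoggedBytesSketch {
    private final MappedByteBuffer buf;

    /** Bytes handed to the log so far; says nothing about what reached the disk. */
    private final AtomicLong loggedBytes = new AtomicLong();

    private MmapLoggedBytesSketch(MappedByteBuffer buf) {
        this.buf = buf;
    }

    /** Maps a fresh temporary file of the given size. */
    static MmapLoggedBytesSketch onTempFile(int size) {
        try {
            Path file = Files.createTempFile("wal-sketch", ".bin");
            file.toFile().deleteOnExit();

            try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapping stays valid after the channel is closed.
                return new MmapLoggedBytesSketch(
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, size));
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies the record into the mapped region and counts it as logged. */
    void log(byte[] record) {
        buf.put(record);
        loggedBytes.addAndGet(record.length);
    }

    long loggedBytes() {
        return loggedBytes.get();
    }

    /** Asks the OS to flush dirty pages; without it the flush time is up to the OS. */
    void force() {
        buf.force();
    }

    public static void main(String[] args) {
        MmapLoggedBytesSketch log = onTempFile(1024);

        log.log(new byte[100]);
        log.log(new byte[28]);

        // The counter is already up to date although nothing was forced to disk yet.
        System.out.println("logged=" + log.loggedBytes());

        log.force();
    }
}
```

The application only observes the copy into the mapped buffer; when the pages are physically written is decided by the OS unless force() is called, which is why the metric can honestly report "logged" bytes only.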

16.02.2021, 15:42, "Nikolay Izhikov" :
> Kirill.
>
> «Count of segments» is a very internal thing for a regular user.
> Regular user don’t want to know about such things.
>
> You suggest to calculate the number (space required to store WAL) with some 
> kind of rough calculation, and with the «Count of bytes written in WAL» we 
> can have exact number without any suggestions or calculations.
>
> Moreover, «Count of bytes written in WAL» is independent on internal WAL 
> implementation.
>
> So, I think exact number is always better to have then some approximation.
>
> What do you think?
>
>>  On 15 Feb 2021, at 20:45, ткаленко кирилл wrote:
>>
>>  Hi, Nikolay!
>>
>>  We set the number of segments in the working directory, we also delete by 
>> segment, it seems that this is a matter of usability. I prefer to dwell on 
>> my own version, this is a simple metric that does not hurt and you can add 
>> more as needed.
>>
>>  15.02.2021, 17:10, "Nikolay Izhikov" :
>>>  My suggestion that «count of files» is meaningless number.
>>>  And «count of bytes written to the files» is useful number to know and use 
>>> for capacity planning..
>>>
   On 15 Feb 2021, at 15:59, ткаленко кирилл wrote:

   Hi, Nikolay!

   There may be a number (count of segments * segment size) or there may be 
 a count of segments, whichever is more convenient for the user.

   15.02.2021, 13:14, "Nikolay Izhikov" :
>   Hello, Kirill.
>
>   Thanks for an answers.
>   Now, I understand your intentions.
>
>>    It also seems that it will be more natural to operate not just bytes 
>> but multiples of a segment.
>
>   Can’t agree here.
>   From my point of view - it’s better to know exact number, not just 
> «count of segments».
>
>>    On 15 Feb 2021, at 13:00, ткаленко кирилл wrote:
>>
>>    Hello, Nikolay!
>>
>>    The period of one day (24h) seems more natural, you can take more or 
>> less, I think that one day may not be enough, and it is worth getting 
>> the metric for several days (collect statistics) for example a week. 
>> Yes, the total size of the segments may not be 
>> DataStorageConfiguration#getMaxWalArchiveSize, but for capacity 
>> planning, accuracy is not so important to us, since the load can always 
>> change, it will hurt users more if we overflow the archive and it will 
>> not be able to start the node. So to say that more is better than less, 
>> it also seems that it will be more natural to operate not just bytes but 
>> multiples of a segment.
>>
>>    In separate threads, you can discuss the metric that you propose 
>> about page memory and indexes estimates.
>>
>>    14.02.2021, 11:54, "Nikolay Izhikov" :
>>>    Hello, Kirill
>>>
>>>    Your conclusions still not clear for me.
>>>
  It is not possible for us to estimate how much space a user will 
 need in the archive so as not to overflow it under its load
  We take the maximum 44 and multiply it by a 
 DataStorageConfiguration#getWalSegmentSize
>>>
>>>    Why you take a single day (24h) for a standard period? Is there any 
>>> rationale behind this?
>>>
>>>    1. We have `walAutoArchiveAfterInactivity` property. So WAL segment 
>>> can have a size less than the maximum.
>>>    2. For CDC feature I want to introduce «WAL force rollover timeout» 
>>> to make data available for a consumer in a guaranteed period [1].
>>>
>>>    Why does the user want to estimate those numbers in the first place?
>>>    Are we talking about some kind of capacity planning?
>>>
>>>    If yes, then maybe it will be better to have a metric for a count of 
>>> bytes written in the WAL?
>>>    With it, we will have an exact number of space we need for WAL.
>>>
>>>    How user should estimate capacity for a page memory and indexes?
>>>
>>>    [1] https://issues.apache.org/jira/browse/IGNITE-13582
>>>
 On 14 Feb 2021, at 09:48, ткаленко кирилл wrote:

 Hi, Nikolay!

 The user will be able to take the getLastArchivedSegmentIndex 
 every day and remember it and do it, say, for several days.

 For example, when starting the application, the 
 getLastArchivedSegmentIndex is 0, then at the end of the first day the 
 value will be 30 at the end of the second 55 and at the end of the 
 third 99.
 It turns out that 30 segments were used for the first day, 25 for 
 the second and 44 for the third. We take the maximum 44 and 

Re: Adding metrics of using WAL archive

2021-02-16 Thread Nikolay Izhikov
Kirill.

«Count of segments» is a very internal thing for a regular user.
Regular users don't want to know about such things.

You suggest calculating the number (space required to store the WAL) with some 
kind of rough calculation, whereas with the «Count of bytes written in WAL» we 
can have an exact number without any assumptions or calculations.

Moreover, «Count of bytes written in WAL» is independent of the internal WAL 
implementation.

So, I think an exact number is always better to have than some approximation.

What do you think?
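
To compare the two approaches discussed in this thread, here is a small sketch that estimates the daily WAL archive demand both ways. The segment index samples (0, 30, 55, 99) are the ones from Kirill's example quoted in this thread; the 64 MB segment size and the byte counter samples are made-up values for illustration only, and the class name WalCapacityEstimateSketch is hypothetical.

```java
/** Hypothetical capacity-planning helper; the byte samples below are invented. */
final class WalCapacityEstimateSketch {
    /** Largest increase between consecutive samples of a monotonically increasing value. */
    static long maxDelta(long[] samples) {
        long max = 0;

        for (int i = 1; i < samples.length; i++)
            max = Math.max(max, samples[i] - samples[i - 1]);

        return max;
    }

    public static void main(String[] args) {
        // Assumed segment size (64 MB); in practice this would come from
        // DataStorageConfiguration#getWalSegmentSize.
        long segmentSize = 64L * 1024 * 1024;

        // Last archived segment index sampled once a day (values from the thread).
        long[] lastArchivedIdx = {0, 30, 55, 99};

        // Upper-bound estimate: busiest day in segments times the full segment size.
        long bySegments = maxDelta(lastArchivedIdx) * segmentSize;

        // Hypothetical daily samples of a "bytes logged to the WAL" counter.
        long[] loggedBytes = {0L, 1_900_000_000L, 3_400_000_000L, 6_100_000_000L};

        // Exact demand of the busiest day, independent of how segments are rolled over.
        long byBytes = maxDelta(loggedBytes);

        System.out.println("Estimate by segments, bytes: " + bySegments);
        System.out.println("Estimate by logged bytes:    " + byBytes);
    }
}
```

With these numbers the segment-based estimate is 44 segments times the full segment size, while the byte counter gives the actual volume of the busiest day. The two diverge whenever segments are archived before they are full, for example because of `walAutoArchiveAfterInactivity`.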


> On 15 Feb 2021, at 20:45, ткаленко кирилл wrote:
> 
> Hi, Nikolay!
> 
> We set the number of segments in the working directory, we also delete by 
> segment, it seems that this is a matter of usability. I prefer to dwell on my 
> own version, this is a simple metric that does not hurt and you can add more 
> as needed.
> 
> 15.02.2021, 17:10, "Nikolay Izhikov" :
>> My suggestion that «count of files» is meaningless number.
>> And «count of bytes written to the files» is useful number to know and use 
>> for capacity planning..
>> 
>>>  On 15 Feb 2021, at 15:59, ткаленко кирилл wrote:
>>> 
>>>  Hi, Nikolay!
>>> 
>>>  There may be a number (count of segments * segment size) or there may be a 
>>> count of segments, whichever is more convenient for the user.
>>> 
>>>  15.02.2021, 13:14, "Nikolay Izhikov" :
  Hello, Kirill.
 
  Thanks for an answers.
  Now, I understand your intentions.
 
>   t also seems that it will be more natural to operate not just bytes but 
> multiples of a segment.
 
  Can’t agree here.
  From my point of view - it’s better to know exact number, not just «count 
 of segments».
 
>   15 февр. 2021 г., в 13:00, ткаленко кирилл  
> написал(а):
> 
>   Hello, Nikolay!
> 
>   The period of one day (24h) seems more natural, you can take more or 
> less, I think that one day may not be enough, and it is worth getting the 
> metric for several days (collect statistics) for example a week. Yes, the 
> total size of the segments may not be 
> DataStorageConfiguration#getMaxWalArchiveSize, but for capacity planning, 
> accuracy is not so important to us, since the load can always change, it 
> will hurt users more if we overflow the archive and it will not be able 
> to start the node. So to say that more is better than less, it also seems 
> that it will be more natural to operate not just bytes but multiples of a 
> segment.
> 
>   In separate threads, you can discuss the metric that you propose about 
> page memory and indexes estimates.
> 
>   14.02.2021, 11:54, "Nikolay Izhikov" :
>>   Hello, Kirill
>> 
>>   Your conclusions still not clear for me.
>> 
>>> It is not possible for us to estimate how much space a user will 
>>> need in the archive so as not to overflow it under its load
>>> We take the maximum 44 and multiply it by a 
>>> DataStorageConfiguration#getWalSegmentSize
>> 
>>   Why you take a single day (24h) for a standard period? Is there any 
>> rationale behind this?
>> 
>>   1. We have `walAutoArchiveAfterInactivity` property. So WAL segment 
>> can have a size less than the maximum.
>>   2. For CDC feature I want to introduce «WAL force rollover timeout» to 
>> make data available for a consumer in a guaranteed period [1].
>> 
>>   Why does the user want to estimate those numbers in the first place?
>>   Are we talking about some kind of capacity planning?
>> 
>>   If yes, then maybe it will be better to have a metric for a count of 
>> bytes written in the WAL?
>>   With it, we will have an exact number of space we need for WAL.
>> 
>>   How user should estimate capacity for a page memory and indexes?
>> 
>>   [1] https://issues.apache.org/jira/browse/IGNITE-13582
>> 
>>>14 февр. 2021 г., в 09:48, ткаленко кирилл  
>>> написал(а):
>>> 
>>>Hi, Nikolay!
>>> 
>>>The user will be able to take the getLastArchivedSegmentIndex every 
>>> day and remember it and do it, say, for several days.
>>> 
>>>For example, when starting the application, the 
>>> getLastArchivedSegmentIndex is 0, then at the end of the first day the 
>>> value will be 30 at the end of the second 55 and at the end of the 
>>> third 99.
>>>It turns out that 30 segments were used for the first day, 25 for 
>>> the second and 44 for the third. We take the maximum 44 and multiply it 
>>> by a DataStorageConfiguration#getWalSegmentSize, and we get the 
>>> possible maximum that the archive overflow was the least likely. If the 
>>> user uses compression, then it can be subtracted from the result 
>>> (result * getMaxSizeCompressedArchivedSegment).
>>> 
>>>13.02.2021, 10:47, "Nikolay Izhikov" :
Hello, Kirill.
 

Re: Adding metrics of using WAL archive

2021-02-15 Thread ткаленко кирилл
Hi, Nikolay!

We configure the number of segments in the working directory, and we also 
delete the archive segment by segment, so this seems to be a matter of 
usability. I prefer to stay with my own version: it is a simple metric that 
does no harm, and more can be added as needed.


Re: Adding metrics of using WAL archive

2021-02-15 Thread Nikolay Izhikov
My point is that «count of files» is a meaningless number, while «count of 
bytes written to the files» is a useful number to know and use for capacity 
planning.


Re: Adding metrics of using WAL archive

2021-02-15 Thread ткаленко кирилл
Hi, Nikolay!

The metric can be either a byte count (count of segments * segment size) or a 
count of segments, whichever is more convenient for the user.



Re: Adding metrics of using WAL archive

2021-02-15 Thread Nikolay Izhikov
Hello, Kirill.

Thanks for the answers.
Now I understand your intentions.

> It also seems that it will be more natural to operate not just bytes but 
> multiples of a segment.

I can’t agree here.
From my point of view, it’s better to know the exact number, not just the 
«count of segments».




Re: Adding metrics of using WAL archive

2021-02-15 Thread ткаленко кирилл
Hello, Nikolay!

A period of one day (24h) seems the most natural, but you can take more or 
less. I think one day may not be enough, and it is worth collecting the metric 
over several days, for example a week. Yes, the total size of the segments may 
not match DataStorageConfiguration#getMaxWalArchiveSize, but for capacity 
planning accuracy is not so important to us, since the load can always change. 
It will hurt users more if the archive overflows and the node cannot start, so 
overestimating is better than underestimating. It also seems that it will be 
more natural to operate not just in bytes but in multiples of a segment.

The metrics that you propose for page memory and index estimates can be 
discussed in separate threads.




Re: Adding metrics of using WAL archive

2021-02-14 Thread Nikolay Izhikov
Hello, Kirill

Your conclusions are still not clear to me.

>  It is not possible for us to estimate how much space a user will need in the 
> archive so as not to overflow it under its load
>  We take the maximum 44  and multiply it by a 
> DataStorageConfiguration#getWalSegmentSize

Why do you take a single day (24h) as the standard period? Is there any 
rationale behind this?

1. We have the `walAutoArchiveAfterInactivity` property, so a WAL segment can 
be smaller than the maximum size.
2. For the CDC feature I want to introduce a «WAL force rollover timeout» to 
make data available to a consumer within a guaranteed period [1].

Why does the user want to estimate those numbers in the first place?
Are we talking about some kind of capacity planning?

If yes, then maybe it would be better to have a metric for the count of bytes 
written to the WAL?
With it, we would have the exact amount of space we need for the WAL.

How should the user estimate capacity for page memory and indexes?

[1] https://issues.apache.org/jira/browse/IGNITE-13582
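
Point 1 above can be illustrated with a small sketch. It is not Ignite code: the
class name, the sample per-segment sizes and the 64 MiB segment size are all
assumptions chosen for illustration. It compares an estimate that treats every
archived segment as full with the sum of the bytes actually written:

```java
import java.util.Arrays;

/** Illustrative only: compares a segment-count estimate with the bytes actually written. */
final class SegmentCountVsBytes {
    /** Assumed maximum segment size (64 MiB); a real node would use its configured value. */
    static final long SEGMENT_SIZE = 64L * 1024 * 1024;

    /** Upper-bound estimate: every archived segment is assumed to be full. */
    static long estimateBySegmentCount(long[] segmentBytes) {
        return segmentBytes.length * SEGMENT_SIZE;
    }

    /** Exact figure: the sum of the bytes written to each segment. */
    static long exactBytes(long[] segmentBytes) {
        return Arrays.stream(segmentBytes).sum();
    }

    public static void main(String[] args) {
        // Hypothetical data: two full segments and two that were rolled over early.
        long[] segmentBytes = {SEGMENT_SIZE, SEGMENT_SIZE, 10L * 1024 * 1024, 4L * 1024 * 1024};

        System.out.println("By segment count: " + estimateBySegmentCount(segmentBytes));
        System.out.println("Exact bytes:      " + exactBytes(segmentBytes));
    }
}
```

With segments that are archived before they fill up, the two figures diverge,
which is the gap a byte-based metric would close.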




Re: Adding metrics of using WAL archive

2021-02-13 Thread ткаленко кирилл
Hi, Nikolay!

The user will be able to read getLastArchivedSegmentIndex once a day, record 
it, and repeat this for, say, several days.

For example, when the application starts, getLastArchivedSegmentIndex is 0; at 
the end of the first day the value is 30, at the end of the second 55, and at 
the end of the third 99.
It turns out that 30 segments were used on the first day, 25 on the second and 
44 on the third. We take the maximum, 44, and multiply it by 
DataStorageConfiguration#getWalSegmentSize, which gives the archive size at 
which an overflow is least likely. If the user uses compression, the result 
can be adjusted for it (result * getMaxSizeCompressedArchivedSegment).
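
The arithmetic in this scenario can be sketched as follows. This is a minimal
illustration, not Ignite code: the class and method names are hypothetical, the
daily readings are passed in as a plain array rather than read from a node, and
the 64 MiB segment size is an assumed value standing in for whatever
DataStorageConfiguration#getWalSegmentSize returns. The compression adjustment
is left out because the thread does not define it precisely.

```java
/** Illustrative only: sizes the WAL archive from daily readings of the last archived segment index. */
final class WalArchiveEstimate {
    /** Largest day-over-day increase in the archived segment index. */
    static long maxDailySegments(long[] dailyIndexReadings) {
        long max = 0;
        for (int i = 1; i < dailyIndexReadings.length; i++)
            max = Math.max(max, dailyIndexReadings[i] - dailyIndexReadings[i - 1]);
        return max;
    }

    /** Peak daily segment count multiplied by the segment size in bytes. */
    static long estimateArchiveBytes(long[] dailyIndexReadings, long walSegmentSize) {
        return maxDailySegments(dailyIndexReadings) * walSegmentSize;
    }

    public static void main(String[] args) {
        long[] readings = {0, 30, 55, 99};        // start value plus three end-of-day readings
        long walSegmentSize = 64L * 1024 * 1024;  // assumed 64 MiB segment size

        System.out.println("Peak segments per day: " + maxDailySegments(readings));
        System.out.println("Estimated archive bytes: " + estimateArchiveBytes(readings, walSegmentSize));
    }
}
```

For the readings 0, 30, 55, 99 the daily deltas are 30, 25 and 44, so the peak
is 44 segments.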



Re: Adding metrics of using WAL archive

2021-02-12 Thread Nikolay Izhikov
Hello, Kirill.

> It is not possible for us to estimate how much space a user will need in the 
> archive so as not to overflow it under its load

It is still not clear to me why we need those metrics.
Can you please write down a specific scenario: how will the user use these 
metrics to estimate the required WAL volume?




Re: Adding metrics of using WAL archive

2021-02-12 Thread ткаленко кирилл
Hi, Nikolay!

We currently cannot estimate how much archive space a user will need so that
it does not overflow under their load. The proposed metrics will allow a rough
estimate.


12.02.2021, 17:23, "Nikolay Izhikov" :
> Hello, Kirill.
>
> Can you please clarify: what question about the WAL does the user have in mind?
> And what answer does he (or she) get from these new metrics?
>
>>  On 12 Feb 2021, at 14:26, ткаленко кирилл wrote:
>>
>>  Hi everyone!
>>  At the moment, I have not found a way to estimate how many WAL segments
>>  go into the archive, say, per day.
>>  So I created a ticket https://issues.apache.org/jira/browse/IGNITE-14170 to
>>  add a couple of new metrics.


Re: Adding metrics of using WAL archive

2021-02-12 Thread Nikolay Izhikov
Hello, Kirill.

Can you please clarify: what question about the WAL does the user have in mind?
And what answer does he (or she) get from these new metrics?

> On 12 Feb 2021, at 14:26, ткаленко кирилл wrote:
> 
> Hi everyone!
> At the moment, I have not found a way to estimate how many WAL segments
> go into the archive, say, per day.
> So I created a ticket https://issues.apache.org/jira/browse/IGNITE-14170 to
> add a couple of new metrics.



Adding metrics of using WAL archive

2021-02-12 Thread ткаленко кирилл
Hi everyone!
At the moment, I have not found a way to estimate how many WAL segments go
into the archive, say, per day.
So I created a ticket https://issues.apache.org/jira/browse/IGNITE-14170 to add
a couple of new metrics.


[jira] [Created] (IGNITE-14170) Adding metrics of using WAL archive

2021-02-12 Thread Kirill Tkalenko (Jira)
Kirill Tkalenko created IGNITE-14170:


     Summary: Adding metrics of using WAL archive
         Key: IGNITE-14170
         URL: https://issues.apache.org/jira/browse/IGNITE-14170
     Project: Ignite
  Issue Type: Improvement
  Components: persistence
    Reporter: Kirill Tkalenko
    Assignee: Kirill Tkalenko
     Fix For: 2.11


At the moment there is no way to estimate how many archive segments we may
need over a given period, for example, per day. It is proposed to add the
following metrics:
* org.apache.ignite.mxbean.DataStorageMetricsMXBean#getLastArchivedSegmentIndex - Get the index of the last archived segment.
* org.apache.ignite.mxbean.DataStorageMetricsMXBean#getMaxSizeComressedArchivedSegment - Get the size of the largest compressed segment in the archive.
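
As a rough illustration of how the two proposed values might be combined, the
sketch below extrapolates a daily segment count from two samples of the last
archived segment index, then multiplies it by the largest compressed segment
size to get an upper bound on daily archive growth. This is only a sketch under
stated assumptions: the class WalArchiveEstimate, its method names, and all
sample numbers are hypothetical, and it does not call any Ignite API. It assumes
the index grows by one per archived segment and that the values were read from
the proposed metrics by some external means.

```java
import java.time.Duration;

/** Hypothetical helper: rough WAL archive sizing from two metric samples. */
final class WalArchiveEstimate {
    private WalArchiveEstimate() {}

    /**
     * Segments archived per day, extrapolated from two samples of the
     * last archived segment index taken {@code elapsed} apart.
     */
    static double segmentsPerDay(long idxStart, long idxEnd, Duration elapsed) {
        if (idxEnd < idxStart)
            throw new IllegalArgumentException("Segment index must not decrease");
        if (elapsed.toMillis() <= 0)
            throw new IllegalArgumentException("Elapsed time must be positive");

        double days = elapsed.toMillis() / (double) Duration.ofDays(1).toMillis();
        return (idxEnd - idxStart) / days;
    }

    /**
     * Upper bound on archive growth per day in bytes, assuming every segment
     * is as large as the largest compressed segment seen so far.
     */
    static double maxBytesPerDay(double segmentsPerDay, long maxCompressedSegmentBytes) {
        return segmentsPerDay * maxCompressedSegmentBytes;
    }

    public static void main(String[] args) {
        // Assumed sample values, not taken from a real cluster.
        long idxAtStart = 1_000;                  // index at the first sample
        long idxAtEnd = 1_120;                    // index six hours later
        long maxCompressed = 16L * 1024 * 1024;   // largest compressed segment, bytes

        double perDay = segmentsPerDay(idxAtStart, idxAtEnd, Duration.ofHours(6));
        double bytes = maxBytesPerDay(perDay, maxCompressed);

        System.out.println("Segments per day (estimate): " + perDay);
        System.out.println("Archive bytes per day (upper bound): " + bytes);
    }
}
```

Because the bound uses the largest compressed segment, it overestimates when
most segments compress better than the worst case. A longer sampling window
should smooth out bursts in the load.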


