Hi, Nikolay!

We set the number of segments in the working directory and we also delete by segment, so this seems to be a matter of usability. I would prefer to stick with my own version: it is a simple metric that does no harm, and more can be added as needed.

15.02.2021, 17:10, "Nikolay Izhikov" <nizhi...@apache.org>:
> My point is that «count of files» is a meaningless number,
> while «count of bytes written to the files» is a useful number to know and
> use for capacity planning.
>
>> On 15 Feb 2021, at 15:59, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>
>> Hi, Nikolay!
>>
>> There may be a number (count of segments * segment size) or there may be a
>> count of segments, whichever is more convenient for the user.
>>
>> 15.02.2021, 13:14, "Nikolay Izhikov" <nizhi...@apache.org>:
>>> Hello, Kirill.
>>>
>>> Thanks for the answers.
>>> Now I understand your intentions.
>>>
>>>> It also seems that it will be more natural to operate not just in bytes
>>>> but in multiples of a segment.
>>>
>>> I can't agree here.
>>> From my point of view, it's better to know the exact number, not just the
>>> «count of segments».
>>>
>>>> On 15 Feb 2021, at 13:00, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>>
>>>> Hello, Nikolay!
>>>>
>>>> A period of one day (24h) seems more natural; you can take more or
>>>> less. I think that one day may not be enough, and it is worth collecting
>>>> the metric for several days, for example a week. Yes, the total size of
>>>> the segments may not be DataStorageConfiguration#getMaxWalArchiveSize,
>>>> but for capacity planning accuracy is not so important to us, since the
>>>> load can always change. It will hurt users more if the archive overflows
>>>> and the node cannot start, so more is better than less. It also seems
>>>> that it will be more natural to operate not just in bytes but in
>>>> multiples of a segment.
>>>>
>>>> The metrics that you propose for page memory and index estimates can be
>>>> discussed in separate threads.
>>>>
>>>> 14.02.2021, 11:54, "Nikolay Izhikov" <nizhi...@apache.org>:
>>>>> Hello, Kirill
>>>>>
>>>>> Your conclusions are still not clear to me.
>>>>>
>>>>>> It is not possible for us to estimate how much space a user will
>>>>>> need in the archive so as not to overflow it under its load
>>>>>> We take the maximum 44 and multiply it by a
>>>>>> DataStorageConfiguration#getWalSegmentSize
>>>>>
>>>>> Why do you take a single day (24h) as the standard period? Is there
>>>>> any rationale behind this?
>>>>>
>>>>> 1. We have the `walAutoArchiveAfterInactivity` property, so a WAL
>>>>> segment can have a size less than the maximum.
>>>>> 2. For the CDC feature I want to introduce a «WAL force rollover
>>>>> timeout» to make data available for a consumer in a guaranteed period [1].
>>>>>
>>>>> Why does the user want to estimate those numbers in the first place?
>>>>> Are we talking about some kind of capacity planning?
>>>>>
>>>>> If yes, then maybe it will be better to have a metric for the count of
>>>>> bytes written to the WAL?
>>>>> With it, we will have the exact amount of space we need for the WAL.
>>>>>
>>>>> How should the user estimate capacity for page memory and indexes?
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-13582
>>>>>
>>>>>> On 14 Feb 2021, at 09:48, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>>>>
>>>>>> Hi, Nikolay!
>>>>>>
>>>>>> The user will be able to read getLastArchivedSegmentIndex every day,
>>>>>> record it, and do so for, say, several days.
>>>>>>
>>>>>> For example, when the application starts,
>>>>>> getLastArchivedSegmentIndex is 0; at the end of the first day the
>>>>>> value is 30, at the end of the second 55, and at the end of the
>>>>>> third 99.
>>>>>> It turns out that 30 segments were used on the first day, 25 on the
>>>>>> second and 44 on the third. We take the maximum, 44, and multiply it by
>>>>>> DataStorageConfiguration#getWalSegmentSize, and we get the possible
>>>>>> maximum at which archive overflow is least likely. If the user uses
>>>>>> compression, then it can be subtracted from the result (result *
>>>>>> getMaxSizeCompressedArchivedSegment).
>>>>>>
>>>>>> 13.02.2021, 10:47, "Nikolay Izhikov" <nizhi...@apache.org>:
>>>>>>> Hello, Kirill.
>>>>>>>
>>>>>>>> It is not possible for us to estimate how much space a user will
>>>>>>>> need in the archive so as not to overflow it under its load
>>>>>>>
>>>>>>> It is still not clear to me why we need those metrics.
>>>>>>> Can you please write down a specific scenario: how will the user use
>>>>>>> these metrics to estimate the required WAL volume?
>>>>>>>
>>>>>>>> On 12 Feb 2021, at 19:35, ткаленко кирилл <tkalkir...@yandex.ru>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi, Nikolay!
>>>>>>>>
>>>>>>>> It is not possible for us to estimate how much space a user will
>>>>>>>> need in the archive so as not to overflow it under its load. The
>>>>>>>> proposed metrics will allow you to make a rough estimate.
>>>>>>>>
>>>>>>>> 12.02.2021, 17:23, "Nikolay Izhikov" <nizhi...@apache.org>:
>>>>>>>>> Hello, Kirill.
>>>>>>>>>
>>>>>>>>> Can you please clarify: what question about the WAL does the user
>>>>>>>>> have in mind?
>>>>>>>>> And what answers does he (or she) get with these new metrics?
>>>>>>>>>
>>>>>>>>>> On 12 Feb 2021, at 14:26, ткаленко кирилл
>>>>>>>>>> <tkalkir...@yandex.ru> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi everyone!
>>>>>>>>>> At the moment, I have not found a way to estimate how many WAL
>>>>>>>>>> segments go into the archive, say, per day.
>>>>>>>>>> So I created a ticket
>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-14170 to add a couple
>>>>>>>>>> of new metrics.
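
For reference, the estimate described in the 14.02.2021 message (daily readings of getLastArchivedSegmentIndex, per-day differences, maximum times segment size) can be sketched in a few lines of Java. This is only an illustration: the class `WalArchiveEstimate` and its methods are hypothetical and not part of Apache Ignite, the readings 0, 30, 55, 99 are the ones from the thread, and the 64 MiB segment size is an assumed value standing in for whatever DataStorageConfiguration#getWalSegmentSize returns on a given node. As noted in the thread, segments can be smaller than the maximum, so the result is best read as a rough upper bound.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Apache Ignite. */
final class WalArchiveEstimate {
    /** Turns cumulative end-of-day index readings into per-day segment counts. */
    static long[] dailySegments(long[] indexReadings) {
        if (indexReadings.length < 2)
            return new long[0];

        long[] perDay = new long[indexReadings.length - 1];

        for (int i = 1; i < indexReadings.length; i++)
            perDay[i - 1] = indexReadings[i] - indexReadings[i - 1];

        return perDay;
    }

    /** Largest number of segments archived in any single day (0 if unknown). */
    static long maxDailySegments(long[] indexReadings) {
        return Arrays.stream(dailySegments(indexReadings)).max().orElse(0L);
    }

    /** Rough upper bound in bytes: busiest day times the configured segment size. */
    static long estimateBytes(long[] indexReadings, long walSegmentSizeBytes) {
        return Math.multiplyExact(maxDailySegments(indexReadings), walSegmentSizeBytes);
    }

    public static void main(String[] args) {
        // Readings from the thread: value at start, then at the end of days 1, 2 and 3.
        long[] readings = {0, 30, 55, 99};

        // Assumed segment size of 64 MiB; use the node's actual configured value.
        long segmentSize = 64L * 1024 * 1024;

        System.out.println("Per-day segments: " + Arrays.toString(dailySegments(readings)));
        System.out.println("Busiest day: " + maxDailySegments(readings) + " segments");
        System.out.println("Estimated bytes per day: " + estimateBytes(readings, segmentSize));
    }
}
```

With the readings from the thread, the per-day counts are 30, 25 and 44, so the busiest day is 44 segments. A metric for bytes written to the WAL, as suggested earlier in the thread, would replace the multiplication by segment size with an exact figure.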