Re: [DISCUSSION] Add index rebuild time metrics

Nikolay Izhikov Tue, 11 Aug 2020 06:32:22 -0700

> I propose to stick with a cache-group level metric (e.g. 
> getIndexBuildProgress)


+1

> that returns a float from 0 to 1, which is calculated as [processedKeys] / 
> [localCacheSize].

From my point of view, we shouldn’t do calculations on the Ignite side if we 
can avoid it.
I’d rather provide two separate metrics - processedKeys and localCacheSize.
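
A minimal sketch of what the consumer-side calculation could look like if the two values are exposed as plain long counters. The class and method names below (IndexRebuildProgress, progress, estimateRemaining) are hypothetical and are not part of the Ignite API; how the counters are read (JMX, Zabbix, etc.) is left out on purpose.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical helper for a monitoring client; not part of Ignite. */
public final class IndexRebuildProgress {
    private IndexRebuildProgress() {}

    /**
     * Fraction of locally processed keys, clamped to [0, 1].
     * Assumption: an empty local cache is reported as fully rebuilt.
     */
    public static double progress(long processedKeys, long localCacheSize) {
        if (localCacheSize <= 0)
            return 1.0;

        double ratio = (double) processedKeys / localCacheSize;

        return Math.max(0.0, Math.min(1.0, ratio));
    }

    /**
     * Remaining time extrapolated from two samples of the processedKeys counter
     * taken sampleInterval apart. Empty if no progress was observed.
     */
    public static Optional<Duration> estimateRemaining(
        long prevProcessed, long curProcessed, Duration sampleInterval, long localCacheSize) {
        long delta = curProcessed - prevProcessed;
        long intervalMs = sampleInterval.toMillis();

        if (delta <= 0 || intervalMs <= 0)
            return Optional.empty();

        long remainingKeys = Math.max(0L, localCacheSize - curProcessed);

        // remaining time = remaining keys / observed rate (keys per millisecond)
        double remainingMs = (double) remainingKeys * intervalMs / delta;

        return Optional.of(Duration.ofMillis(Math.round(remainingMs)));
    }
}
```

With the figures quoted further down in this thread (50_000 keys rebuilt per minute, 1_000_000_000 keys in total), estimateRemaining(0, 50_000, Duration.ofMinutes(1), 1_000_000_000L) comes out at about 19,999 minutes, i.e. roughly two weeks. A single percentage value would not allow this kind of rate-based estimate without also knowing the cache size.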

> On 11 Aug 2020, at 16:26, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> 
>> 
>> As a compromise, I can add jmx methods (rebuilding indexes in the process
>> and the percentage of rebuilding) for the entire node, but I tried to find
>> a suitable place and did not find it, tell me where to add it?
> 
> I have checked existing JMX beans. To be honest, I struggle to find a
> suitable place as well.
> We have ClusterMetrics that may represent the state of a local node, but
> this class is also used for aggregated cluster metrics. I can't propose a
> reasonable way to merge percentages from different nodes.
> On the other hand, total index rebuild for all caches isn't a common
> scenario. It's either performed after manual index.bin removal or after
> index creation, both operations are performed on cache / cache-group level.
> Also, all other similar metrics are provided on cache-group level.
> 
> I propose to stick with a cache-group level metric (e.g.
> getIndexBuildProgress) that returns a float from 0 to 1, which is
> calculated as [processedKeys] / [localCacheSize]. Even if a user handles
> metrics through Zabbix, I anticipate that he'll perform this calculation on
> his own in order to estimate progress. Let's help him a bit and perform it
> on the system side.
> If a per-group percentage metric is present, I think
> getIndexRebuildKeyProcessed becomes redundant.
> 
> On Tue, Aug 11, 2020 at 8:20 AM ткаленко кирилл <tkalkir...@yandex.ru>
> wrote:
> 
>> Hi, Ivan!
>> 
>> What precision would be sufficient?
>>> If the progress is very slow, I don't see issues with tracking it if the
>>> percentage float has enough precision.
>> 
>> I think we can add a mention of how to get the cache size.
>>> 1. Gain an understanding that local cache size
>>> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
>>> is mentioned neither in the javadoc nor in the JMX method description).
>> 
>> Do you think users collect metrics with their hands? I think this is done
>> by other systems, such as zabbix.
>>> 2. Manually calculate the sum of all metrics and divide it by the sum of all
>>> cache sizes.
>> 
>> As a compromise, I can add jmx methods (rebuilding indexes in the process
>> and the percentage of rebuilding) for the entire node, but I tried to find
>> a suitable place and did not find it, tell me where to add it?
>>> On the other hand, % of index rebuild progress is self-descriptive. I don't
>>> understand why we tend to make user's life harder.
>> 
>> 10.08.2020, 21:57, "Ivan Rakov" <ivan.glu...@gmail.com>:
>>>> This metric can be used only for local node, to get size of cache use
>>>> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
>>> 
>>> Got it, agree.
>>> 
>>>> If there is a lot of data in node that can be rebuilt, percentage may
>>>> change very rarely and may not give an estimate of how much time is left.
>>>> If we see for example that 50_000 keys are rebuilt once a minute, and we
>>>> have 1_000_000_000 keys, then we can have an approximate estimate. What do
>>>> you think of that?
>>> 
>>> If the progress is very slow, I don't see issues with tracking it if the
>>> percentage float has enough precision.
>>> Still, usability of the metric concerns me. In order to estimate remaining
>>> time of index rebuild, user should:
>>> 1. Gain an understanding that local cache size
>>> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
>>> is mentioned neither in the javadoc nor in the JMX method description).
>>> 2. Manually calculate the sum of all metrics and divide it by the sum of all
>>> cache sizes.
>>> On the other hand, % of index rebuild progress is self-descriptive. I don't
>>> understand why we tend to make user's life harder.
>>> 
>>> --
>>> Best regards,
>>> Ivan
>>> 
>>> On Mon, Aug 10, 2020 at 8:53 PM ткаленко кирилл <tkalkir...@yandex.ru>
>>> wrote:
>>> 
>>>> Hi, Ivan!
>>>> 
>>>> For this you can use
>>>> org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress
>>>>> How can a local number of processed keys help us to understand when
>>>>> index rebuild will be finished?
>>>> 
>>>> This metric can be used only for local node, to get size of cache use
>>>> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
>>>>> We can't compare metric value with cache.size(). First one is node-local,
>>>>> while cache size covers all partitions in the cluster.
>>>> 
>>>> If there is a lot of data in node that can be rebuilt, percentage may
>>>> change very rarely and may not give an estimate of how much time is left.
>>>> If we see for example that 50_000 keys are rebuilt once a minute, and we
>>>> have 1_000_000_000 keys, then we can have an approximate estimate. What do
>>>> you think of that?
>>>>> I find one single metric much more usable. It would be perfect if metric
>>>>> value is represented in percentage, e.g. current progress of local node
>>>>> index rebuild is 60%.
>>>> 
>>>> 10.08.2020, 19:11, "Ivan Rakov" <ivan.glu...@gmail.com>:
>>>>> Folks,
>>>>> 
>>>>> Sorry for coming late to the party. I've taken a look at this issue during
>>>>> review.
>>>>> 
>>>>> How can a local number of processed keys help us to understand when
>>>>> index rebuild will be finished?
>>>>> We can't compare metric value with cache.size(). First one is node-local,
>>>>> while cache size covers all partitions in the cluster.
>>>>> Also, I don't understand why we need to keep separate metrics for all
>>>>> caches. Of course, the metric becomes more fair, but it is obviously harder
>>>>> to draw conclusions on whether "the index rebuild" process is over (and the
>>>>> cluster is ready to process queries quickly).
>>>>> 
>>>>> I find one single metric much more usable. It would be perfect if metric
>>>>> value is represented in percentage, e.g. current progress of local node
>>>>> index rebuild is 60%.
>>>>> 
>>>>> --
>>>>> Best regards,
>>>>> Ivan
>>>>> 
>>>>> On Fri, Jul 24, 2020 at 1:35 PM Stanislav Lukyanov <stanlukya...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Got it. I thought that index building and index rebuilding are essentially
>>>>>> the same, but now I see that they are different: index rebuilding cares
>>>>>> about all indexes at once while index building cares about particular ones.
>>>>>> 
>>>>>> Kirill's approach sounds good.
>>>>>> 
>>>>>> Stan
>>>>>> 
>>>>>>> On 20 Jul 2020, at 14:54, Alexey Goncharuk <alexey.goncha...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Stan,
>>>>>>> 
>>>>>>> Currently we never build indexes one-by-one - we always use a cache data
>>>>>>> row visitor which either updates all indexes (see IndexRebuildFullClosure)
>>>>>>> or updates a set of all indexes that need to catch up (see
>>>>>>> IndexRebuildPartialClosure). Given that, I do not see any need for
>>>>>>> per-index rebuild status as this status will be updated for all outdated
>>>>>>> indexes simultaneously.
>>>>>>> 
>>>>>>> Kirill's approach for the total number of processed keys per cache seems
>>>>>>> reasonable to me.
>>>>>>> 
>>>>>>> --AG
>>>>>>> 
>>>>>>> On Fri, 3 Jul 2020 at 10:12, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>>>>> 
>>>>>>>> Hi, Stan!
>>>>>>>> 
>>>>>>>> Perhaps it is worth clarifying what exactly I wanted to say.
>>>>>>>> Now we have 2 processes: building and rebuilding indexes.
>>>>>>>> 
>>>>>>>> At the moment, we have some metrics for rebuilding indexes:
>>>>>>>> "IsIndexRebuildInProgress", "IndexBuildCountPartitionsLeft".
>>>>>>>> 
>>>>>>>> I suggest adding another metric "IndexRebuildKeyProcessed", which will
>>>>>>>> allow you to determine how many records are left to rebuild for a cache.
>>>>>>>> 
>>>>>>>> I think your comments are more about building an index that may need more
>>>>>>>> metrics, but I think you should do it in a separate ticket.
>>>>>>>> 
>>>>>>>> 03.07.2020, 03:09, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
>>>>>>>>> If multiple indexes are to be built "number of indexed keys" metric may
>>>>>>>>> be misleading.
>>>>>>>>> 
>>>>>>>>> As a cluster admin, I'd like to know:
>>>>>>>>> - Are all indexes ready on a node?
>>>>>>>>> - How many indexes are to be built?
>>>>>>>>> - How much resources are used by the index building (how many threads
>>>>>>>>> are used)?
>>>>>>>>> - Which index(es?) is being built right now?
>>>>>>>>> - How much time until the current (single) index building finishes? Here
>>>>>>>>> "time" can be a lot of things: partitions, entries, percent of the cache,
>>>>>>>>> minutes and hours
>>>>>>>>> - How much time until all indexes are built?
>>>>>>>>> - How much does it take to build each of my indexes / a single index of
>>>>>>>>> my cache on average?
>>>>>>>>> 
>>>>>>>>> I think we need a set of metrics and/or log messages to solve all of
>>>>>>>>> these questions.
>>>>>>>>> I imagine something like:
>>>>>>>>> - numberOfIndexesToBuild
>>>>>>>>> - a standard set of metrics on the index building thread pool (do we
>>>>>>>>> already have it?)
>>>>>>>>> - currentlyBuiltIndexName (assuming we only build one at a time which is
>>>>>>>>> probably not true)
>>>>>>>>> - for the "time" metrics I think percentage might be the best as it's
>>>>>>>>> the easiest to understand; we may add multiple metrics though.
>>>>>>>>> - For "time per each index" I'd add detailed log messages stating how
>>>>>>>>> long it took to build a particular index
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Stan
>>>>>>>>> 
>>>>>>>>>> On 26 Jun 2020, at 12:49, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi, Igniters.
>>>>>>>>>> 
>>>>>>>>>> I would like to know if it is possible to estimate how long the index
>>>>>>>>>> rebuild will take.
>>>>>>>>>> 
>>>>>>>>>> At the moment, I have found the following metrics [1] and [2], and
>>>>>>>>>> since the rebuild is based on caches, I think it would be useful to know
>>>>>>>>>> how many records are processed in indexing. This way we can estimate how
>>>>>>>>>> long we have to wait for the index to be rebuilt by subtracting the
>>>>>>>>>> number of indexed records from [3].
>>>>>>>>>> 
>>>>>>>>>> I think we should add this metric [4].
>>>>>>>>>> 
>>>>>>>>>> Comments, suggestions?
>>>>>>>>>> 
>>>>>>>>>> [1] - https://issues.apache.org/jira/browse/IGNITE-12184
>>>>>>>>>> [2] - org.apache.ignite.internal.processors.cache.CacheGroupMetricsImpl#idxBuildCntPartitionsLeft
>>>>>>>>>> [3] - org.apache.ignite.cache.CacheMetrics#getCacheSize
>>>>>>>>>> [4] - org.apache.ignite.cache.CacheMetrics#getNumberIndexedKeys
>>>>>>>> 
>>
