Hello, Andrey.

The goal of the proposed metrics is to measure the overall behavior of cache operations.
They provide statistics (histograms) for those operations.
For more fine-grained analysis, one can use tracing or other «go deeper» 
tools.

> > Measured for API calls on the caller node side
> Values will be the same only for cases when the node is remote relative to the data

Yes, the metrics will measure API call performance.
I think this is the most valuable information from a user's point of view.

A regular user wants to know how fast their cache operations perform.
And these metrics provide the answer.
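
As a minimal sketch of what "measured for API calls on the caller node side" means: the caller wraps the operation, takes the elapsed wall-clock time, and increments a histogram bucket. The class `LatencyHistogram`, the `timed` helper and the bucket bounds below are hypothetical names and values used for illustration only, not Ignite's metrics API, and a plain map stands in for a cache.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.Supplier;

/** Hypothetical caller-side latency histogram; not Ignite's implementation. */
public class LatencyHistogram {
    /** Upper bounds of the buckets in nanoseconds, sorted ascending (assumed values). */
    private final long[] bounds;

    /** One counter per bound plus a final overflow bucket. */
    private final AtomicLongArray counters;

    public LatencyHistogram(long... bounds) {
        this.bounds = bounds.clone();
        this.counters = new AtomicLongArray(bounds.length + 1);
    }

    /** Adds one measurement to the first bucket whose bound is >= the duration. */
    public void record(long durationNanos) {
        int idx = 0;

        while (idx < bounds.length && durationNanos > bounds[idx])
            idx++;

        counters.incrementAndGet(idx);
    }

    /** Runs the operation on the calling thread and records its wall-clock time. */
    public <T> T timed(Supplier<T> op) {
        long start = System.nanoTime();

        try {
            return op.get();
        }
        finally {
            record(System.nanoTime() - start);
        }
    }

    /** Returns a snapshot of the bucket counters. */
    public long[] value() {
        long[] res = new long[counters.length()];

        for (int i = 0; i < res.length; i++)
            res[i] = counters.get(i);

        return res;
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<>(); // stand-in for a real cache
        LatencyHistogram getTime = new LatencyHistogram(1_000_000L, 10_000_000L, 100_000_000L);

        cache.put(1, "one");
        String val = getTime.timed(() -> cache.get(1));

        System.out.println(val + " " + Arrays.toString(getTime.value()));
    }
}
```

The measurement covers everything the caller waits for, which is why the same histogram can look different on a client node and on a server node.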

> For a regular data node (server node), timing will depend on the answers to 
> these questions:

I think these answers are always available.
I can barely imagine a scenario in which someone monitors a «black box» cluster and doesn't 
know them.
Even so, all the answers are provided through the system views we brought to Ignite 
:)

> What is transaction commit or rollback time?

These are metrics of client-side operation performance.

I think a specific user knows what their transactions are.
From these metrics, they can answer the question «If my transaction includes 
cacheXXX, how long does it usually take?» 
I think it’s very valuable knowledge.
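
A minimal sketch of the per-cache attribution discussed in this thread, where the commit time of a transaction is added to the metrics of each cache involved in it. The class `TxCommitStats` and its methods are hypothetical and are not Ignite's implementation; for brevity it keeps a count and a total per cache and reports an average, whereas the proposal itself is about histograms.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-cache commit statistics; not Ignite's implementation. */
public class TxCommitStats {
    /** Number of commits per cache name. */
    private final Map<String, LongAdder> commits = new ConcurrentHashMap<>();

    /** Total commit time in nanoseconds per cache name. */
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    /** Attributes one commit duration to every cache that took part in the transaction. */
    public void onCommit(Collection<String> cacheNames, long durationNanos) {
        for (String name : cacheNames) {
            commits.computeIfAbsent(name, k -> new LongAdder()).increment();
            totalNanos.computeIfAbsent(name, k -> new LongAdder()).add(durationNanos);
        }
    }

    /** Average commit time of transactions that included the cache, or 0 if none. */
    public long averageNanos(String cacheName) {
        LongAdder cnt = commits.get(cacheName);

        if (cnt == null || cnt.sum() == 0)
            return 0;

        return totalNanos.get(cacheName).sum() / cnt.sum();
    }

    public static void main(String[] args) {
        TxCommitStats stats = new TxCommitStats();

        stats.onCommit(Arrays.asList("cacheA"), 2_000_000L);
        stats.onCommit(Arrays.asList("cacheA", "cacheB"), 6_000_000L);

        System.out.println(stats.averageNanos("cacheA")); // 4000000
        System.out.println(stats.averageNanos("cacheB")); // 6000000
    }
}
```

Note that the cross-cache commit contributes the same duration to both caches, which is exactly the behavior being debated here.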

> It will be implemented for most types of messages.

Good, let’s do it?

> So, from my point of view, commits for get/put/remove and commit/rollback 
> should be reverted.

I disagree here.
If you have a better approach to measuring cache operation performance, please 
share your vision.

> On Dec 19, 2019, at 16:03, Andrey Gura <ag...@apache.org> wrote:
> 
> From my point of view, Ignite should provide meaningful metrics for
> internal components that could be useful for monitoring and analysis.
> All suggested options are meaningless in a sense. Below I'll try
> to explain why.
> 
>> * `get`, `put`, `remove` time histograms. Measured for API calls on the 
>> caller node side.
>>   Implemented in [1], commit [2].
> 
> All cache operations in Ignite are distributed. So each value measured
> for some cache operation will vary depending on where the
> operation is actually performed. Values will be the same only for cases
> when the node is remote relative to the data (e.g. client node).
> 
> For a regular data node (server node), timing will depend on the answers to 
> these questions:
> 
> - is node primary for particular key or not? (for all operations)
> - how many backups configured for the cache? (for put and remove)
> - what write synchronization mode is configured for particular cache?
> (for put and remove)
> - is readFromBackup enabled for the cache? (for get)
> 
> Both Ignite users and Ignite developers can't make any decision based
> on these metrics.
> 
>> * `commit`, `rollback` time histograms. Measured for API calls on the caller 
>> node side [3].
> 
> What is transaction commit or rollback time? How is it calculated in
> Ignite now? What actions are included in a transaction? What actions not
> related to the cache are executed during transactions?
> 
> There is no sense in the time of a transaction commit or rollback
> because there is no way to understand which transaction was
> performed in a particular period of time. Usually there are a lot of
> transactions and we can't distinguish them from each other.
> 
> Moreover, a transaction is usually treated as a business operation. So the
> only way to measure performance properly is to measure business operation
> time. That is, the user should create their own metrics set for some
> business API.
> 
> Further. What about cross-cache transactions? At the moment tx
> commit/rollback time will be added to the corresponding metrics for each
> cache involved in the transaction. The *same time* for *each cache*.
> Absolutely meaningless.
> 
> Again, both Ignite users and Ignite developers can't make any decision
> based on these metrics. But users can create their own metrics set.
> 
>> * histograms that measure the time of processing `get`, `put`, `remove`, 
>> `commit`, `rollback` messages on affinity nodes(primary and backups).
>>   Ticket doesn't exist for it.
> 
> It will be implemented for most types of messages.
> 
> Metrics, application monitoring, performance analysis and measurement
> are a little harder than they sound. Therefore, we must approach this
> issue more carefully.
> Blindly adding new types of metrics will not only fail to improve the
> situation, but will also worsen the overall performance of the system
> because metric calculation is always on the hot path.
> 
> So, from my point of view, commits for get/put/remove and
> commit/rollback should be reverted.
> 
> On Mon, Dec 16, 2019 at 5:39 PM Nikita Amelchev <nsamelc...@gmail.com> wrote:
>> 
>> I think these metrics are useful.
>> 
>> I have prepared PR [1] for commit and rollback histograms. [2]
>> Nikolay, could you take a look, please?
>> 
>> If you do not mind, I will try to add affinity-nodes cache metrics:
>>>> * histograms that measure the time of processing `get`, `put`, `remove`, 
>>>> `commit`, `rollback` messages on affinity nodes(primary and backups). 
>>>> Ticket doesn't exist for it.
>> 
>> I have filed a ticket for it. [3]
>> 
>> [1] https://github.com/apache/ignite/pull/7141
>> [2] https://issues.apache.org/jira/browse/IGNITE-12450
>> [3] https://issues.apache.org/jira/browse/IGNITE-12453
>> 
>> Mon, Dec 16, 2019 at 11:07, Alexei Scherbakov 
>> <alexey.scherbak...@gmail.com>:
>>> 
>>> I think they are very useful.
>>> 
>>> Mon, Dec 16, 2019 at 10:51, Николай Ижиков <nizhi...@apache.org>:
>>> 
>>>> Hello, Alexei.
>>>> 
>>>> Thanks for the link to the ticket; I labeled it with the IEP-35 label.
>>>> What do you think about the proposed metrics set?
>>>> 
>>>>> On Dec 16, 2019, at 10:29, Alexei Scherbakov <
>>>> alexey.scherbak...@gmail.com> wrote:
>>>>> 
>>>>> Nikolay,
>>>>> 
>>>>> What about batch operations?
>>>>> 
>>>>> For messages processing the ticket does exist and even has an
>>>>> implementation from before new metrics API times [1]
>>>>> 
>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-10418
>>>>> 
>>>>> Mon, Dec 16, 2019 at 10:12, Николай Ижиков <nizhi...@apache.org>:
>>>>> 
>>>>>> Hello, Igniters.
>>>>>> 
>>>>>> I want to give users an answer to the following question: "How do cache
>>>>>> API operations perform?"
>>>>>> It seems we need to implement metrics for basic cache API operations
>>>>>> like get, put, remove for it.
>>>>>> 
>>>>>> I think we should provide the following metrics:
>>>>>> 
>>>>>> * `get`, `put`, `remove` time histograms. Measured for API calls on the
>>>>>> caller node side.
>>>>>>   Implemented in [1], commit [2].
>>>>>> 
>>>>>> * `commit`, `rollback` time histograms. Measured for API calls on the
>>>>>> caller node side [3].
>>>>>> 
>>>>>> * histograms that measure the time of processing `get`, `put`, `remove`,
>>>>>> `commit`, `rollback` messages on affinity nodes(primary and backups).
>>>>>>   Ticket doesn't exist for it.
>>>>>> 
>>>>>> What do you think?
>>>>>> 
>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12219
>>>>>> [2] https://github.com/apache/ignite/commit/e66bbef97b2cef73a533ce8a506ec479852cb364
>>>>>> [3] https://issues.apache.org/jira/browse/IGNITE-12450
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Best regards,
>>>>> Alexei Scherbakov
>>>> 
>>>> 
>>> 
>>> --
>>> 
>>> Best regards,
>>> Alexei Scherbakov
>> 
>> 
>> 
>> --
>> Best wishes,
>> Amelchev Nikita
