Hello, Andrey.

The goal of the proposed metrics is to measure the behavior of cache operations as a whole. They provide statistics (histograms) for it. For more fine-grained analysis, one would use tracing or other «go deeper» tools.
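As an illustration of what a caller-side time histogram measures, here is a minimal Java sketch. It is not the Ignite implementation: `LatencyHistogram`, its methods, and the bucket bounds are hypothetical names chosen for this example, and a plain `HashMap` stands in for a cache. The timer wraps the whole API call, so the recorded value includes everything the caller waits for.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.Supplier;

/** Hypothetical helper (not part of the Ignite API): a fixed-bucket latency histogram. */
final class LatencyHistogram {
    /** Upper bounds of the buckets in nanoseconds, sorted ascending. */
    private final long[] bounds;

    /** One counter per bound plus a final overflow bucket. */
    private final AtomicLongArray counts;

    LatencyHistogram(long... bounds) {
        this.bounds = bounds.clone();
        this.counts = new AtomicLongArray(bounds.length + 1);
    }

    /** Adds one observation to the first bucket whose upper bound it does not exceed. */
    void record(long durationNanos) {
        int i = 0;

        while (i < bounds.length && durationNanos > bounds[i])
            i++;

        counts.incrementAndGet(i);
    }

    /** Times an arbitrary call on the caller side and records its duration. */
    <T> T time(Supplier<T> call) {
        long start = System.nanoTime();

        try {
            return call.get();
        }
        finally {
            // Recorded even if the call throws, so failed calls are counted too.
            record(System.nanoTime() - start);
        }
    }

    /** Returns a snapshot of the bucket counters. */
    long[] snapshot() {
        long[] res = new long[counts.length()];

        for (int i = 0; i < res.length; i++)
            res[i] = counts.get(i);

        return res;
    }

    public static void main(String[] args) {
        // Bucket bounds (1 ms, 10 ms, 100 ms) are arbitrary illustration values.
        LatencyHistogram getTime = new LatencyHistogram(1_000_000L, 10_000_000L, 100_000_000L);

        // Assumption: a local map stands in for a cache; a real cache call may go over the network.
        Map<String, String> cache = new HashMap<>();
        cache.put("k", "v");

        String val = getTime.time(() -> cache.get("k"));

        System.out.println(val + " " + Arrays.toString(getTime.snapshot()));
    }
}
```

Because the measurement wraps the call itself, the same histogram would see different values on a client node and on a node that holds the data locally, which is the point debated in the quoted message further down.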
> Measured for API calls on the caller node side
> Values will be the same only for cases when node is remote relative to data

Yes, the metrics will evaluate API call performance. I think this is the most valuable information from a user's point of view. A regular user wants to know how fast his cache operations perform, and these metrics provide the answer.

> For regular data node (server node) timing will depend on answers to
> these questions:

I think these answers are always available. I can barely imagine a scenario in which someone monitors a «black box» cluster and doesn't know them. Even so, all the answers are available through the system views we brought to Ignite :)

> What is transaction commit or rollback time?

These are metrics of client-side operation performance. I think a specific user knows what his transactions are. From these metrics, he can answer the question «If my transaction includes cacheXXX, how long does it usually take?» I think that is very valuable knowledge.

> It will be implemented for most types of messages.

Good, let's do it?

> So, from my point of view, commits for get/put/remove and commit/rollback
> should be reverted.

I disagree here. If you have a better approach to measuring cache operation performance, please share your vision.

> On Dec 19, 2019, at 16:03, Andrey Gura <ag...@apache.org> wrote:
>
> From my point of view, Ignite should provide meaningful metrics for
> internal components that could be useful for monitoring and analysis.
> All suggested options are meaningless in a sense. Below I'll try to
> explain why.
>
>> * `get`, `put`, `remove` time histograms. Measured for API calls on the
>> caller node side.
>> Implemented in [1], commit [2].
>
> All cache operations in Ignite are distributed. So each value measured
> for some cache operation will vary depending on where the operation is
> actually performed. Values will be the same only for cases when the node
> is remote relative to the data (e.g. client node).
>
> For regular data node (server node) timing will depend on answers to
> these questions:
>
> - is the node primary for a particular key or not? (for all operations)
> - how many backups are configured for the cache? (for put and remove)
> - what write synchronization mode is configured for the particular cache?
> (for put and remove)
> - is readFromBackup enabled for the cache? (for get)
>
> Both Ignite users and Ignite developers can't make any decision based
> on these metrics.
>
>> * `commit`, `rollback` time histograms. Measured for API calls on the caller
>> node side [3].
>
> What is transaction commit or rollback time? How is it calculated in
> Ignite now? What actions are included in a transaction? What actions not
> related to the cache are executed during transactions?
>
> There is no sense in the time of a transaction commit or rollback
> because there is no way to understand which transaction was
> performed in a particular period of time. Usually there are a lot of
> transactions and we can't distinguish them from each other.
>
> Moreover, a transaction is usually treated as a business operation. So the
> only way to measure performance properly is to measure business operation
> time. That is, the user should create his own metrics set for some business
> API.
>
> Further. What about cross-cache transactions? At the moment tx
> commit/rollback time will be added to the corresponding metrics for each
> cache involved in the transaction. The *same time* for *each cache*.
> Absolutely meaningless.
>
> Again, both Ignite users and Ignite developers can't make any decision
> based on these metrics. But users can create their own metrics set.
>
>> * histograms that measure the time of processing `get`, `put`, `remove`,
>> `commit`, `rollback` messages on affinity nodes (primary and backups).
>> Ticket doesn't exist for it.
>
> It will be implemented for most types of messages.
>
> Metrics, application monitoring, performance analysis and measurement
> are a little harder than they sound. Therefore, we must approach this
> issue more carefully.
> Blindly adding new types of metrics will not only not improve the
> situation, but will also worsen the overall performance of the system
> because metric calculation is always on the hot path.
>
> So, from my point of view, commits for get/put/remove and
> commit/rollback should be reverted.
>
> On Mon, Dec 16, 2019 at 5:39 PM Nikita Amelchev <nsamelc...@gmail.com> wrote:
>>
>> I think these metrics are useful.
>>
>> I have prepared PR [1] for commit and rollback histograms. [2]
>> Nikolay, could you take a look, please?
>>
>> If you do not mind, I will try to add affinity-nodes cache metrics:
>>>> * histograms that measure the time of processing `get`, `put`, `remove`,
>>>> `commit`, `rollback` messages on affinity nodes (primary and backups).
>>>> Ticket doesn't exist for it.
>>
>> I have filed a ticket for it. [3]
>>
>> [1] https://github.com/apache/ignite/pull/7141
>> [2] https://issues.apache.org/jira/browse/IGNITE-12450
>> [3] https://issues.apache.org/jira/browse/IGNITE-12453
>>
>> Mon, Dec 16, 2019 at 11:07, Alexei Scherbakov
>> <alexey.scherbak...@gmail.com>:
>>>
>>> I think they are very useful.
>>>
>>> Mon, Dec 16, 2019 at 10:51, Николай Ижиков <nizhi...@apache.org>:
>>>
>>>> Hello, Alexei.
>>>>
>>>> Thanks for the link to the ticket, labeled it with the IEP-35 label.
>>>> What do you think about the proposed metrics set?
>>>>
>>>>> On Dec 16, 2019, at 10:29, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>>>>>
>>>>> Nikolay,
>>>>>
>>>>> What about batch operations?
>>>>>
>>>>> For message processing the ticket does exist and even has an
>>>>> implementation from before the new metrics API times [1]
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-10418
>>>>>
>>>>> Mon, Dec 16, 2019 at 10:12, Николай Ижиков <nizhi...@apache.org>:
>>>>>
>>>>>> Hello, Igniters.
>>>>>>
>>>>>> I want to give the user an answer to the following question: "How do cache
>>>>>> API operations perform?"
>>>>>> It seems we need to implement metrics for basic cache API operations
>>>>>> like get, put, remove for it.
>>>>>>
>>>>>> I think we should provide the following metrics:
>>>>>>
>>>>>> * `get`, `put`, `remove` time histograms. Measured for API calls on the
>>>>>> caller node side.
>>>>>> Implemented in [1], commit [2].
>>>>>>
>>>>>> * `commit`, `rollback` time histograms. Measured for API calls on the
>>>>>> caller node side [3].
>>>>>>
>>>>>> * histograms that measure the time of processing `get`, `put`, `remove`,
>>>>>> `commit`, `rollback` messages on affinity nodes (primary and backups).
>>>>>> Ticket doesn't exist for it.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12219
>>>>>> [2] https://github.com/apache/ignite/commit/e66bbef97b2cef73a533ce8a506ec479852cb364
>>>>>> [3] https://issues.apache.org/jira/browse/IGNITE-12450
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Best regards,
>>>>> Alexei Scherbakov
>>>>
>>>>
>>>
>>> --
>>>
>>> Best regards,
>>> Alexei Scherbakov
>>
>>
>>
>> --
>> Best wishes,
>> Amelchev Nikita