This is a tough topic, however, in my opinion, static bucket configuration for histograms will work worse than no configuration at all. The reason is simple - complexity. If we assume a need to change bucket configuration for some metric, we can assume that this metric can be related to a cache. Now, we can assume that different caches require different metric configuration. Combined with the ability to start dynamic caches, we get no simple way to change the metric for dynamically started caches.
>From my experience, properly chosen buckets are usually good enough to troubleshoot most of the cases. For advanced scenarios, I would delay the configuration exposure and designed a properly structured control.sh commands which will utilize either metastore or whatever another system we choose to keep this configuration. Managing configuration files in large clusters is a big pain, no need to make it worse. вт, 6 авг. 2019 г. в 15:32, Павлухин Иван <vololo...@gmail.com>: > Andrey, > > It seems that screenshot was rejected, I do not see it. > > вт, 6 авг. 2019 г. в 15:07, Andrey Gura <ag...@apache.org>: > > > > Good example of proper buckets configuration. Such configuration is > > suitable for may cases. See attached screenshot (I hope it will not be > > reject by mail system or forum). > > > > > > On Tue, Aug 6, 2019 at 2:45 PM Andrey Gura <ag...@apache.org> wrote: > > > > > > > What do you mean by "exponential bounds"? > > > > > > Something like this if we talk about latency in ms for example: 5, 10, > > > 25, 50, 100, 200, 500, ... > > > > > > > Thanks, for the feedback, appreciate you ownesty. > > > > > > Nothing personal. It is just about functionality from user's stand > point. > > > > > > > What is your proposal? > > > > How metrics configuration should work? > > > > > > My proposal is simple: just drop this change. We don't need the > > > configuration. Metric owner (developer) defines buckets' bounds for > > > each particular case (it could be done uniformly or exponentially, it > > > depends on metric and problem definition). > > > > > > On Mon, Aug 5, 2019 at 6:36 PM Nikolay Izhikov <nizhi...@apache.org> > wrote: > > > > > > > > Hello, Andrey. > > > > > > > > > Not necessary if we have exponential bounds' values for histograms. > > > > > > > > What do you mean by "exponential bounds"? > > > > > > > > > Anyway, in current solution it looks ugly and not usable. > > > > > > > > Thanks, for the feedback, appreciate you ownesty. > > > > > > > > > No. But we should admit that this is bad decision and do not > include this change to the code base. > > > > > > > > What is your proposal? > > > > How metrics configuration should work? > > > > > > > > > Yes. But it still will not give enough accuracy. > > > > > > > > Enough for what? > > > > > > > > В Пн, 05/08/2019 в 18:29 +0300, Andrey Gura пишет: > > > > > > > - metric configuration is node local (not cluster wide). > > > > > > This issue is easy to solve on the user-side and in Ignite core. > > > > > > > > > > It's imaginary simplicity. The first, you need some additional > > > > > automation on user-side in order to configure all nodes of the > > > > > cluster. The second, new nodes can join to the cluster and > > > > > configuration will be different on new node and on other nodes of > the > > > > > cluster. This leads to complication whole functionality. Anyway, I > > > > > don't like such simplified solution because at the moment it brings > > > > > more problems than value. > > > > > > > > > > > The easiest solution was implemented. > > > > > > Do we want to make it more complex right now :)? > > > > > > > > > > No. But we should admit that this is bad decision and do not > include > > > > > this change to the code base. > > > > > > > > > > > The reason it exists in PR - we already have this parameter in > DataStorageConfiguration#getMetricsSubIntervalCount > > > > > > > > > > I believe this method should be deprecated and removed in major > release. > > > > > > > > > > > I think the user should be able to configure buckets for > histogram and rateTimeInterval for hitrate. > > > > > > > > > > Not necessary if we have exponential bounds' values for histograms. > > > > > Anyway, in current solution it looks ugly and not usable. > > > > > > > > > > > Ignite has dozens of use-cases and deployment modes, seems, > > > > > > we can't cover it all with the single predefined > buckets/rateTimeInterval set. > > > > > > > > > > Yes. But it still will not give enough accuracy. > > > > > > > > > > On Mon, Aug 5, 2019 at 5:25 PM Nikolay Izhikov < > nizhi...@apache.org> wrote: > > > > > > > > > > > > Hello, Andrey. > > > > > > > > > > > > > - metric configuration is node local (not cluster wide). > > > > > > > > > > > > This issue is easy to solve on the user-side and in Ignite core. > > > > > > > > > > > > > - metric configuration doesn't survive node restart. > > > > > > > > > > > > We decide to go with the simplest solution, for now. > > > > > > The easiest solution was implemented. > > > > > > Do we want to make it more complex right now :)? > > > > > > > > > > > > > - User shouldn't configure hit rate metrics at runtime in most > cases. > > > > > > > > > > > > I agree with you - the size of the counters array looks odd as a > configuration parameter. > > > > > > The reason it exists in PR - we already have this parameter in > DataStorageConfiguration#getMetricsSubIntervalCount > > > > > > > > > > > > > - May be it is enough for user to have histograms with > pre-configured buckets > > > > > > > So I think we should drop this change and idea about runtime > histrogram and hit rate configuration. > > > > > > > > > > > > I think the user should be able to configure buckets for > histogram and rateTimeInterval for hitrate. > > > > > > > > > > > > Ignite has dozens of use-cases and deployment modes, seems, > > > > > > we can't cover it all with the single predefined > buckets/rateTimeInterval set. > > > > > > > > > > > > В Пн, 05/08/2019 в 16:59 +0300, Andrey Gura пишет: > > > > > > > Igniters, > > > > > > > > > > > > > > I've took a look to the PR and I want follow up this > discussion again. > > > > > > > > > > > > > > Proposed solution has a couple of significant drawbacks: > > > > > > > > > > > > > > - metric configuration is node local (not cluster wide). > > > > > > > - metric configuration doesn't survive node restart. > > > > > > > > > > > > > > This drawbacks make configuration complex, annoying and > useless in most cases. > > > > > > > > > > > > > > Moreover, I think that: > > > > > > > > > > > > > > - User shouldn't configure hit rate metrics at runtime in most > cases. > > > > > > > Especially HitRateMetric.size because it's just details of > > > > > > > implementation. Purpose of size is plots smoothing and this > parameter > > > > > > > could be fixed (e.g. 16 is enough). HitRate metric is just > LongMetric > > > > > > > but with additional feature. > > > > > > > - May be it is enough for user to have histograms with > pre-configured > > > > > > > buckets. The trick here is properly chosen bounds. It seems > that > > > > > > > exponentially chosen values will fit for most cases. So we can > avoid > > > > > > > runtime configuration for histograms. > > > > > > > - We can also provide percentile metric for more accurate > > > > > > > measurements. Yes, it will bring additional performance impact > and > > > > > > > accuracy will not be the best. But it will more clearly and > will not > > > > > > > require configuration. > > > > > > > > > > > > > > So I think we should drop this change and idea about runtime > > > > > > > histrogram and hit rate configuration. > > > > > > > > > > > > > > Thoughts? > > > > > > > > > > > > > > On Tue, Jul 9, 2019 at 2:06 PM Nikolay Izhikov < > nizhi...@apache.org> wrote: > > > > > > > > > > > > > > > > Igniters, > > > > > > > > > > > > > > > > I made a PR for metrics configuration. > > > > > > > > > > > > > > > > Please, review > > > > > > > > > > > > > > > > https://github.com/apache/ignite/pull/6676/files > > > > > > > > > > > > > > > > В Вт, 09/07/2019 в 12:27 +0300, Nikolay Izhikov пишет: > > > > > > > > > Hello, Alex. > > > > > > > > > > > > > > > > > > OK, Let's go with the simplest solution. > > > > > > > > > I will provide API and JMX method for metrics > configuration shortly. > > > > > > > > > > > > > > > > > > В Пн, 08/07/2019 в 18:23 +0300, Alexey Goncharuk пишет: > > > > > > > > > > Nikolay, > > > > > > > > > > > > > > > > > > > > To me a separate metrics configuration file seems to be > not very > > > > > > > > > > user-friendly. First of all, it does not allow to > configure the system only > > > > > > > > > > from Java code. Second, having multiple configuration > files seem to be > > > > > > > > > > quite confusing for end users (judging by the logging > configuration > > > > > > > > > > questions). > > > > > > > > > > > > > > > > > > > > Perhaps, we will still end up with the configuration > file - but for now I > > > > > > > > > > would put this aside for a more thorough brainstorm and > added the JMX and > > > > > > > > > > internal API for changing metrics configuration. > > > > > > > > > > > > > > > > > > > > пт, 5 июл. 2019 г. в 14:17, Seliverstov Igor < > gvvinbl...@gmail.com>: > > > > > > > > > > > > > > > > > > > > > Igniters, > > > > > > > > > > > > > > > > > > > > > > One more question on topic. > > > > > > > > > > > > > > > > > > > > > > Should we preserve metrics configuration on restart? > (I think we should) > > > > > > > > > > > > > > > > > > > > > > If so, which configuration use after restart? Defined > in config file or > > > > > > > > > > > saved in config storage? (I guess, saved configuration > should have a > > > > > > > > > > > priority) > > > > > > > > > > > > > > > > > > > > > > So, how to tell users that any changes in > configuration file have no > > > > > > > > > > > effect on Ignite configuration after first start? > > > > > > > > > > > > > > > > > > > > > > I think there are too many open questions and (at > least at now) we should > > > > > > > > > > > provide only JMX API until all of the questions are > clarified. > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > Igor > > > > > > > > > > > > > > > > > > > > > > > 4 июля 2019 г., в 19:55, Nikolay Izhikov < > nizhi...@apache.org> > > > > > > > > > > > > > > > > > > > > > > написал(а): > > > > > > > > > > > > > > > > > > > > > > > > Hello, Andrey. > > > > > > > > > > > > > > > > > > > > > > > > > 3. I can't imagine that adequate values will be > chosen on project > > > > > > > > > > > > > setup stage. > > > > > > > > > > > > > > > > > > > > > > > > Configuration file required in the case we adds new > node or replace > > > > > > > > > > > > > > > > > > > > > > existing to the cluster. > > > > > > > > > > > > Use can have parameters similar to Ignite > configuration, log > > > > > > > > > > > > > > > > > > > > > > configuration files. > > > > > > > > > > > > > > > > > > > > > > > > > My proposal is adding API for boundaries > configuration to the metrics > > > > > > > > > > > > > framework and expose it via JMX > > > > > > > > > > > > > > > > > > > > > > > > Agree. I think we should have both: > > > > > > > > > > > > > > > > > > > > > > > > 1. Configuration file. > > > > > > > > > > > > 2. JMX API to change bounaries of histogram *and > HitRateMetric params*. > > > > > > > > > > > > > > > > > > > > > > > > But, if you and other community member are against > config file, let's > > > > > > > > > > > > > > > > > > > > > > have only JMX. > > > > > > > > > > > > Seems, JMX will provide required level of > configurability for metrics. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > В Чт, 04/07/2019 в 17:53 +0300, Andrey Gura пишет: > > > > > > > > > > > > > Igniters, > > > > > > > > > > > > > > > > > > > > > > > > > > I rethought the issue and I see some problems: > > > > > > > > > > > > > > > > > > > > > > > > > > 1. It seems that in most cases bucket boundaries > configuration will be > > > > > > > > > > > > > problem for user. Absolute values for latency > boundaries it is very > > > > > > > > > > > > > odd choice. > > > > > > > > > > > > > 2. Also seems that latency for most caches (if we > configure cache > > > > > > > > > > > > > metrics fro example) will be similar. > > > > > > > > > > > > > 3. I can't imagine that adequate values will be > chosen on project > > > > > > > > > > > > > setup stage. So chosen values should be changed in > the future. > > > > > > > > > > > > > > > > > > > > > > > > > > Solution with configuration file looks unnatural > and creates more > > > > > > > > > > > > > problems than could solve. > > > > > > > > > > > > > > > > > > > > > > > > > > My proposal is adding API for boundaries > configuration to the metrics > > > > > > > > > > > > > framework and expose it via JMX (at this step). It > still provides > > > > > > > > > > > > > configuration possibility but don't force user to > do it. > > > > > > > > > > > > > > > > > > > > > > > > > > Also we should chose default values for bucket > boundaries. And it is > > > > > > > > > > > > > most complex problem at the moment :) Let's > discuss it. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 3, 2019 at 4:49 PM Andrey Gura < > ag...@apache.org> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Nikolai, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Metric is disabled if it doesn't allocate any > memory and doesn't > > > > > > > > > > > > > > update any variable because doesn't have any > value. Ideally disabling > > > > > > > > > > > > > > metrics for some cache should be equal to cache > stopping. > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 28, 2019 at 1:02 PM Nikolay Izhikov < > nizhi...@apache.org> > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hello, Alexey. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the feedback! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > My only concert is that we should have the > metrics framework > > > > > > > > > > > > > > > > > > > > > > configuration > > > > > > > > > > > > > > > > as the first-citizen of the framework itself > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes. I planned to add `void configure(String > param)` method to the > > > > > > > > > > > > > > > > > > > > > > metric API. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > but change the metrics parameters in > > > > > > > > > > > > > > > > runtime from JMX or command-line, etc. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I've add requirement of JMX method to the > ticket: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-11927 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Another concern is to have an > > > > > > > > > > > > > > > > ability to disable/enable metrics per > metrics group/prefix. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, we discusss it. > > > > > > > > > > > > > > > But, let's make it clear: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > *What is disabling metric?* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looks like exporter filter solve this task. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > В Чт, 27/06/2019 в 16:24 +0300, Alexey > Goncharuk пишет: > > > > > > > > > > > > > > > > Nikolay, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > My only concert is that we should have the > metrics framework > > > > > > > > > > > > > > > > > > > > > > configuration > > > > > > > > > > > > > > > > as the first-citizen of the framework > itself. This way, we can > > > > > > > > > > > > > > > > > > > > > > configure > > > > > > > > > > > > > > > > the metrics not only from file, but change > the metrics parameters in > > > > > > > > > > > > > > > > runtime from JMX or command-line, etc. > Another concern is to have an > > > > > > > > > > > > > > > > ability to disable/enable metrics per > metrics group/prefix. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The logger-like configuration meets these > suggestions given that the > > > > > > > > > > > > > > > > configuration is generalized into the > metrics framework. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > чт, 27 июн. 2019 г. в 12:30, Nikolay Izhikov > <nizhi...@apache.org>: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hello, Igniters. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As you may know, I've contributed Phase1 > [1] for IEP-35 [2]. > > > > > > > > > > > > > > > > > Now we have metrics subsystem and can > create and export any metrics > > > > > > > > > > > > > > > > > > > > > > from > > > > > > > > > > > > > > > > > Ignite. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think user(administrator of Ignite) > should be able to configure > > > > > > > > > > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > > metrics params in a common way [3] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I propose to use the same way from logging > frameworks. > > > > > > > > > > > > > > > > > We should define some file format Ignite > can understand. > > > > > > > > > > > > > > > > > An administrator fills configuration file > to configure one or > > > > > > > > > > > > > > > > > > > > > > several > > > > > > > > > > > > > > > > > metrics. > > > > > > > > > > > > > > > > > Ignite will analyze the file and use > provided params during metrics > > > > > > > > > > > > > > > > > creation. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For now, we have 2 types of metrics that > should be configured: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > * HistrogramMetric [4] > > > > > > > > > > > > > > > > > This metric is a count of > measurement that falls into > > > > > > > > > > > > > > > > > predefined intervals. > > > > > > > > > > > > > > > > > An example is "Request > processing time distribution". > > > > > > > > > > > > > > > > > We want to calculate a > count of requests processed > > > > > > > > > > > > > > > > > > > > > > quicker > > > > > > > > > > > > > > > > > then 50ms, 50-100, 100-250, 250-500 and > slower. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > * HitRateMetric [5] > > > > > > > > > > > > > > > > > This metric is a count of > events in the last time > > > > > > > > > > > > > > > > > > > > > > interval. > > > > > > > > > > > > > > > > > An example is the "Count of > requests processed in > > > > > > > > > > > > > > > > > > > > > > the last > > > > > > > > > > > > > > > > > 5 seconds". > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Example of file content: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ```` > > > > > > > > > > > > > > > > > cache.my-cahe.GetLatency=50,100,250,500 > #Params for the histogram > > > > > > > > > > > > > > > > > > > > > > metric > > > > > > > > > > > > > > > > > with the name `cache.my-cahe.get` > > > > > > > > > > > > > > > > > cache.my-cache.RebalancingKeysRate=60000 > #Param for existing > > > > > > > > > > > > > > > > > > > > > > HitRateMetric > > > > > > > > > > > > > > > > > that hold "Estimated rebalancing speed in > keys". > > > > > > > > > > > > > > > > > ```` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please, share your vision. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/ignite/commit/fdaa310430aefff07994eb35510d3416886b5bbe > > > > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392 > > > > > > > > > > > > > > > > > [3] > https://issues.apache.org/jira/browse/IGNITE-11927 > > > > > > > > > > > > > > > > > [4] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/metric/impl/HistogramMetric.java > > > > > > > > > > > > > > > > > [5] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/metric/impl/HitRateMetric.java > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > Best regards, > Ivan Pavlukhin >