Nikolay, thanks a lot for clarification! I added some comments to Upsource review [1].
Here I want to discuss some high-level issues. 1. Naming "There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton I really don't like name MonitoringList. First of all because it isn't about monitoring at all while can be useful for monitoring purposes. We already have SQL system views and I think that system view is good candidate for naming of new entity. As result we will have consistent naming which better describes domain. I think akso that GridMetricManager is bad candidate for lists (system views) management. Because it isn't about metrics. May be new SystemViewManager will better fit to this purposes. 2. Management Lists (aka system views) have life cycle now. I believe that it is redundant functionality. There is no any reason for enabling/disabling lists. There is no any interaction with lists on hot path of code flow and there is no any performance impact. So lists management can be reduced to lists creation and registration operations (which executes only on node start). 3. Code generation Code generation for walkers is also redundant. Amount of system views in the system is strongly limited (units not dozens) so it is easier to change walker by hand literally than navigate to code generator and run it. Moreover, first you should add Order annotation in the proper place and it make generator practically useless. If you still see benefit that can bring Order annotation you can use reflection. Motivation is simple, system views are on not hot path and I expected that API for system views will not called frequently. 4. Export I really don't understand why we should export system views content (especially periodically). Real life use case is take view content on demand. So we should have public API for it, SQL API and JMX. There is no need any exporters. What do you think about it? Also it would be great to involve more people to this discussion. [1] https://reviews.ignite.apache.org/ignite/review/IGNT-CR-1065 On Wed, Sep 11, 2019 at 6:24 PM Nikolay Izhikov <nizhi...@apache.org> wrote: > > Hello, Andrey. > > Thanks, for joining the review. > > Basic interface for objects list is `MonitoringList`. It provides the > following features: > * name. > * description. > * row class. > * size. > * iterator for the list content. > * attribute walker (described below). > > `MonitoringRow` is a marker interface for classes that can be used as a > monitoring list content. > > Internally, there is only one implementation of `MonitoringList`, for now, > `MonitoringListAdapter`. > It adapts the content of some `ConcurrentMap` which uses widely in Ignite > internals. > I think, will be another implementation in the follow-up PRs. > > Public API changes: > > * New registry created `ReadOnlyMonitoringListRegistry` It provides access: > * To all lists that exist in the Ignite. > * Ability to subscribe to the list creation/removal events. > > * `MetricExporterSpi` changes: > * `setMonitoringListRegistry` method added > * `setMonitoringListExportFilter` method added. > > `MonitoringRowAttributeWalker` is a helper class for exporter implementations. > Usually, exporter SPI iterates on `MonitoringRow` attributes. > `SqlViewExporterSpi`, `JmxMetricExporterSpi` can be taken as an example. > It can be implemented with Java reflection API, but I use more quick approach. > `MonitoringRowAttributeWalker` can visit each attribute of the MonitoringRow > implementation. > It's also, preserves, the order provided by the MonitoringRow implementation > author. > It provides 2 main methods: > * `visitAll(AttributeVisitor visitor);` - visits each attribute of > the some monitoring row class. Provides index, name and class of attribute to > the consumer. > * `visitAll(R row, AttributeWithValueVisitor visitor)` - visits each > attribute of some monitoring row instance. Provides index, name, class, value > of attribute to the consumer. > > > В Ср, 11/09/2019 в 16:30 +0300, Andrey Gura пишет: > > Nikolai, > > > > I'm trying to review this PR but it is too large. > > > > Could you please describe problem and design of implemented solution? > > Also javadocs for base interfaces aren't clear, too brief and doesn't > > give any imagine about whole picture. > > > > At present it is very hard to understand the purposes of new > > interfaces and walker generator, and design itself. > > > > On Fri, Sep 6, 2019 at 3:16 PM Nikolay Izhikov <nizhi...@apache.org> wrote: > > > > > > Hello, Igniters. > > > > > > IEP-35. Monitoring&Profiling. Phase2 is ready [1] > > > Please, join to the review! > > > > > > I've implemented: > > > > > > * Monitoring list engine. > > > * Following list implemented: > > > * Cache list > > > * Cache group list > > > * Compute task list > > > * Service list. > > > > > > Engine details: > > > > > > * `MonitoringList` added to store list data. > > > * Base interface `MonitoringRow` for list data created. > > > * Corresponding method added to `MetricExporterSpi` > > > * `JmxMetricExporterSpi`, `SqlViewExporterSpi`, `LogExporterSpi` updated > > > to > > > support list export. > > > * JMX, SQL and other column-oriented SPI uses > > > `MonitoringRowAttributeWalker` to quickly traverse all list row > > > attributes. > > > * Implementation of `MonitoringRowAttributeWalkerfor > > > specificMonitoringRow` > > > can be generated with `MonitoringRowAttributeWalkerGenerator` > > > > > > I prepare follow-up PR [2], also. > > > Following lists implemented: > > > > > > * SQL tables > > > * SQL indexes > > > * SQL schemas > > > * SQL queries > > > * Continuous queries > > > * Text queries > > > * Transactions > > > * Cluster nodes > > > * Client connections(JDBC, ODBC, Thin) > > > > > > [1] https://github.com/apache/ignite/pull/6845 > > > [2] https://github.com/apache/ignite/pull/6790 > > > > > > > > > > > > пн, 10 июн. 2019 г. в 13:49, Nikolay Izhikov <nizhi...@apache.org>: > > > > > > > Hello, Igniters. > > > > > > > > Since Phase 1 will be merged in master soon I've created the ticket [1] > > > > for Phase 2. > > > > > > > > Scope of Phase 2(copy-paste from the ticket) > > > > > > > > Ability to collect lists of some internal object Ignite manage. > > > > Examples of such objects: > > > > > > > > * Caches > > > > * Queries (including continuous queries) > > > > * Services > > > > * Compute tasks > > > > * Distributed Data Structures > > > > * etc... > > > > > > > > > > > > 1. Fields for each list(that doesn't currently exists in Ignite) will be > > > > discussed in separate tickets > > > > 2. Metric Exporters (optionally) can support list export. > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11905 > > > > > > > > > > > > В Вт, 14/05/2019 в 16:42 +0300, Nikolay Izhikov пишет: > > > > > Ticket for IEP.Phase1 created - > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-11848 > > > > > > > > > > > > > > > В Пн, 13/05/2019 в 18:06 +0300, Nikolay Izhikov пишет: > > > > > > Hello, Igniters. > > > > > > > > > > > > We have discussed this IEP [1] with Alexey Goncharyuk, Anton > > > > > > > > Vinogradov, Andrey Gura, Alexey Scherbakov and Pavel Kovalenko. > > > > > > > > > > > > Issues to address: > > > > > > > > > > > > 1. Study experience of following libs, tools: > > > > > > * OpenTracing > > > > > > * OpenSensus > > > > > > * DropWizard > > > > > > > > > > > > 2. Support histogram sensor: Sensor that collects values that gets > > > > > > > > into predefined segments > > > > > > > > > > > > 3. Use more widely used naming(like in OpenSensus?) > > > > > > > > > > > > 4. Consider the usage of OpenSensus as a default implementation for > > > > > > > > local metric storage. > > > > > > > > > > > > 5. To measure the performance penalty for metrics for 5_000 caches. > > > > > > > > > > > > 6. Some metrics should be part of public API and others are not(may > > > > > > be > > > > > > > > changed/removed in release without warnings). > > > > > > > > > > > > My plan for Phase #1 is the following: > > > > > > > > > > > > 1. Address the issues. > > > > > > 2. Prepare public API > > > > > > 3. Prepare PR for monitoring subsystem + existing metrics rewritten > > > > > > > > with it. > > > > > > 4. Prepare a PR with lists of each user API. > > > > > > 5. Collect feedback for a #4. > > > > > > 6. Design a log exposer. Consider the usage of JFR format or some > > > > > > > > other widely used, tool compatible format. > > > > > > > > > > > > [1] > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392 > > > > > > > > > > > > В Чт, 02/05/2019 в 14:02 +0300, Nikolay Izhikov пишет: > > > > > > > Hello, Maxim. > > > > > > > > > > > > > > > How will be recorded throughput sensor values which will require > > > > > > > > an interval for the rate calculations? > > > > > > > > > > > > > > I answered to this question in IEP "Design principles": > > > > > > > > > > > > > > ``` > > > > > > > Sensors should contain only raw values. No aggregation of numeric > > > > > > > > metrics on Ignite side. > > > > > > > Min, max, avg and other functions are the matter of an external > > > > > > > > monitoring system. > > > > > > > ``` > > > > > > > > > > > > > > Throughput is a function `(S(t2) - S(t1))/(t2-t1)` > > > > > > > where S(t) is the sensor value in some point of time t. > > > > > > > > > > > > > > Seems, throughput calculation is a responsibility of an external > > > > > > > > system. > > > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > > It seems to me that we can add an additional parameter of > > > > > > > > `sensitivityLevel` to provide for the user a flexible sensor control > > > > (e.g., > > > > INFO, WARN, NOTICE, DEBUG). > > > > > > > > > > > > > > For now, I think that all sensors and lists will be very(very!) > > > > > > > > lightweight. > > > > > > > So, we should be able to disable/enable it's, for sure. > > > > > > > > > > > > > > But, we should turn off and turn on the whole Ignite subsystem > > > > > > > for the case we have strong performance limitations for a > > > > > > > particular > > > > > > > > workload. > > > > > > > > > > > > > > So, we have two "level" of monitoring - INFO and DEBUG(for > > > > > > > > profiling: IEP-35 - Phase 3). > > > > > > > For example, AFAIK we can't disable current SQL system views(Why > > > > > > > > should we?) > > > > > > > > > > > > > > В Вт, 30/04/2019 в 14:33 +0300, Maxim Muzafarov пишет: > > > > > > > > Hello Nikolay, > > > > > > > > > > > > > > > > I've looked through your PRs changes. > > > > > > > > > > > > > > > > > Sensors > > > > > > > > > > > > > > > > How will be recorded throughput sensor values which will > > > > > > > > require an > > > > > > > > interval for the rate calculations? Do we have such an example? > > > > > > > > For > > > > > > > > instance, getAllocationRate() or getEvictionRate(). These > > > > > > > > metrics > > > > > > > > are > > > > > > > > out of the scope of current PoC and IEP as they are not related > > > > > > > > to > > > > > > > > the > > > > > > > > user metrics, but it is a good example of a particular metric > > > > > > > > type. > > > > > > > > > > > > > > > > It seems to me that we can add an additional parameter of > > > > > > > > `sensitivityLevel` to provide for the user a flexible sensor > > > > > > > > control > > > > > > > > (e.g., INFO, WARN, NOTICE, DEBUG). > > > > > > > > > > > > > > > > It also seems that for the sensors getValue() the completely > > > > > > > > functional java approach can be used. Am I right? > > > > > > > > > > > > > > > > On Mon, 29 Apr 2019 at 11:44, Nikolay Izhikov > > > > > > > > <nizhi...@apache.org> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Hello, Vyacheslav. > > > > > > > > > > > > > > > > > > Thanks for the feedback! > > > > > > > > > > > > > > > > > > > HttpExposer with Jetty's dependencies should be detached> > > > > > > > > > > from > > > > > > > > the core module. > > > > > > > > > > > > > > > > > > Agreed. module hierarchy is the essence of the next steps. > > > > > > > > > For now it just a proof of my ideas for Ignite monitoring we > > > > > > > > > can > > > > > > > > discuss. > > > > > > > > > > > > > > > > > > > I like your approach with 'wrapper' for monitored objects, > > > > > > > > like don't like using 'ServiceConfiguration' directly as a monitored > > > > object > > > > for services > > > > > > > > > > > > > > > > > > Agreed in general. > > > > > > > > > Seems, choosing the right data to expose is the matter of > > > > > > > > separate discussion for each Ignite entities. > > > > > > > > > I've planned to file tickets for each entity so anyone > > > > > > > > interested can share his vision in it. > > > > > > > > > > > > > > > > > > > In my opinion, each sensor should have a timestamp. > > > > > > > > > > > > > > > > > > I'm not sure that *every* sensor should have directly > > > > > > > > > associated > > > > > > > > timestamp. > > > > > > > > > Seems, we should support sensors without timestamp for a > > > > > > > > > current > > > > > > > > monitoring numbers at least. > > > > > > > > > > > > > > > > > > > Also, it'd be great to have an ability to store a list of a > > > > > > > > fixed size> of last N sensors > > > > > > > > > > > > > > > > > > What use-cases do you know for such sensors? > > > > > > > > > We have plans to support fixed size lists to show "Last N SQL > > > > > > > > queries" or similar data. > > > > > > > > > Essentially, a sensor is just a single value with the name and > > > > > > > > known meaning. > > > > > > > > > > > > > > > > > > > It'd be great if you provide a more extended test to show > > > > > > > > > > the > > > > > > > > work of> the system. > > > > > > > > > > > > > > > > > > Sorry, for that :) > > > > > > > > > When you run 'MonitoringSelfTest' you should open > > > > > > > > http://localhost:8080/ignite/monitoring to view exposed info. > > > > > > > > > I provide this info in gist - > > > > > > > > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d > > > > > > > > > > > > > > > > > > I will extend this test to print results to console in the > > > > > > > > > next > > > > > > > > iterations - stay tuned :) > > > > > > > > > > > > > > > > > > В Вс, 28/04/2019 в 23:35 +0300, Vyacheslav Daradur пишет: > > > > > > > > > > Hi, Nikolay, > > > > > > > > > > > > > > > > > > > > I looked through PR and IEP, and I have some comments: > > > > > > > > > > > > > > > > > > > > It would be better to implement it as a separate module, I > > > > > > > > can't say > > > > > > > > > > if it is possible for the main part of monitoring or not, > > > > > > > > > > but I > > > > > > > > > > believe that HttpExposer with Jetty's dependencies should be > > > > > > > > detached > > > > > > > > > > from the core module. > > > > > > > > > > > > > > > > > > > > I like your approach with 'wrapper' for monitored objects, > > > > > > > > > > like > > > > > > > > > > 'ComputeTaskInfo' in PR, and don't like using > > > > > > > > 'ServiceConfiguration' > > > > > > > > > > directly as a monitored object for services. I believe we > > > > > > > > shouldn't > > > > > > > > > > mix approaches. It'd be better always use some kind of > > > > > > > > container with > > > > > > > > > > monitored object's information to work with such data. > > > > > > > > > > > > > > > > > > > > In my opinion, each sensor should have a timestamp. Usually > > > > > > > > monitoring > > > > > > > > > > systems aggregate data and build graphics according to > > > > > > > > > > sensors > > > > > > > > > > timestamp. > > > > > > > > > > > > > > > > > > > > Also, it'd be great to have an ability to store a list of a > > > > > > > > fixed size > > > > > > > > > > of last N sensors, not to miss them without pushing to an > > > > > > > > external > > > > > > > > > > monitoring system. > > > > > > > > > > > > > > > > > > > > It'd be great if you provide a more extended test to show > > > > > > > > > > the > > > > > > > > work of > > > > > > > > > > the system. Everybody who looks to PR needs to run the test > > > > > > > > and get > > > > > > > > > > the info manually to see the completeness of sensors, this > > > > > > > > might be > > > > > > > > > > simplified by proper test. > > > > > > > > > > > > > > > > > > > > Thank you! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Apr 26, 2019 at 5:56 PM Nikolay Izhikov < > > > > > > > > nizhi...@apache.org> wrote: > > > > > > > > > > > > > > > > > > > > > > Hello, Igniters. > > > > > > > > > > > > > > > > > > > > > > I've prepared Proof of Concept for IEP-35 [1] > > > > > > > > > > > PR can be found here - > > > > > > > > https://github.com/apache/ignite/pull/6510 > > > > > > > > > > > > > > > > > > > > > > I've done following changes: > > > > > > > > > > > > > > > > > > > > > > 1. `GridMonitoringManager` [2] - simple > > > > > > > > implementation of manager to store all monitoring info > > > > > > > > > > > 2. `HttpPullExposerSpi` [3] - pull exposer > > > > > > > > implementation that can respond with JSON from > > > > http://localhost:8080/ignite/monitoring. JSON content can be veiwed in > > > > gist [4] > > > > > > > > > > > 3. Compute task start and finish monitoring in > > > > > > > > "compute" list [5] > > > > > > > > > > > 4. Service registration are monitored in "service" > > > > > > > > list - [6] > > > > > > > > > > > 5. Current `IgniteSpiMBeanAdapter` rewritten using > > > > > > > > `GridMonitoringManager` [7] > > > > > > > > > > > > > > > > > > > > > > Design principles, monitoring subsystem details and new > > > > > > > > Ignite entities can be found in IEP [1]. > > > > > > > > > > > > > > > > > > > > > > My next steps will be: > > > > > > > > > > > > > > > > > > > > > > 1. Implementation of JMX exposer > > > > > > > > > > > 2. Registration of all "lists" and "sensor groups" > > > > > > > > as a SQL System view. > > > > > > > > > > > 3. Add monitoring for all unmonitoring Ignite API. > > > > > > > > (described in IEP). > > > > > > > > > > > 4. Rewrite existing jmx metrics using > > > > > > > > GridMonitoringManager. > > > > > > > > > > > > > > > > > > > > > > Please, share you thoughts. > > > > > > > > > > > > > > > > > > > > > > Part of JSON file: > > > > > > > > > > > ``` > > > > > > > > > > > "COMPUTE": { > > > > > > > > > > > "tasks": { > > > > > > > > > > > "name": "tasks", > > > > > > > > > > > "rows": [ > > > > > > > > > > > { > > > > > > > > > > > "id": "0798817a-eeec-4386-9af7-94edb39ffced", > > > > > > > > > > > "sessionId": > > > > > > > > "a1814f95a61-912451ff-ca7b-4764-a7fd-728f6a900000", > > > > > > > > > > > "data": { > > > > > > > > > > > "taskClasName": > > > > > > > > "org.apache.ignite.monitoring.MonitoringSelfTest$$Lambda$145/1500885480", > > > > > > > > > > > "startTime": 1556287337944, > > > > > > > > > > > "timeout": 9223372036854776000, > > > > > > > > > > > "execName": null > > > > > > > > > > > }, > > > > > > > > > > > "name": "anotherBroadcast" > > > > > > > > > > > } > > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392 > > > > > > > > > > > [2] > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-ec7d5cf5e35b99303deb9accee153c50R34 > > > > > > > > > > > [3] > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-32239c45e0ae3b692af2eae7078e1436R47 > > > > > > > > > > > [4] > > > > > > > > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d > > > > > > > > > > > [5] > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-d651ed29d07bd0c5ce291654a3254cc0R749 > > > > > > > > > > > [6] > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-0b4e54fbda2b0da1c10eff48416336f6R1606 > > > > > > > > > > > [7] > > > > > > > > https://github.com/apache/ignite/pull/6510/files#diff-4398bf118150500e059069b3a1638ec7R61 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >