Hello everyone, and happy holidays!

The changes below are ready for review. Benchmarks are included as well.
Expose all table metrics in virtual tables
https://issues.apache.org/jira/browse/CASSANDRA-14572
https://github.com/apache/cassandra/pull/2958/files

On Tue, 12 Dec 2023 at 22:05, Maxim Muzafarov <mmu...@apache.org> wrote:
>
> Hello everyone,
>
> I still think Cassandra will benefit from having this idea implemented
> and used through the source code, so I've done another round of
> rethinking this concept and it seems I've found a solution. As a
> result, we can significantly reduce the cost of implementing and
> maintaining both new and existing virtual tables and make our users
> happier by letting them see everything they need through virtual tables.
>
> So, I think we should limit the scope of the original proposal to the
> following:
> ## A framework for exposing any internal data collection to virtual
> tables ONLY. ##
>
> As a proof of concept, I took the CASSANDRA-14572 "Expose all table
> metrics in virtual table" JIRA ticket, which provides a good
> opportunity to demonstrate how to export all metrics to VTs at once
> without having boilerplate implementations. Currently, we already have
> CQLMetricsTable, BatchMetricsTable, etc. that expose metrics to VTs in
> a pretty similar way, while most of the metric groups located under
> the org.apache.cassandra.metrics package still lack a representation
> as VTs. I've used the MetricRegistry collection as a view of
> registered metrics to export them to VTs using the prototype.
>
> The prototype is complete. You can run a node locally and check the
> available virtual tables with cqlsh, or you can check the changes
> using the following link to the PR:
> https://github.com/apache/cassandra/pull/2958/files
>
> Below are some key points about the design itself:
>
> 1. All new virtual tables with metrics have "metric" as a prefix so
> that they are fairly easy to find using TAB on the cqlsh command line.
> Metrics are split into virtual tables as they are listed in the
> org.apache.cassandra.metrics package, e.g. metrics_cql, metrics_tcm etc.
> In addition, they are also grouped by metric type, e.g.
> metric_type_histogram, metric_type_counter etc. There is a table
> called "metric_all_metric_groups" with all available metric groups.
>
> 2. To create a new virtual table representation of an internal
> collection a developer needs to do two things: create a virtual table
> row representation, and register it using
> CollectionVirtualTableAdapter, which acts as an adapter between
> internal data and a virtual table. Here's how I did it for the thread
> pools VT; this is a fully backward compatible change:
> https://github.com/apache/cassandra/pull/2958/files#diff-5fda13a633723cdf232bba465e6fb7ab74cdc02f7ba55dae4d1cf494ffb71abaR61
>
> 3. The "metrics_keyspace" virtual table ended up being quite large
> since it contains all the metrics for all available keyspaces on a
> local node, so the default implementation provided by
> AbstractVirtualTable is not suitable for the proposal. The
> AbstractVirtualTable materializes a full data collection on the heap
> using SimpleDataSet, regardless of the portion of data that is being
> queried. In this case, we have to use an iterative approach, as the
> CollectionVirtualTableAdapter does (the problem was discussed in
> CASSANDRA-14629 and is now a part of the solution). This also helps to
> keep the memory footprint low.
>
> 4. Another valuable change is the CassandraMetricsRegistry itself. The
> problem here is that the metrics and their aliases are currently
> exported to JMX, but the implemented virtual tables export the metrics
> in their own way and in most cases don't respect the metric aliases
> which are registered in the MetricsRegistry. This should be fixed as a
> part of CASSANDRA-14572 to avoid ambiguity for all known metrics
> once and for all.
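To illustrate point 3 above, here is a minimal sketch of the difference between materializing every row up front and converting rows one at a time as the reader asks for them. It is plain Java and does not use any Cassandra classes: LazyRowsSketch, MetricRow, materialize and iterate are hypothetical names made up for this example, and the real AbstractVirtualTable, SimpleDataSet and CollectionVirtualTableAdapter in the PR may look quite different.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class LazyRowsSketch {
    /** Illustrative row shape: one metric name and one value. */
    static final class MetricRow {
        final String name;
        final long value;

        MetricRow(String name, long value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Eager approach: every element is converted and held on the heap at once. */
    static <T> List<MetricRow> materialize(Iterable<T> source, Function<T, MetricRow> toRow) {
        List<MetricRow> rows = new ArrayList<>();
        for (T item : source) {
            rows.add(toRow.apply(item));
        }
        return rows;
    }

    /** Iterative approach: an element is converted only when the reader pulls it. */
    static <T> Iterator<MetricRow> iterate(Iterable<T> source, Function<T, MetricRow> toRow) {
        Iterator<T> it = source.iterator();
        return new Iterator<MetricRow>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public MetricRow next() {
                return toRow.apply(it.next());
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            source.add(i);
        }

        // Count how many rows actually get built in each approach.
        AtomicInteger conversions = new AtomicInteger();
        Function<Integer, MetricRow> toRow = i -> {
            conversions.incrementAndGet();
            return new MetricRow("metric_" + i, i);
        };

        materialize(source, toRow);
        System.out.println("eager conversions: " + conversions.get());

        conversions.set(0);
        Iterator<MetricRow> rows = iterate(source, toRow);
        rows.next();
        rows.next();
        System.out.println("lazy conversions after 2 rows: " + conversions.get());
    }
}
```

With the eager variant the cost is proportional to the size of the whole collection even if the query only needs a couple of rows; with the iterative variant it is proportional to the number of rows actually read.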
>
> Here are the links to the issue and the PR:
> https://issues.apache.org/jira/browse/CASSANDRA-14572
> https://github.com/apache/cassandra/pull/2958/files
>
> I'm excited about how these changes look right now, so please share
> your feedback and thoughts.
> The PR lacks good test coverage, I'll fix it as soon as we have a
> clear vision of the design (or much sooner) :-)
>
> On Mon, 30 Jan 2023 at 17:43, David Capwell <dcapw...@apple.com> wrote:
> >
> > I *think* this is arguably true for a vtable / CQL-based solution as well
> > from the "you don't know how people are using your API" perspective.
> >
> > Very fair point and think that justifies a different thread to talk about
> > backwards compatibility for our tables (virtual and not); maybe we can lump
> > together the JMX backwards compatibility conversation as well in that new
> > thread.
> >
> > On Jan 28, 2023, at 4:09 PM, Josh McKenzie <jmcken...@apache.org> wrote:
> >
> > First off - thanks so much for putting in this effort Maxim! This is
> > excellent work.
> >
> > Some thoughts on the CEP and responses in thread:
> >
> > Considering that JMX is usually not used and disabled in production
> > environments for various performance and security reasons, the operator may
> > not see the same picture from various of Dropwizard's metrics exporters and
> > integrations as Cassandra's JMX metrics provide [1][2].
> >
> > I don't think this assertion is true. Cassandra is running in a lot of
> > places in the world, and JMX has been in this ecosystem for a long time; we
> > need data that is basically impossible to get to claim "JMX is usually not
> > used in C* environments in prod".
> >
> > I also wonder about if we should care about JMX? I know many wish to
> > migrate (it's going to be a very long time) away from JMX, so do we need a
> > wrapper to make JMX and vtables consistent?
> >
> > If we can move away from a bespoke vtable or JMX based implementation and
> > instead have a templatized solution each of these is generated from, that
> > to me is the superior option. There's little harm in adding new JMX
> > endpoints (or hell, other metrics framework integration?) as a byproduct of
> > adding new vtable exposed metrics; we have the same maintenance obligation
> > to them as we have to the vtables and if it generates from the same base
> > data, we shouldn't have any further maintenance burden due to its presence
> > right?
> >
> > we wish to move away from JMX
> >
> > I do, and you do, and many people do, but I don't believe all people on the
> > project do. The last time this came up in slack the conclusion was "Josh
> > should go draft a CEP to chart out a path to moving off JMX while
> > maintaining backwards-compat w/existing JMX metrics for environments that
> > are using them" (so I'm excited to see this CEP pop up before I got to it!
> > ;)). Moving to a system that gives us a 0-cost way to keep JMX and vtable
> > in sync over time on new metrics seems like a nice compromise for folks
> > that have built out JMX-based maintenance infra on top of C*. Plus removing
> > the boilerplate toil on vtables. win-win.
> >
> > If we add a column to the end of the JMX row did we just break users?
> >
> > I *think* this is arguably true for a vtable / CQL-based solution as well
> > from the "you don't know how people are using your API" perspective. Unless
> > we have clear guidelines about discretely selecting the columns you want
> > from a vtable and trust users to follow them, if people have brittle greedy
> > parsers pulling in all data from vtables we could very well break them as
> > well by adding a new column right? Could be wrong here; I haven't written
> > anything that consumes vtable metric data and maybe the obvious idiom in
> > the face of that is robust in the presence of column addition.
> > /shrug
> >
> > It's certainly more flexible and simpler to write to w/out detonating
> > compared to JMX, but it's still an API we'd be revving.
> >
> > On Sat, Jan 28, 2023, at 4:24 PM, Ekaterina Dimitrova wrote:
> >
> > Overall I have similar thoughts and questions as David.
> >
> > I just wanted to add a reminder about this thread from last summer[1]. We
> > already have issues with the alignment of JMX and Settings Virtual Table. I
> > guess this is how Maxim got inspired to suggest this framework proposal
> > which I want to thank him for! (I noticed he assigned CASSANDRA-15254)
> >
> > Not to open the Pandora box, but to me the most important thing here is to
> > come into agreement about the future of JMX and what we will do or not as a
> > community. Also, how much time people are able to invest. I guess this will
> > influence any directions to be taken here.
> >
> > [1] https://lists.apache.org/thread/8mjcwdyqoobpvw2262bqmskkhs76pp69
> >
> > On Thu, 26 Jan 2023 at 12:41, David Capwell <dcapw...@apple.com> wrote:
> >
> > I took a look and I see the result is an interface that looks like the
> > vtable interface, that is then used by vtables and JMX? My first thought
> > is why not just use the vtable logic?
> >
> > I also wonder about if we should care about JMX? I know many wish to
> > migrate (it's going to be a very long time) away from JMX, so do we need a
> > wrapper to make JMX and vtables consistent? I am cool with something like
> > the following
> >
> > registerWithJMX(jmxName, query("SELECT * FROM system_views.streaming"));
> >
> > So if we want to have a JMX view that matches the table then that’s cool by
> > me, but one thing that has been brought up in reviews is backwards
> > compatibility with regard to adding columns… If we add a column to the end
> > of the JMX row did we just break users?
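The column-addition concern raised above can be illustrated with a small, self-contained sketch. It is plain Java with no driver or Cassandra dependency: ColumnLookupSketch, Row, byIndex and byName are hypothetical names for this example, and the column names are only illustrative. It assumes a new column may end up anywhere in the row, not necessarily at the end.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not a driver API and not part of the PR.
final class ColumnLookupSketch {
    /** A result row modelled as parallel lists of column names and values. */
    static final class Row {
        final List<String> columns;
        final List<?> values;

        Row(List<String> columns, List<?> values) {
            this.columns = columns;
            this.values = values;
        }

        /** Positional access: depends on the exact column layout. */
        Object byIndex(int index) {
            return values.get(index);
        }

        /** Name-based access: independent of where the column sits in the row. */
        Object byName(String name) {
            int index = columns.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("no such column: " + name);
            }
            return values.get(index);
        }
    }

    public static void main(String[] args) {
        // Row shape before a column is added (illustrative column names).
        Row before = new Row(
                Arrays.asList("name", "active_tasks", "pending_tasks"),
                Arrays.asList("ReadStage", 3, 7));

        // Row shape after "blocked_tasks" is added in the middle.
        Row after = new Row(
                Arrays.asList("name", "active_tasks", "blocked_tasks", "pending_tasks"),
                Arrays.asList("ReadStage", 3, 0, 7));

        // A positional consumer silently starts reading a different column.
        System.out.println("by index, before: " + before.byIndex(2));
        System.out.println("by index, after:  " + after.byIndex(2));

        // A name-based consumer keeps reading the same column.
        System.out.println("by name, before:  " + before.byName("pending_tasks"));
        System.out.println("by name, after:   " + after.byName("pending_tasks"));
    }
}
```

The same reasoning suggests that consumers which select the columns they need by name are less likely to break on column addition than consumers that parse everything positionally, for a JMX row as much as for a vtable.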
> >
> > Considering that JMX is usually not used and disabled in production
> > environments for various performance and security reasons, the operator may
> > not see the same picture from various of Dropwizard's metrics exporters
> >
> > If this is a real problem people are hitting, we can always add the ability
> > to push metrics to common systems with a pluggable way to add non-standard
> > solutions. Dropwizard already supports this, so it would be low hanging
> > fruit to address this.
> >
> > To make the proposed changes backwards compatible with the previous version
> > of Cassandra, all MBeans and Virtual Tables we already have will remain
> > unchanged
> >
> > If this is for new JMX endpoints moving forward, I am not sure of the
> > benefit for the same reason listed above; we wish to move away from JMX
> >
> > On Jan 25, 2023, at 10:51 AM, Maxim Muzafarov <mmu...@apache.org> wrote:
> >
> > Hello Cassandra Community,
> >
> > I've been faced with a number of inconsistencies in the user APIs of
> > the internal data collections representation exposed through the
> > Cassandra monitoring interfaces that need to be fully aligned from an
> > operator perspective. First of all, I'm highlighting JMX, Dropwizard
> > Metrics, and Virtual Tables user interfaces. In order to address all
> > these inconsistencies, I have created a draft enhancement proposal
> > that describes everything I have found and how we can fix it once and
> > for all.
> >
> > I'd like to hear your opinion and thoughts on it. Please take a look:
> > https://docs.google.com/document/d/1j4J3bPWjQkAU9x4G-zxKObxPrKg36jLRT6xpUoNJa8Q
> >
> > --
> > Maxim Muzafarov