Aesthetically, and setting standards aside, my personal preference is the CQL-centric syntax, then SQL, then Postgres. There's a tension here: standards have value, but if the standards are ugly or counterintuitive they also have costs. :)
I think we need to have a clear, focused conversation about whether we're going to uniformly prioritize consistency with Postgres' syntax, SQL, or do our own thing, and then codify that consensus so people have guidance during API design going forward.

On Tue, Oct 7, 2025, at 3:22 PM, Maxim Muzafarov wrote:
> Hello Folks,
>
> First of all, thank you for your comments. Your feedback motivates me
> to implement these changes and refine the final result to the highest
> standard. To keep the vote thread clean, I'm addressing your questions
> in the discussion thread.
>
> The vote is here:
> https://lists.apache.org/thread/zmgvo2ty5nqvlz1xccsls2kcrgnbjh5v
>
>
> = The idea: =
>
> First, let me focus on the general idea, and then I will answer your
> questions in more detail.
>
> The main focus is on introducing a new API (CQL) to invoke the same
> node management commands. While this has an indirect effect on tooling
> (cqlsh, nodetool), the tooling itself is not the main focus. The scope
> (or Phase 1) of the initial changes is narrowed down to the API only,
> to ensure the PR remains reviewable.
>
> This implies the following:
> - the nodetool commands and the way they are implemented won't change
> - the nodetool commands will be accessible via CQL; their
>   implementation (and execution locality) will not change
> - this change introduces ONLY a new way of invoking management
>   commands
> - this change is not about the tooling (cqlsh, nodetool); it will help
>   them evolve, however
> - these changes are being introduced as an experimental API with a
>   feature flag, disabled by default
>
>
> = The answers: =
>
> > how will the new CQL API behave if the user does not specify a hostname?
>
> The changes only affect the API part; improvements to the tooling will
> follow later. The command is executed on the node that the client is
> connected to.
> Note also that the port differs from 9042 (default) as a new
> management port will be introduced.
> See examples here [1].
>
> cqlsh 10.20.88.164 11211 -u myusername -p mypassword
> nodetool -h 10.20.88.164 -p 8081 -u myusername -pw mypassword
>
> If a host is not specified, the cli tool will attempt to connect to
> localhost. I suppose.
>
>
> > My understanding is that commands like nodetool bootstrap typically run on
> > a single node.
>
> This is correct; however, as I don't control the implementation of the
> command, it may actually involve communication with other nodes. This
> is actually not part of this CEP. I'm only reusing the commands we
> already have.
>
>
> > Will we continue requiring users to specify a hostname/port explicitly, or
> > will the CQL API be responsible for orchestrating the command safely across
> > the entire cluster or datacenter?
>
> It seems that you are confusing the API with the tooling. The tooling
> (cqlsh, nodetool) will continue to work as it does now. I am only
> adding a new way in which commands can be invoked: CQL. Orchestration,
> however, is the subject of other projects. Cassandra Sidecar?
>
>
> > It might, however, be worth verifying that the proposed CQL syntax aligns
> > with PostgreSQL conventions, and adjusting it if needed for
> > cross-compatibility.
>
> It's news to me that we're targeting PostgreSQL as the main reference
> and drifting towards invoking management operations the same way. I'm
> inclined to agree that the syntax should probably be similar, more or
> less, however.
>
> We are introducing a new CQL syntax in a minimal and isolated manner.
> CEP-38 defines a small set of management-oriented CQL statements
> (EXECUTE COMMAND / DESCRIBE COMMAND) that can be used to match all
> existing nodetool commands at once, introducing further aliases as an
> option. This eliminates the need to introduce a new antlr grammar for
> each management operation.
>
> The command execution syntax is the main thing that users interact
> with in this CEP, but I'm taking a more relaxed approach to it for the
> following reasons:
> - the syntax is only the tip of the iceberg; unifying the JMX, CQL and
>   possible REST APIs for Cassandra is the priority;
> - the feature will be in an experimental state in the major release,
>   and we need to collect real feedback from users and their
>   deployments;
> - aliasing will be used for some important commands like compaction
>   and bootstrap;
>
> Taking all of the above into account, I still think it's important to
> reach an agreement, or at least to avoid objections.
> So, I've checked the PostgreSQL and SQL standards to identify areas of
> alignment. The latter I think is relatively easy to support as
> aliases.
>
>
> The syntax proposed in the CEP:
>
> EXECUTE COMMAND forcecompact WITH keyspace=distributed_test_keyspace
> AND table=tbl AND keys=["k4", "k2", "k7"];
>
> Other Cassandra-style options that I had previously considered:
>
> 1. EXECUTE COMMAND forcecompact (keyspace=distributed_test_keyspace,
> table=tbl, keys=["k4", "k2", "k7"]);
> 2. EXECUTE COMMAND forcecompact WITH ARGS {"keyspace":
> "distributed_test_keyspace", "table": "tbl", "keys":["k4", "k2",
> "k7"]};
>
> With the postgresql context [2] it could look like:
>
> COMPACT (keys=["k4", "k2", "k7"]) distributed_test_keyspace.tbl;
>
> The SQL-standard [3][4] procedural approach:
>
> CALL system_mgmt.forcecompact(
>   keyspace => 'distributed_test_keyspace',
>   table => 'tbl',
>   keys => ['k4','k2','k7'],
>   options => { "parallel": 2, "verbose": true }
> );
>
>
> Please let me know if you have any questions, or if you would like us
> to arrange a call to discuss all the details.
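To make the `WITH ... AND ...` form proposed in the CEP a bit more concrete, here is a minimal Java sketch of how such a statement could be split into a command name plus a `Map<String, String>` of raw arguments (the input shape mentioned elsewhere in this thread). Everything in it is an assumption for illustration only: `ExecuteCommandSketch`, its method names and the regex are hypothetical, and a real implementation would presumably go through the CQL grammar rather than a regex. It also naively splits on `AND`, so a value containing ` AND ` would break it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CEP-38: splits
// "EXECUTE COMMAND <name> WITH k=v AND k=v;" into a name and raw string arguments.
final class ExecuteCommandSketch {
    private static final Pattern STATEMENT = Pattern.compile(
        "\\s*EXECUTE\\s+COMMAND\\s+(\\w+)(?:\\s+WITH\\s+(.*?))?\\s*;?\\s*",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private static Matcher match(String cql) {
        Matcher m = STATEMENT.matcher(cql);
        if (!m.matches())
            throw new IllegalArgumentException("not an EXECUTE COMMAND statement: " + cql);
        return m;
    }

    // Returns the command name, e.g. "forcecompact".
    static String commandName(String cql) {
        return match(cql).group(1);
    }

    // Returns the arguments in declaration order; values are kept as raw, untyped strings.
    static Map<String, String> arguments(String cql) {
        Map<String, String> args = new LinkedHashMap<>();
        String with = match(cql).group(2);
        if (with == null || with.trim().isEmpty())
            return args;
        for (String pair : with.split("(?i)\\s+AND\\s+")) {
            int eq = pair.indexOf('=');
            if (eq < 0)
                throw new IllegalArgumentException("expected key=value but got: " + pair);
            args.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return args;
    }

    public static void main(String[] unused) {
        String cql = "EXECUTE COMMAND forcecompact WITH keyspace=distributed_test_keyspace "
                   + "AND table=tbl AND keys=[\"k4\", \"k2\", \"k7\"];";
        System.out.println(commandName(cql) + " " + arguments(cql));
    }
}
```

Keeping the values as plain strings means the list in `keys` is not interpreted here; typing and validation would belong to whatever command metadata the registry ends up carrying.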
>
>
> [1]
> https://www.instaclustr.com/support/documentation/cassandra/using-cassandra/connect-to-cassandra-with-cqlsh/
> [2] https://www.postgresql.org/docs/current/sql-vacuum.html
> [3]
> https://en.wikipedia.org/wiki/Stored_procedure?utm_source=chatgpt.com#Implementation
> [4] https://www.postgresql.org/docs/9.3/functions-admin.html
>
> On Fri, 5 Sept 2025 at 14:12, Maxim Muzafarov <[email protected]> wrote:
> >
> > Hi Bernardo,
> >
> > Thanks for bumping up the discussion.
> > I plan to schedule the vote for next week.
> >
> > If anyone has any comments or concerns, please let me know so that I
> > can incorporate them into the CEP. The general design remains the
> > same, and with picocli taking its place we can reuse the same commands
> > for the CQL.
> >
> > On Wed, 3 Sept 2025 at 17:58, Bernardo Botella
> > <[email protected]> wrote:
> > >
> > > Hi Maxim!
> > >
> > > I just wanted to resurface this thread as it looks like it fell through
> > > the cracks (unless I missed something?). I am excited about this feature
> > > as well (it should also help with the configuration via CQL we also
> > > discussed on CEP-44).
> > >
> > > I guess that the CEP has been up for discussion for a while, and if there
> > > is no further feedback or concerns, we could call a vote on it?
> > >
> > > Regards,
> > > Bernardo
> > >
> > > > On Jul 29, 2025, at 8:16 AM, Maxim Muzafarov <[email protected]> wrote:
> > > >
> > > > Hello everyone,
> > > >
> > > >
> > > > Now that the dust has settled on the Picocli transition, I would like
> > > > to update my prototype and prepare it for review. It will take some
> > > > time, but I hope to have everything ready within the next couple of
> > > > months. Although we haven't voted on this CEP yet, as far as I can
> > > > see, there is more or less consensus on the path forward.
> > > >
> > > > So, my question is:
> > > >
> > > > Should we wait until the prototype is ready for review, or should we
> > > > initiate a vote?
I saw some concerns about this CEP online since it > > > > hasn't been voted on, but I'm still eager to implement it. Anyway, a > > > > new feature flag will be added within implementation and the feature > > > > will be disabled by default in the next release. > > > > > > > > > > > >> no. Maxim and I have had some offline discussions. We need to make > > > >> some changes before we can be ready to vote on it. > > > > > > > > I believe this has already been addressed. I've added new sections, > > > > "Command Authorization" [1] and "AdminPort"[2] to the CEP. > > > > Let me know if this is okay with you, Dinesh. > > > > > > > > [1] > > > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API#CEP38:CQLManagementAPI-CommandAuthorization > > > > [2] > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465810#CEP38:CQLManagementAPI-AdminPort > > > > > > > > On Thu, 19 Sept 2024 at 20:11, Dinesh Joshi <[email protected]> wrote: > > > >> > > > >> no. Maxim and I have had some offline discussions. We need to make > > > >> some changes before we can be ready to vote on it. > > > >> > > > >> On Thu, Sep 19, 2024 at 11:09 AM Patrick McFadin <[email protected]> > > > >> wrote: > > > >>> > > > >>> There is no VOTE thread for this CEP. Is this ready for one? > > > >>> > > > >>> On Tue, Jan 9, 2024 at 3:28 AM Maxim Muzafarov <[email protected]> > > > >>> wrote: > > > >>>> > > > >>>> Jon, > > > >>>> > > > >>>> That sounds good. Let's make these commands rely on the settings > > > >>>> virtual table and keep the initial changes as minimal as possible. > > > >>>> > > > >>>> We've also scheduled a Cassandra Contributor Meeting on January 30th > > > >>>> 2024, so I'll prepare some slides with everything we've got so far > > > >>>> and > > > >>>> try to prepare some drafts to demonstrate the design. 
> > > >>>> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting > > > >>>> > > > >>>> On Tue, 9 Jan 2024 at 00:55, Jon Haddad <[email protected]> wrote: > > > >>>>> > > > >>>>> It's great to see where this is going and thanks for the discussion > > > >>>>> on the ML. > > > >>>>> > > > >>>>> Personally, I think adding two new ways of accomplishing the same > > > >>>>> thing is a net negative. It means we need more documentation and > > > >>>>> creates inconsistencies across tools and users. The tradeoffs > > > >>>>> you've listed are worth considering, but in my opinion adding 2 new > > > >>>>> ways to accomplish the same thing hurts the project more than it > > > >>>>> helps. > > > >>>>> > > > >>>>>> - I'd like to see a symmetry between the JMX and CQL APIs, so that > > > >>>>>> users will have a sense of the commands they are using and are less > > > >>>>> likely to check the documentation; > > > >>>>> > > > >>>>> I've worked with a couple hundred teams and I can only think of a > > > >>>>> few who use JMX directly. It's done very rarely. After 10 years, > > > >>>>> I still have to look up the JMX syntax to do anything useful, > > > >>>>> especially if there's any quoting involved. Power users might know > > > >>>>> a handful of JMX commands by heart, but I suspect most have a > > > >>>>> handful of bash scripts they use instead, or have a sidecar. I > > > >>>>> also think very few users will migrate their management code from > > > >>>>> JMX to CQL, nor do I imagine we'll move our own tools until the > > > >>>>> `disablebinary` problem is solved. > > > >>>>> > > > >>>>>> - It will be easier for us to move the nodetool from the jmx > > > >>>>>> client that is used under the hood to an implementation based on a > > > >>>>>> java-driver and use the CQL for the same; > > > >>>>> > > > >>>>> I can't imagine this would make a material difference. 
If > > > >>>>> someone's rewriting a nodetool command, how much time will be spent > > > >>>>> replacing the JMX call with a CQL one? Looking up a virtual table > > > >>>>> isn't going to be what consumes someone's time in this process. > > > >>>>> Again, this won't be done without solving `nodetool disablebinary`. > > > >>>>> > > > >>>>>> if we have cassandra-15254 merged, it will cost almost nothing to > > > >>>>>> support the exec syntax for setting properties; > > > >>>>> > > > >>>>> My concern is more about the weird user experience of having two > > > >>>>> ways of doing the same thing, less about the technical overhead of > > > >>>>> adding a second implementation. I propose we start simple, see if > > > >>>>> any of the reasons you've listed are actually a real problem, then > > > >>>>> if they are, address the issue in a follow up. > > > >>>>> > > > >>>>> If I'm wrong, it sounds like it's fairly easy to add `exec` for > > > >>>>> changing configs. If I'm right, we'll have two confusing syntaxes > > > >>>>> forever. It's a lot easier to add something later than take it > > > >>>>> away. > > > >>>>> > > > >>>>> How does that sound? > > > >>>>> > > > >>>>> Jon > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> On Mon, Jan 8, 2024 at 7:55 PM Maxim Muzafarov <[email protected]> > > > >>>>> wrote: > > > >>>>>> > > > >>>>>>> Some operations will no doubt require a stored procedure syntax, > > > >>>>>>> but perhaps it would be a good idea to split the work into two: > > > >>>>>> > > > >>>>>> These are exactly the first steps I have in mind: > > > >>>>>> > > > >>>>>> [Ready for review] > > > >>>>>> Allow UPDATE on settings virtual table to change running > > > >>>>>> configurations > > > >>>>>> https://issues.apache.org/jira/browse/CASSANDRA-15254 > > > >>>>>> > > > >>>>>> This issue is specifically aimed at changing the configuration > > > >>>>>> properties we are talking about (value is in yaml format): > > > >>>>>> e.g. 
UPDATE system_views.settings SET compaction_throughput = > > > >>>>>> 128Mb/s; > > > >>>>>> > > > >>>>>> [Ready for review] > > > >>>>>> Expose all table metrics in virtual table > > > >>>>>> https://issues.apache.org/jira/browse/CASSANDRA-14572 > > > >>>>>> > > > >>>>>> This is to observe the running configuration and all available > > > >>>>>> metrics: > > > >>>>>> e.g. select * from system_views.thread_pools; > > > >>>>>> > > > >>>>>> > > > >>>>>> I hope both of the issues above will become part of the trunk > > > >>>>>> branch > > > >>>>>> before we move on to the CQL management commands. In this topic, > > > >>>>>> I'd > > > >>>>>> like to discuss the design of the CQL API, and gather feedback, so > > > >>>>>> that I can prepare a draft of changes to look at without any > > > >>>>>> surprises, and that's exactly what this discussion is about. > > > >>>>>> > > > >>>>>> > > > >>>>>> cqlsh> UPDATE system.settings SET compaction_throughput = 128; > > > >>>>>> cqlsh> exec setcompactionthroughput 128 > > > >>>>>> > > > >>>>>> I don't mind removing the exec command from the CQL command API > > > >>>>>> which > > > >>>>>> is intended to change settings. 
Personally, I see the second > > > >>>>>> option as > > > >>>>>> just an alias for the first command, and in fact, they will have > > > >>>>>> the > > > >>>>>> same implementation under the hood, so please consider the > > > >>>>>> rationale > > > >>>>>> below: > > > >>>>>> > > > >>>>>> - I'd like to see a symmetry between the JMX and CQL APIs, so that > > > >>>>>> users will have a sense of the commands they are using and are less > > > >>>>>> likely to check the documentation; > > > >>>>>> - It will be easier for us to move the nodetool from the jmx client > > > >>>>>> that is used under the hood to an implementation based on a > > > >>>>>> java-driver and use the CQL for the same; > > > >>>>>> - if we have cassandra-15254 merged, it will cost almost nothing to > > > >>>>>> support the exec syntax for setting properties; > > > >>>>>> > > > >>>>>> On Mon, 8 Jan 2024 at 20:13, Jon Haddad <[email protected]> wrote: > > > >>>>>>> > > > >>>>>>> Ugh, I moved some stuff around and 2 paragraphs got merged that > > > >>>>>>> shouldn't have been. > > > >>>>>>> > > > >>>>>>> I think there's no way we could rip out JMX, there's just too > > > >>>>>>> many benefits to having it and effectively zero benefits to > > > >>>>>>> removing. > > > >>>>>>> > > > >>>>>>> Regarding disablebinary, part of me wonders if this is a bit of a > > > >>>>>>> hammer, and what we really want is "disable binary for > > > >>>>>>> non-admins". I'm not sure what the best path is to get there. > > > >>>>>>> The local unix socket might be the easiest path as it allows us > > > >>>>>>> to disable network binary easily and still allow local admins, > > > >>>>>>> and allows the OS to reject the incoming connections vs passing > > > >>>>>>> that work onto a connection handler which would have to evaluate > > > >>>>>>> whether or not the user can connect. 
If a node is already in a > > > >>>>>>> bad spot requring disable binary, it's probably not a good idea > > > >>>>>>> to have it get DDOS'ed as part of the remediation. > > > >>>>>>> > > > >>>>>>> Sorry for multiple emails. > > > >>>>>>> > > > >>>>>>> Jon > > > >>>>>>> > > > >>>>>>> On Mon, Jan 8, 2024 at 4:11 PM Jon Haddad <[email protected]> > > > >>>>>>> wrote: > > > >>>>>>>> > > > >>>>>>>>> Syntactically, if we’re updating settings like compaction > > > >>>>>>>>> throughput, I would prefer to simply update a virtual settings > > > >>>>>>>>> table > > > >>>>>>>>> e.g. UPDATE system.settings SET compaction_throughput = 128 > > > >>>>>>>> > > > >>>>>>>> I agree with this, sorry if that wasn't clear in my previous > > > >>>>>>>> email. > > > >>>>>>>> > > > >>>>>>>>> Some operations will no doubt require a stored procedure syntax, > > > >>>>>>>> > > > >>>>>>>> The alternative to the stored procedure syntax is to have first > > > >>>>>>>> class support for operations like REPAIR or COMPACT, which could > > > >>>>>>>> be interesting. It might be a little nicer if the commands are > > > >>>>>>>> first class citizens. I'm not sure what the downside would be > > > >>>>>>>> besides adding complexity to the parser. I think I like the > > > >>>>>>>> idea as it would allow for intuitive tab completion (REPAIR > > > >>>>>>>> <tab>) and mentally fit in with the rest of the permission > > > >>>>>>>> system, and be fairly obvious what permission relates to what > > > >>>>>>>> action. > > > >>>>>>>> > > > >>>>>>>> cqlsh > GRANT INCREMENTAL REPAIR ON mykeyspace.mytable TO jon; > > > >>>>>>>> > > > >>>>>>>> I realize the ability to grant permissions could be done for the > > > >>>>>>>> stored procedure syntax as well, but I think it's a bit more > > > >>>>>>>> consistent to represent it the same way as DDL and probably > > > >>>>>>>> better for the end user. 
> > > >>>>>>>> > > > >>>>>>>> Postgres seems to generally do admin stuff with SELECT > > > >>>>>>>> function(): > > > >>>>>>>> https://www.postgresql.org/docs/9.3/functions-admin.html. It > > > >>>>>>>> feels a bit weird to me to use SELECT to do things like kill DB > > > >>>>>>>> connections, but that might just be b/c it's not how I typically > > > >>>>>>>> work with a database. VACUUM is a standalone command though. > > > >>>>>>>> > > > >>>>>>>> Curious to hear what people's thoughts are on this. > > > >>>>>>>> > > > >>>>>>>>> I would like to see us move to decentralised structured > > > >>>>>>>>> settings management at the same time, so that we can set > > > >>>>>>>>> properties for the whole cluster, or data centres, or > > > >>>>>>>>> individual nodes via the same mechanism - all from any node in > > > >>>>>>>>> the cluster. I would be happy to help out with this work, if > > > >>>>>>>>> time permits. > > > >>>>>>>> > > > >>>>>>>> This would be nice. Spinnaker has this feature and I found it > > > >>>>>>>> to be very valuable at Netflix when making large changes. > > > >>>>>>>> > > > >>>>>>>> Regarding JMX - I think since it's about as close as we can get > > > >>>>>>>> to "free" I don't really consider it to be additional overhead, > > > >>>>>>>> a decent escape hatch, and I can't see us removing any > > > >>>>>>>> functionality that most teams would consider critical. > > > >>>>>>>> > > > >>>>>>>>> We need something that's available for use before the node > > > >>>>>>>>> comes fully online > > > >>>>>>>>> Supporting backwards compat, especially for automated ops (i.e. > > > >>>>>>>>> nodetool, JMX, etc), is crucial. Painful, but crucial. > > > >>>>>>>> > > > >>>>>>>> I think there's no way we could rip out JMX, there's just too > > > >>>>>>>> many benefits to having it and effectively zero benefits to > > > >>>>>>>> removing. Part of me wonders if this is a bit of a hammer, and > > > >>>>>>>> what we really want is "disable binary for non-admins". 
I'm not > > > >>>>>>>> sure what the best path is to get there. The local unix socket > > > >>>>>>>> might be the easiest path as it allows us to disable network > > > >>>>>>>> binary easily and still allow local admins, and allows the OS to > > > >>>>>>>> reject the incoming connections vs passing that work onto a > > > >>>>>>>> connection handler which would have to evaluate whether or not > > > >>>>>>>> the user can connect. If a node is already in a bad spot > > > >>>>>>>> requring disable binary, it's probably not a good idea to have > > > >>>>>>>> it get DDOS'ed as part of the remediation. > > > >>>>>>>> > > > >>>>>>>> I think it's safe to say there's no appetite to remove JMX, at > > > >>>>>>>> least not for anyone that would have to rework their entire > > > >>>>>>>> admin control plane, plus whatever is out there in OSS > > > >>>>>>>> provisioning tools like puppet / chef / etc that rely on JMX. I > > > >>>>>>>> see no value whatsoever in removing it. > > > >>>>>>>> > > > >>>>>>>> I should probably have phrased my earlier email a bit > > > >>>>>>>> differently. Maybe this is better: > > > >>>>>>>> > > > >>>>>>>> Fundamentally, I think it's better for the project if > > > >>>>>>>> administration is fully supported over CQL in addition to JMX, > > > >>>>>>>> without introducing a redundant third option, with the project's > > > >>>>>>>> preference being CQL. > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> On Mon, Jan 8, 2024 at 9:10 AM Benedict Elliott Smith > > > >>>>>>>> <[email protected]> wrote: > > > >>>>>>>>> > > > >>>>>>>>> Syntactically, if we’re updating settings like compaction > > > >>>>>>>>> throughput, I would prefer to simply update a virtual settings > > > >>>>>>>>> table > > > >>>>>>>>> > > > >>>>>>>>> e.g. 
UPDATE system.settings SET compaction_throughput = 128 > > > >>>>>>>>> > > > >>>>>>>>> Some operations will no doubt require a stored procedure > > > >>>>>>>>> syntax, but perhaps it would be a good idea to split the work > > > >>>>>>>>> into two: one part to address settings like those above, and > > > >>>>>>>>> another for maintenance operations such as triggering major > > > >>>>>>>>> compactions, repair and the like? > > > >>>>>>>>> > > > >>>>>>>>> I would like to see us move to decentralised structured > > > >>>>>>>>> settings management at the same time, so that we can set > > > >>>>>>>>> properties for the whole cluster, or data centres, or > > > >>>>>>>>> individual nodes via the same mechanism - all from any node in > > > >>>>>>>>> the cluster. I would be happy to help out with this work, if > > > >>>>>>>>> time permits. > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> On 8 Jan 2024, at 11:42, Josh McKenzie <[email protected]> > > > >>>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>> Fundamentally, I think it's better for the project if > > > >>>>>>>>> administration is fully done over CQL and we have a consistent, > > > >>>>>>>>> single way of doing things. > > > >>>>>>>>> > > > >>>>>>>>> Strongly agree here. With 2 caveats: > > > >>>>>>>>> > > > >>>>>>>>> Supporting backwards compat, especially for automated ops (i.e. > > > >>>>>>>>> nodetool, JMX, etc), is crucial. Painful, but crucial. > > > >>>>>>>>> We need something that's available for use before the node > > > >>>>>>>>> comes fully online; the point Jeff always brings up when we > > > >>>>>>>>> discuss moving away from JMX. So long as we have some kind of > > > >>>>>>>>> "out-of-band" access to nodes or accommodation for that, we > > > >>>>>>>>> should be good. 
> > > >>>>>>>>> > > > >>>>>>>>> For context on point 2, see slack: > > > >>>>>>>>> https://the-asf.slack.com/archives/CK23JSY2K/p1688745128122749?thread_ts=1688662169.018449&cid=CK23JSY2K > > > >>>>>>>>> > > > >>>>>>>>> I point out that JMX works before and after the native protocol > > > >>>>>>>>> is running (startup, shutdown, joining, leaving), and also it's > > > >>>>>>>>> semi-common for us to disable the native protocol in certain > > > >>>>>>>>> circumstances, so at the very least, we'd then need to > > > >>>>>>>>> implement a totally different cql protocol interface just for > > > >>>>>>>>> administration, which nobody has committed to building yet. > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> I think this is a solvable problem, and I think the benefits of > > > >>>>>>>>> having a single, elegant way of interacting with a cluster and > > > >>>>>>>>> configuring it justifies the investment for us as a project. > > > >>>>>>>>> Assuming someone has the cycles to, you know, actually do the > > > >>>>>>>>> work. :D > > > >>>>>>>>> > > > >>>>>>>>> On Sun, Jan 7, 2024, at 10:41 PM, Jon Haddad wrote: > > > >>>>>>>>> > > > >>>>>>>>> I like the idea of the ability to execute certain commands via > > > >>>>>>>>> CQL, but I think it only makes sense for the nodetool commands > > > >>>>>>>>> that cause an action to take place, such as compact or repair. > > > >>>>>>>>> We already have virtual tables, I don't think we need another > > > >>>>>>>>> layer to run informational queries. I see little value in > > > >>>>>>>>> having the following (I'm using exec here for simplicity): > > > >>>>>>>>> > > > >>>>>>>>> cqlsh> exec tpstats > > > >>>>>>>>> > > > >>>>>>>>> which returns a string in addition to: > > > >>>>>>>>> > > > >>>>>>>>> cqlsh> select * from system_views.thread_pools > > > >>>>>>>>> > > > >>>>>>>>> which returns structured data. 
> > > >>>>>>>>> > > > >>>>>>>>> I'd also rather see updatable configuration virtual tables > > > >>>>>>>>> instead of > > > >>>>>>>>> > > > >>>>>>>>> cqlsh> exec setcompactionthroughput 128 > > > >>>>>>>>> > > > >>>>>>>>> Fundamentally, I think it's better for the project if > > > >>>>>>>>> administration is fully done over CQL and we have a consistent, > > > >>>>>>>>> single way of doing things. I'm not dead set on it, I just > > > >>>>>>>>> think less is more in a lot of situations, this being one of > > > >>>>>>>>> them. > > > >>>>>>>>> > > > >>>>>>>>> Jon > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> On Wed, Jan 3, 2024 at 2:56 PM Maxim Muzafarov > > > >>>>>>>>> <[email protected]> wrote: > > > >>>>>>>>> > > > >>>>>>>>> Happy New Year to everyone! I'd like to thank everyone for their > > > >>>>>>>>> questions, because answering them forces us to move towards the > > > >>>>>>>>> right > > > >>>>>>>>> solution, and I also like the ML discussions for the time they > > > >>>>>>>>> give to > > > >>>>>>>>> investigate the code :-) > > > >>>>>>>>> > > > >>>>>>>>> I'm deliberately trying to limit the scope of the initial > > > >>>>>>>>> solution > > > >>>>>>>>> (e.g. exclude the agent part) to keep the discussion short and > > > >>>>>>>>> clear, > > > >>>>>>>>> but it's also important to have a glimpse of what we can do > > > >>>>>>>>> next once > > > >>>>>>>>> we've finished with the topic. > > > >>>>>>>>> > > > >>>>>>>>> My view of the Command<> is that it is an abstraction in the > > > >>>>>>>>> broader > > > >>>>>>>>> sense of an operation that can be performed on the local node, > > > >>>>>>>>> involving one of a few internal components. This means that > > > >>>>>>>>> updating a > > > >>>>>>>>> property in the settings virtual table via an update statement, > > > >>>>>>>>> or > > > >>>>>>>>> executing e.g. the setconcurrentcompactors command are just > > > >>>>>>>>> aliases of > > > >>>>>>>>> the same internal command via different APIs. 
Another example > > > >>>>>>>>> is the > > > >>>>>>>>> netstats command, which simply aggregates the MessageService > > > >>>>>>>>> metrics > > > >>>>>>>>> and returns them in a human-readable format (just another way of > > > >>>>>>>>> looking at key-value metric pairs). More broadly, the command > > > >>>>>>>>> input is > > > >>>>>>>>> Map<String, String> and String as the result (or List<String>). > > > >>>>>>>>> > > > >>>>>>>>> As Abe mentioned, Command and CommandRegistry should be largely > > > >>>>>>>>> based > > > >>>>>>>>> on the nodetool command set at the beginning. We have a few > > > >>>>>>>>> options > > > >>>>>>>>> for how we can initially construct command metadata during the > > > >>>>>>>>> registry implementation (when moving command metadata from the > > > >>>>>>>>> nodetool to the core part), so I'm planning to consult with the > > > >>>>>>>>> command representations of the k8cassandra project in the way > > > >>>>>>>>> of any > > > >>>>>>>>> further registry adoptions have zero problems (by writing a test > > > >>>>>>>>> openapi registry exporter and comparing the representation > > > >>>>>>>>> results). > > > >>>>>>>>> > > > >>>>>>>>> So, the MVP is the following: > > > >>>>>>>>> - Command > > > >>>>>>>>> - CommandRegistry > > > >>>>>>>>> - CQLCommandExporter > > > >>>>>>>>> - JMXCommandExporter > > > >>>>>>>>> - the nodetool uses the JMXCommandExporter > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> = Answers = > > > >>>>>>>>> > > > >>>>>>>>>> What do you have in mind specifically there? Do you plan on > > > >>>>>>>>>> rewriting a brand new implementation which would be partially > > > >>>>>>>>>> inspired by our agent? Or would the project integrate our > > > >>>>>>>>>> agent code in-tree or as a dependency? > > > >>>>>>>>> > > > >>>>>>>>> Personally, I like the state of the k8ssandra project as it is > > > >>>>>>>>> now. 
My > > > >>>>>>>>> understanding is that the server part of a database always lags > > > >>>>>>>>> behind > > > >>>>>>>>> the client and sidecar parts in terms of the jdk version and the > > > >>>>>>>>> features it provides. In contrast, sidecars should always be on > > > >>>>>>>>> top of > > > >>>>>>>>> the market, so if we want to make an agent part in-tree, this > > > >>>>>>>>> should > > > >>>>>>>>> be carefully considered for the flexibility which we may lose, > > > >>>>>>>>> as we > > > >>>>>>>>> will not be able to change the agent part within the sidecar. > > > >>>>>>>>> The only > > > >>>>>>>>> closest change I can see is that we can remove the interceptor > > > >>>>>>>>> part > > > >>>>>>>>> once the CQL command interface is available. I suggest we move > > > >>>>>>>>> the > > > >>>>>>>>> agent part to phase 2 and research it. wdyt? > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>>> How are the results of the commands expressed to the CQL > > > >>>>>>>>>> client? Since the command is being treated as CQL, I guess it > > > >>>>>>>>>> will be rows, right? If yes, some of the nodetool commands > > > >>>>>>>>>> output are a bit hierarchical in nature (e.g. cfstats, > > > >>>>>>>>>> netstats etc...). How are these cases handled? > > > >>>>>>>>> > > > >>>>>>>>> I think the result of the execution should be a simple string > > > >>>>>>>>> (or set > > > >>>>>>>>> of strings), which by its nature matches the nodetool output. I > > > >>>>>>>>> would > > > >>>>>>>>> avoid building complex output or output schemas for now to > > > >>>>>>>>> simplify > > > >>>>>>>>> the initial changes. > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>>> Any changes expected at client/driver side? > > > >>>>>>>>> > > > >>>>>>>>> I'd like to keep the initial changes to a server part only, to > > > >>>>>>>>> avoid > > > >>>>>>>>> scope inflation. 
> > > >>>>>>>>> For the driver part, I have checked the ExecutionInfo
> > > >>>>>>>>> interface provided by the java-driver, which should probably
> > > >>>>>>>>> be used as a command execution status holder. We'd like to
> > > >>>>>>>>> have a unique command execution id for each command that is
> > > >>>>>>>>> executed on the node, so the ExecutionInfo should probably
> > > >>>>>>>>> hold such an id. Currently it has the UUID getTracingId(),
> > > >>>>>>>>> which is not well suited for our case, and I think further
> > > >>>>>>>>> changes and follow-ups will be required here (including the
> > > >>>>>>>>> binary protocol, I think).
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>> The term COMMAND is a bit abstract I feel (subjective)... And
> > > >>>>>>>>>> I also feel the settings part is overlapping with virtual
> > > >>>>>>>>>> tables.
> > > >>>>>>>>>
> > > >>>>>>>>> I think we should keep the term Command as broad as possible.
> > > >>>>>>>>> As long as we have a single implementation of a command, and
> > > >>>>>>>>> the cost of maintaining that piece of the source code is low,
> > > >>>>>>>>> it's even better if we have a few ways to achieve the same
> > > >>>>>>>>> result using different APIs. Personally, the only thing I
> > > >>>>>>>>> would vote for is the separation of command and metric terms
> > > >>>>>>>>> (they shouldn't be mixed up).
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>> How are the responses of different operations expressed
> > > >>>>>>>>>> through the Command API? If the Command Registry Adapters
> > > >>>>>>>>>> depend upon the command metadata for invoking/validating the
> > > >>>>>>>>>> command, then I think there has to be a way for them to
> > > >>>>>>>>>> interpret the response format also, right?
> > > >>>>>>>>>
> > > >>>>>>>>> I'm not sure I've understood the question correctly.
Are you > > > >>>>>>>>> talking > > > >>>>>>>>> about the command execution result schema and the validation of > > > >>>>>>>>> that > > > >>>>>>>>> schema? > > > >>>>>>>>> > > > >>>>>>>>> For now, I see the interface as follows: the result of the > > > >>>>>>>>> execution > > > >>>>>>>>> is a type that can be converted to the same string that > > > >>>>>>>>> nodetool prints > > > >>>>>>>>> for the corresponding command (so that the outputs match): > > > >>>>>>>>> > > > >>>>>>>>> interface Command<A, R> > > > >>>>>>>>> { > > > >>>>>>>>> void printResult(A argument, R result, Consumer<String> printer); > > > >>>>>>>>> } > > > >>>>>>>>> > > > >>>>>>>>> On Tue, 5 Dec 2023 at 16:51, Abe Ratnofsky <[email protected]> wrote: > > > >>>>>>>>>> > > > >>>>>>>>>> Adding to Hari's comments: > > > >>>>>>>>>> > > > >>>>>>>>>>> Any changes expected at client/driver side? While using > > > >>>>>>>>>>> JMX/nodetool, it is clear which Cassandra node the > > > >>>>>>>>>>> commands/operations are executed against. But a client > > > >>>>>>>>>>> can connect to multiple hosts and trigger queries, so how > > > >>>>>>>>>>> can it ensure that commands are executed against the desired > > > >>>>>>>>>>> Cassandra instance? > > > >>>>>>>>>> > > > >>>>>>>>>> Clients are expected to set the node for the given CQL > > > >>>>>>>>>> statement in cases like this; see docstring for example: > > > >>>>>>>>>> https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/api/core/cql/Statement.java#L124-L147 > > > >>>>>>>>>> > > > >>>>>>>>>>> The term COMMAND is a bit abstract I feel (subjective). Some > > > >>>>>>>>>>> of the examples quoted are referring to updating settings > > > >>>>>>>>>>> (for example: EXECUTE COMMAND setconcurrentcompactors WITH > > > >>>>>>>>>>> concurrent_compactors=5;) and some are referring to > > > >>>>>>>>>>> operations. Updating settings and running operations are > > > >>>>>>>>>>> considerably different things. 
They may have to be handled in > > > >>>>>>>>>>> their own way. And I also feel the settings part is > > > >>>>>>>>>>> overlapping with virtual tables. If virtual tables support > > > >>>>>>>>>>> writes (at least the settings virtual table), then settings > > > >>>>>>>>>>> can be updated using the virtual table itself. > > > >>>>>>>>>> > > > >>>>>>>>>> I agree with this - I actually think it would be clearer if > > > >>>>>>>>>> this was referred to as nodetool, if the set of commands is > > > >>>>>>>>>> going to be largely based on nodetool at the beginning. There > > > >>>>>>>>>> is a lot of documentation online that references nodetool by > > > >>>>>>>>>> name, and changing the nomenclature would make that existing > > > >>>>>>>>>> documentation harder to understand. If a user can understand > > > >>>>>>>>>> this as "nodetool, but better and over CQL not JMX" I think > > > >>>>>>>>>> that's a clearer transition than a new concept of "commands". > > > >>>>>>>>>> > > > >>>>>>>>>> I understand that this proposal includes more than just > > > >>>>>>>>>> nodetool, but there's a benefit to having a tool with a name, > > > >>>>>>>>>> and a web search for "cassandra commands" is going to have > > > >>>>>>>>>> more competition and ambiguity. > > > >>>>>>>>> > > > >>>>>>>>> > > > >
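To make the Command<A, R> outline from the thread a bit more concrete, here is a minimal Java sketch of how a command could produce nodetool-style text lines through a printer callback. It is an illustration under stated assumptions only: printResult(A, R, Consumer<String>) comes from the thread, while the execute method, the SetConcurrentCompactors class, the CommandSketch helper, the starting value of 2, and the message text are all hypothetical and are not taken from the Cassandra code base or the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Shape follows the Command<A, R> outline quoted in the thread.
interface Command<A, R>
{
    // Hypothetical addition: runs the command on the local node and returns a typed result.
    R execute(A argument);

    // From the thread: renders the result as text lines, one call to printer per line.
    void printResult(A argument, R result, Consumer<String> printer);
}

// Illustrative command only; not an actual Cassandra class.
final class SetConcurrentCompactors implements Command<Integer, Integer>
{
    // Assumed starting value, chosen only for the example.
    private int concurrentCompactors = 2;

    @Override
    public Integer execute(Integer argument)
    {
        if (argument <= 0)
            throw new IllegalArgumentException("concurrent_compactors must be positive");
        int previous = concurrentCompactors;
        concurrentCompactors = argument;
        return previous; // the result is the previous setting
    }

    @Override
    public void printResult(Integer argument, Integer result, Consumer<String> printer)
    {
        // Message text is invented for the example, not real nodetool output.
        printer.accept("concurrent_compactors changed from " + result + " to " + argument);
    }
}

final class CommandSketch
{
    // Executes a command and collects the printed lines, e.g. to return them as rows of text.
    static <A, R> List<String> run(Command<A, R> command, A argument)
    {
        List<String> lines = new ArrayList<>();
        R result = command.execute(argument);
        command.printResult(argument, result, lines::add);
        return lines;
    }

    public static void main(String[] args)
    {
        run(new SetConcurrentCompactors(), 5).forEach(System.out::println);
    }
}
```

Because printResult only talks to a Consumer<String>, the same command could in principle feed a CQL result set, a nodetool console, or a test buffer without knowing which one it is writing to; whether the actual implementation is structured this way is not stated in the thread.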
