Re: Changing the output of tooling between majors

Miklosovic, Stefan Sat, 08 Jul 2023 09:58:00 -0700

If anybody understood my message as promoting the removal of all the commands 
whose output we can get by other means, that is not the case at all. I do not 
want to remove any of them. I am just elaborating on "parsing the output of 
nodetool and problems related to that if it is changed" in this particular 
case.

________________________________________
From: Miklosovic, Stefan <[email protected]>
Sent: Saturday, July 8, 2023 17:43
To: dev
Subject: Re: Changing the output of tooling between majors

Thank you, Josh, for your insight.

I think they should not parse that output in the first place. Gradually 
introducing JSON / YAML output formats for nodetool is cool, but I think it 
started to happen too late: people were already parsing the raw nodetool 
output, and here we are.

I played with nodetool a little bit to see where we are with this. There are 
135 commands in total. We can leave out all "set*" commands, because people 
just don't parse their output; we cannot ignore "get*" because that is 
potential output to parse. That leaves 116 commands. We can also ignore all 
"disable*" and "enable*" commands, and we are at 98. Then there is the group 
of "invalidate*" commands; we can skip them too, and we are at 90. Minus the 
help command, 89.
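
As a rough illustration of the filtering described above, here is a minimal 
Python sketch. The function name and the short sample list are hypothetical; 
the real input would be the full list of 135 nodetool commands.

```python
# Prefixes (and one exact name) that the message treats as
# "nobody parses the output of these".
SKIPPED_PREFIXES = ("set", "disable", "enable", "invalidate")
SKIPPED_NAMES = {"help"}


def parse_candidates(commands):
    """Return the commands that survive the prefix filtering."""
    return [
        name
        for name in commands
        if name not in SKIPPED_NAMES and not name.startswith(SKIPPED_PREFIXES)
    ]


# Hypothetical sample, not the full command list.
sample = [
    "setstreamthroughput", "getstreamthroughput", "enablegossip",
    "disablebinary", "invalidatekeycache", "help", "status", "tpstats",
]
print(parse_candidates(sample))
```

Note that "get*" commands deliberately pass through the filter, since their 
output is exactly what somebody might parse.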

Now the commands which are left can be categorized into two main groups: the 
commands which execute some action and the commands which display some 
statistics or state about the internals of a Cassandra node.

The first group, "action commands", are again not going to have their output 
parsed. They are listed in (1) (I could have made some mistakes here and 
there).

So, the commands whose output could potentially be parsed are in (2); there 
are roughly 51 of them.

Some of these commands have an equivalent in system_views vtables. These 
are, if I haven't forgotten something:

clientstats (system_views.clients)
compactionhistory (system.compaction_history)
compactionstats (system_views.sstable_tasks)
gossipinfo (system_views.gossip_info)
listsnapshots (system_views.snapshots)
tpstats (system_views.thread_pools)
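
To make the mapping concrete, here is a minimal Python sketch that turns a 
nodetool command name into the CQL query against its equivalent table. The 
dictionary and function names are hypothetical, and the sketch uses the 
system_views keyspace name for the two virtual tables at the end of the 
list. It only builds the query string; executing it would need a driver and 
a running node.

```python
# nodetool command -> table exposing the same information, per the list above.
NODETOOL_TO_TABLE = {
    "clientstats": "system_views.clients",
    "compactionhistory": "system.compaction_history",
    "compactionstats": "system_views.sstable_tasks",
    "gossipinfo": "system_views.gossip_info",
    "listsnapshots": "system_views.snapshots",
    "tpstats": "system_views.thread_pools",
}


def cql_for(command):
    """Build a SELECT statement for the table equivalent to `command`."""
    table = NODETOOL_TO_TABLE.get(command)
    if table is None:
        raise KeyError(f"no known CQL equivalent for nodetool {command}")
    return f"SELECT * FROM {table};"


print(cql_for("tpstats"))
```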

Some of them already support a different output format (JSON or YAML). They 
are:

datapaths
tablestats
tpstats (has also cql table)
compactionhistory (has also cql table)
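
For these commands a script can ask for structured output instead of 
scraping the default text. Below is a minimal Python sketch. It assumes the 
format is selected with "-F json" (worth confirming with "nodetool help 
<command>" on your version), that nodetool is on the PATH, and that a node 
is running; the function names are hypothetical.

```python
import json
import subprocess

# Commands listed above as supporting JSON / YAML output.
JSON_CAPABLE = {"datapaths", "tablestats", "tpstats", "compactionhistory"}


def nodetool_json_argv(command):
    """Build the argument vector for a JSON-formatted nodetool call."""
    if command not in JSON_CAPABLE:
        raise ValueError(f"nodetool {command} has no JSON output in this list")
    # Assumption: "-F json" selects the JSON output format.
    return ["nodetool", command, "-F", "json"]


def decode_output(text):
    """Decode nodetool's JSON output into Python objects."""
    return json.loads(text)


def run_nodetool_json(command):
    """Run nodetool and return the decoded JSON (needs a running node)."""
    result = subprocess.run(
        nodetool_json_argv(command),
        capture_output=True, text=True, check=True,
    )
    return decode_output(result.stdout)
```

A caller would then use something like run_nodetool_json("tpstats") and 
read keys from the resulting dictionary, which keeps working even if the 
default human-readable layout changes.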

I would argue that some commands with the prefix "status" and "get" can go 
away too because their value is visible in system_views.settings. Some of 
these settings will even be updateable after Maxim's work.

statusbackup               incremental_backups
statushandoff              hinted_handoff_enabled
getmaxhintwindow           max_hint_window
getconcurrentcompactors    concurrent_compactors
getconcurrentviewbuilders  concurrent_materialized_view_builders
getdefaultrf               default_keyspace_rf
gettimeout (this just reflects cassandra.yaml more or less)
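
As a sketch of what the replacement would look like, the following Python 
snippet maps each of these commands to the setting name listed next to it 
and builds the corresponding query against system_views.settings. The 
dictionary and function names are hypothetical, and the query is only 
built, not executed.

```python
# nodetool command -> name of the row in system_views.settings, as listed above.
COMMAND_TO_SETTING = {
    "statusbackup": "incremental_backups",
    "statushandoff": "hinted_handoff_enabled",
    "getmaxhintwindow": "max_hint_window",
    "getconcurrentcompactors": "concurrent_compactors",
    "getconcurrentviewbuilders": "concurrent_materialized_view_builders",
    "getdefaultrf": "default_keyspace_rf",
}


def settings_query(command):
    """Build the CQL query that returns the value shown by `command`."""
    setting = COMMAND_TO_SETTING[command]
    return (
        "SELECT name, value FROM system_views.settings "
        f"WHERE name = '{setting}';"
    )


print(settings_query("getdefaultrf"))
```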

Then there is the family of all "get throttle / threshold" commands and the 
like. I am too lazy to go through them one by one, but they are somehow 
retrievable from CQL system_views.settings too.

getbatchlogreplaythrottle
getcolumnindexsize
getcompactionthreshold
getcompactionthroughput
getinterdcstreamthroughput
getsnapshotthrottle
getstreamthroughput

There are commands which just return an integer, or for which changing the 
output is simply not necessary, like:

gettraceprobability
getsstables

So the commands which have neither an equivalent of their output in some CQL 
table nor a JSON / YAML format available are:

describecluster
describering
failuredetector
gcstats
getauditlog
getauthcacheconfig
getconcurrency
getendpoints
getfullquerylog
getlogginglevels
getseeds
info
listpendinghints
netstats
profileload (replacement of toppartitions (which should be removed in 5.0, 
actually))
proxyhistograms
rangekeysample
repair
repair_admin
ring
status
statusautocompaction
statusbinary
statusgossip
tablehistograms
toppartitions
viewbuildstatus

From these, if one asks which ones it actually makes sense to try to tweak 
the output of, they might be:

describecluster
describering
info
listpendinghints
netstats
proxyhistograms
repair_admin (if somebody wants to list stuff in json)
ring
status
tablehistograms
viewbuildstatus

The point I want to make is that I do not think the problem of changing the 
output is that hot. There are basically at most 15 commands for which the 
output matters because they have neither a CQL equivalent nor JSON / YAML 
output.

If we have been providing CQL / JSON / YAML for a couple of years, I do not 
believe that the argument "let's not break it for folks in nodetool" is still 
relevant. CQL output has been there since 4.0 at least (at least!), and YAML / 
JSON is also not something completely new. It is not like we are suddenly 
forcing people to change their habits; there was enough time to update the 
stuff to CQL / JSON / YAML etc.

But really, the question I still don't have an answer for is who is actually 
parsing the output. I think I will ping the user mailing list to probe the 
situation a little bit.

(1) https://gist.github.com/smiklosovic/3f4ea8ccae53ad503af13c53789815be
(2) https://gist.github.com/smiklosovic/f9a681016c22e2dfe88c883b6881cb7c

________________________________________
From: Josh McKenzie <[email protected]>
Sent: Saturday, July 8, 2023 14:47
To: dev
Subject: Re: Changing the output of tooling between majors

> Once there is, we are free to change the default output however we want.

Here is one thing I always try to keep in mind in discussions like this, as a 
thought experiment (with very hand-wavy numbers; try not to get hung up on 
them):

* Let's say there are 5,000 discrete "users" of C* out there (different groups 
of people using the DB)
* And assume 5% have written some kind of scripting / automation to parse our 
tooling output (250)
* And let's assume it'd take 18 developer hours (a few days at 6 hours/day) to 
retool to the new output, validate and test correctness, and then roll it out 
to qa, test, validate, and then to prod, test, validate

You're looking at 250 * 18 hours = 4,500 hours, or 112.5 40-hour work weeks 
(2+ years for some poor sod without vacations) worth of work from what seems 
to be a simple change.
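
The arithmetic above can be reproduced with a short Python sketch. The 
function and parameter names are illustrative, the inputs are the hand-wavy 
numbers from the thought experiment, and the 52-week year is an added 
assumption used only to turn work weeks into years.

```python
def downstream_cost(users=5_000, parsing_percent=5, hours_per_team=18,
                    hours_per_week=40, weeks_per_year=52):
    """Estimate the community-wide cost of changing tooling output."""
    teams = users * parsing_percent // 100    # groups that parse the output
    total_hours = teams * hours_per_team      # developer hours to retool
    work_weeks = total_hours / hours_per_week
    work_years = work_weeks / weeks_per_year  # no vacations assumed
    return teams, total_hours, work_weeks, work_years


teams, total_hours, work_weeks, work_years = downstream_cost()
print(teams, total_hours, work_weeks, round(work_years, 2))
# prints: 250 4500 112.5 2.16
```

Changing any single input by a factor of ten moves the total by the same 
factor, which is the sensitivity the next paragraph alludes to.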

Now, that estimate could be off by an order of magnitude either way, but the 
motion of the exercise is valuable, I think. There's a real magnified 
downstream cost to our community when we make changes to APIs and we need to 
weigh that against the cost to the project in terms of maintaining those 
interfaces.

The above mental exercise really strongly applies to the periodic discussions 
where we talk about deprecating JMX support.

Not saying we should or shouldn't change things here for the record, just want 
to call this out for anyone that might not have been thinking about things this 
way.

On Fri, Jul 7, 2023, at 3:23 PM, Brandon Williams wrote:
On Fri, Jul 7, 2023 at 2:20 PM Miklosovic, Stefan
<[email protected]> wrote:
>
> Great thanks. That might work.
>
> So we do not change the default output unless there is json / yaml equivalent.
>
> Once there is, we are free to change the default output however we want.

Yes, exactly.  Then we have the best of both worlds: programmatic
access that isn't flimsy, and a pretty display however we want it.
