Wouldn't modifying the CQL grammar require updating the application under test in order to experiment? The other thing I was wondering about is extensibility: for example, you might like to add a percentage chance of dropping messages for more deterministic overload modeling.
I can see it being a property you set on keyspaces and/or tables, as that would be less intrusive and could be executed with an ALTER statement. Perhaps you could extend the ALTER TABLE/KEYSPACE grammar to allow supplying a JSON blob that contains simulation/testing parameters, and add a setting in Config that enables their use (disabled by default) to prevent production issues.

yaml:
allow_simulation: true

cql:
ALTER TABLE keyspace.table WITH simulation = '{ "drop_percentage": 0.01, "additional_latency_millis": { "DC1": 1, "DC2": 80, "DC3": 200 } }';

On Fri, Nov 19, 2021 at 9:00 AM Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:
>
> If it is per query, then I would think protocol level might be easier to “test” a given application with. Rather than having to append “WITH ADDITIONAL LATENCY” to all your queries, you just set some option in your query-based object or such. We already have support at the protocol level for arbitrary query options being added; if you are worried about some driver needing to add support, it could be done through those. Most of the drivers I have looked at provide a method to put data into that metadata.
>
> I guess if you want to be the most flexible you could do both? I could see such a setting being done on multiple levels, implementing one or more of the following:
>
> 1. As a new entry in the STARTUP message during connection handshake, introduced in a new native protocol version -> add latency to every response over this connection
> 2. A new CQL command that sets the connection-level latency -> all requests after this command on this connection get additional latency XYZ (this probably does still need some driver support, like USE, as it would need the driver to know to run the command on every open connection it had)
> 3. A new CQL command that sets the latency for a given ip/user -> all requests after this command for a connection from the specified ip/user to the current node get additional latency XYZ (this could help get around multiple-connection issues, though unless it got propagated to all nodes in the cluster you would still need some driver support to send the command to every node the client was connected to)
> 4. As part of the request custom payload side channel -> just affects this query
> 5. As part of a new flag introduced in a new native protocol version -> just affects this query
> 6. As part of the CQL statements themselves -> just affects this query
>
> I can see good uses for most of those. A CQL command to enable it globally (2 or 3), and then additional CQL per query (6), is probably supported by the most existing clients without needing any changes. I do think just having a new per-statement CQL option is not a great choice. Though the limitations of how 2/3 could be implemented make me think the “per request custom payload” may actually be the option that is the most useful with the least driver/user code change needed to work with it.
>
> -Jeremiah
>
> > On Nov 19, 2021, at 8:25 AM, bened...@apache.org wrote:
> >
> > To resurrect this discussion briefly, does anyone have a preference for either CQL grammar or protocol support?
> >
> > This originally felt to me like something we might want to support at the native protocol level; however, that creates a dependency on specific clients, and the feature might ultimately be less flexible. It’s not clear why we wouldn’t prefer some kind of CQL change like:
> >
> > SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY
> >
> > With queries being able to supply specific latencies if they so choose:
> >
> > SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY 4ms
> >
> > That might even support some DC->DC map for additional latencies:
> >
> > SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY '{dc1:{dc2: 4ms}}'
> >
> > This would leave applications a great deal of flexibility for experimenting with latency impacts, and greater ease for evolving this feature over time than specifying query eligibility at the protocol level.
> >
> > Does anyone have any thoughts about this?
> >
> > From: bened...@apache.org <bened...@apache.org>
> > Date: Wednesday, 6 October 2021 at 14:48
> > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> > This is a very good point. I forget the reason we settled on consistency levels; I assume it was due to the simplicity of the solution, as deploying support for a new protocol-level change is more involved.
> >
> > That’s probably not a good reason here, and I agree that overloading consistency level feels wrong. I hope we will retire user-provided consistency levels over the coming year or two, which is another good reason not to begin enhancing it with new meanings.
> >
> > I will rework the ticket and patches.
> >
> > From: Paulo Motta <pauloricard...@gmail.com>
> > Date: Wednesday, 6 October 2021 at 14:37
> > To: Cassandra DEV <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> > This sounds like a great feature!
> >
> > I wonder if the consistency level is the best way to expose this to users, though. Can't we implement this via another driver/protocol option, i.e. a "delay_enabled" flag that would be a modifier to an existing CL?
> >
> > If we decide to go the CL route, I wonder if this isn't a good opportunity to introduce pluggable consistency levels (CASSANDRA-8119 <https://issues.apache.org/jira/browse/CASSANDRA-8119>) so these would only become available when the feature is enabled.
> >
> > My concern here is that adding niche consistency levels to the default CL table may create confusion for non-power users.
> >
> > On Wed, 6 Oct 2021 at 10:12, bened...@apache.org <bened...@apache.org> wrote:
> >
> >> Hi Everyone,
> >>
> >> This is a modest user-facing feature that I want to highlight in case anyone has any input. In order to validate whether a real cluster may modify its topology or consistency level (e.g. from local to global), this ticket introduces a facility for injecting latency into internode messages. This is particularly helpful for high-availability topologies, and in particular for LWTs (where performance may be unpredictable due to contention), so that real traffic may be modified to experience gradually increasing latency in order to validate a topology (or the impact of a global consistency level) before any transition is undertaken.
> >>
> >> The user-visible changes include new config parameters, new JMX endpoints for modifying these parameters, and new consistency levels that may be supplied to mark queries as suitable for latency injection (so that applications may nominate queries for this mechanism).
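A minimal Python sketch can make the keyspace/table proposal at the top of this thread more concrete. It shows how a node might interpret the `simulation` blob. None of this is Cassandra code, and the following details are assumptions made for illustration:

- The names `SimulationSettings`, `parse_simulation` and `plan_message` are hypothetical.
- `drop_percentage` is read as a fraction in [0, 1], so 0.01 means 1%.
- `additional_latency_millis` is keyed by the destination DC.
- `allow_simulation` stands in for the proposed yaml switch.

```python
import json
import random
from dataclasses import dataclass, field
from typing import Dict, Optional

# The example blob from the proposed ALTER TABLE ... WITH simulation = '...' statement.
EXAMPLE_BLOB = (
    '{ "drop_percentage": 0.01, '
    '"additional_latency_millis": { "DC1": 1, "DC2": 80, "DC3": 200 } }'
)


@dataclass
class SimulationSettings:
    """Hypothetical parsed form of the proposed `simulation` table property."""
    drop_percentage: float = 0.0  # assumed to be a fraction in [0, 1]: 0.01 == 1%
    additional_latency_millis: Dict[str, int] = field(default_factory=dict)


def parse_simulation(blob: str, allow_simulation: bool) -> Optional[SimulationSettings]:
    """Parse the JSON blob; return None when the (proposed) yaml switch is off."""
    if not allow_simulation:
        return None  # default-disabled, so production clusters ignore the property
    raw = json.loads(blob)
    drop = float(raw.get("drop_percentage", 0.0))
    if not 0.0 <= drop <= 1.0:
        raise ValueError(f"drop_percentage must be in [0, 1], got {drop}")
    latency = {dc: int(ms) for dc, ms in raw.get("additional_latency_millis", {}).items()}
    if any(ms < 0 for ms in latency.values()):
        raise ValueError("additional_latency_millis values must be non-negative")
    return SimulationSettings(drop, latency)


def plan_message(settings: Optional[SimulationSettings], target_dc: str,
                 rng: random.Random) -> Optional[int]:
    """Return extra delay in ms for a message sent to target_dc, or None to drop it."""
    if settings is None:
        return 0  # simulation disabled: deliver with no added delay
    if rng.random() < settings.drop_percentage:
        return None  # simulated overload: the message is dropped
    return settings.additional_latency_millis.get(target_dc, 0)  # unlisted DCs: no delay


if __name__ == "__main__":
    settings = parse_simulation(EXAMPLE_BLOB, allow_simulation=True)
    rng = random.Random(42)  # fixed seed so a run can be replayed exactly
    outcomes = [plan_message(settings, "DC2", rng) for _ in range(10_000)]
    print("dropped:", outcomes.count(None), "delayed 80ms:", outcomes.count(80))
```

The drop/delay decision is kept as a pure function of the settings, the target DC and an injected random source. A seeded run can therefore be replayed exactly, which may be one way to get the more deterministic overload modeling mentioned at the start of the thread.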
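Two ideas from the thread could be combined: the per-query DC->DC map proposed on Nov 19 and the original goal of exposing real traffic to "gradually increasing latency". The sketch below is hypothetical Python, not the CASSANDRA-17024 implementation. It rests on these assumptions:

- `{dc1:{dc2: 4ms}}` is not valid JSON, so the sketch starts from an already-parsed dict.
- The map is keyed by source DC first, then target DC.
- Durations are written as `<n>ms` or `<n>s`.
- "Gradually" means a linear ramp over `ramp_s` seconds.

The names `parse_millis` and `injected_latency_ms` are illustrative only.

```python
import re
from typing import Dict

# Hypothetical parsed form of a per-query map such as
# WITH ADDITIONAL LATENCY '{dc1:{dc2: 4ms}}'  ->  {"dc1": {"dc2": "4ms"}}
# (assumed to be keyed by source DC first, then target DC).
LatencyMap = Dict[str, Dict[str, str]]

_DURATION = re.compile(r"^\s*(\d+)\s*(ms|s)\s*$")


def parse_millis(text: str) -> int:
    """Convert a duration literal such as '4ms' or '2s' into milliseconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 1000 if unit == "s" else value


def injected_latency_ms(latency_map: LatencyMap, source_dc: str, target_dc: str,
                        elapsed_s: float, ramp_s: float) -> float:
    """Latency to add to a source_dc -> target_dc message, ramped linearly.

    The delay grows from 0 to the configured value over ramp_s seconds, so the
    impact on real traffic builds up gradually instead of all at once.
    """
    configured = latency_map.get(source_dc, {}).get(target_dc)
    if configured is None:
        return 0.0  # pair not listed in the map: no injected latency
    full_ms = parse_millis(configured)
    if ramp_s <= 0:
        return float(full_ms)  # no ramp requested: apply the full delay immediately
    fraction = min(max(elapsed_s / ramp_s, 0.0), 1.0)  # clamp progress to [0, 1]
    return full_ms * fraction


if __name__ == "__main__":
    latencies: LatencyMap = {"dc1": {"dc2": "4ms", "dc3": "2s"}}
    for t in (0, 15, 30, 60, 90):
        print(t, injected_latency_ms(latencies, "dc1", "dc3", elapsed_s=t, ramp_s=60))
```

A linear ramp is only one possible schedule. A stepped schedule driven by the proposed JMX endpoints would fit the same lookup.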