Wouldn't modifying the CQL grammar require updating the
application under test to perform experimentation?
The other thing I was wondering about is extensibility - for example,
you might later want to add a percentage chance of dropping messages
for more deterministic overload modeling.
I can see it being a property you set on keyspaces and/or tables, as
that wouldn't be as intrusive and could be applied with an ALTER
statement. Perhaps you could extend the ALTER TABLE/KEYSPACE grammar
to allow supplying a JSON blob that contains simulation / testing
parameters and have a setting in Config that enables the use of them
(default disabled) to prevent production issues.
yaml:
allow_simulation: true
cql:
ALTER TABLE keyspace.table WITH simulation = '{ "drop_percentage":
0.01, "additional_latency_millis": { "DC1": 1, "DC2": 80, "DC3": 200 }
}';
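
To make the idea a bit more concrete, here is a rough sketch (in Python,
purely for illustration) of how a node might interpret such a blob. All
names are hypothetical: parse_simulation, plan_message and the
ALLOW_SIMULATION constant standing in for the yaml setting are mine, not
anything that exists in Cassandra. I am also assuming drop_percentage is
a fraction between 0 and 1, which the example above leaves open.

```python
import json
import random

# Hypothetical stand-in for the proposed yaml setting `allow_simulation`.
# The proposal is for this to default to disabled; it is True here only
# so that the sketch does something when run.
ALLOW_SIMULATION = True


def parse_simulation(blob, allow_simulation=ALLOW_SIMULATION):
    """Return validated simulation parameters, or None when disabled."""
    if not allow_simulation:
        # Config guard: ignore the table property entirely, so a stray
        # ALTER cannot affect a production cluster.
        return None
    params = json.loads(blob)
    # Assumption: drop_percentage is a fraction in [0, 1].
    drop = float(params.get("drop_percentage", 0.0))
    if not 0.0 <= drop <= 1.0:
        raise ValueError("drop_percentage must be between 0 and 1")
    latency = {
        dc: int(ms)
        for dc, ms in params.get("additional_latency_millis", {}).items()
    }
    if any(ms < 0 for ms in latency.values()):
        raise ValueError("additional latency must be non-negative")
    return {"drop_percentage": drop, "additional_latency_millis": latency}


def plan_message(params, target_dc, rng=random):
    """Decide the fate of one message: (dropped, extra_latency_millis)."""
    if params is None:
        return (False, 0)
    if rng.random() < params["drop_percentage"]:
        return (True, 0)
    # DCs missing from the map get no additional latency.
    return (False, params["additional_latency_millis"].get(target_dc, 0))
```

With the blob from the ALTER TABLE example, plan_message(params, "DC2")
would usually come back as not dropped with 80 ms of extra latency, and
occasionally as dropped. New knobs could be added as extra JSON keys
without touching the grammar again, which is the extensibility I am
after.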
On Fri, Nov 19, 2021 at 9:00 AM Jeremiah D Jordan
<[email protected]> wrote:
>
> If it is per query, then I would think protocol level might be easier to
> “test” a given application with. Rather than having to append “WITH
> ADDITIONAL LATENCY” to all your queries, you just set some option in your
> query based object or such. We already have support at the protocol level
> for arbitrary query options being added, if you are worried about some driver
> needing to add support it could be done through those. Most of the drivers I
> have looked at provide a method to put data into that metadata.
>
> I guess if you want to be the most flexible you could do both? I could see
> such a setting being done on multiple levels, implementing one or more of the
> following:
>
> 1. As a new entry in the STARTUP message during connection handshake
> introduced in a new native protocol version -> add latency to every response
> over this connection
> 2. A new CQL command that sets the connection level latency -> all requests
> after this command on this connection get additional latency XYZ (this
> probably does still need some driver support, like USE as it would need the
> driver to know to run the command on every open connection it had)
> 3. A new CQL command that sets the latency to a given ip/user -> all requests
> after this command for a connection from the specified ip/user to the current
> node get additional latency XYZ (this could help getting around multiple
> connection issues, though unless it got propagated to all nodes in the
> cluster you would still need some driver support to send the command to every
> node the client was connected to)
> 4. As part of the request custom payload side channel -> just affects this
> query
> 5. As part of a new flag introduced in a new native protocol version -> just
> affects this query
> 6. As part of the CQL statements themselves -> just affects this query
>
> I can see good uses for most of those. A CQL command to enable it globally
> (2 or 3), and then additional CQL for per query (6) is probably supported by
> the most existing clients without needing any changes. I do think just
> having a new per statement CQL option is not a great choice. Though the
> limitations of how 2/3 could be implemented make me think the “per request
> custom payload” may actually be the option that is the most useful with the
> least driver/user code change needed to work with it.
>
> -Jeremiah
>
> > On Nov 19, 2021, at 8:25 AM, [email protected] wrote:
> >
> > To resurrect this discussion briefly, does anyone have a preference for
> > either CQL Grammar or Protocol support?
> >
> > This originally felt to me like something we might want to support at the
> > native protocol level, however that creates a dependency on specific
> > clients and the feature might ultimately be less flexible. It’s not clear
> > why we wouldn’t prefer some kind of CQL change like:
> >
> > SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY
> >
> > With queries being able to supply specific latencies if they so choose:
> >
> > SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY 4ms
> >
> > That might even support some DC->DC map for additional latencies:
> >
> > SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY ‘{dc1:{dc2: 4ms}}’
> >
> > This would leave applications a great deal of flexibility for experimenting
> > with latency impacts, and greater ease for evolving this feature over time
> > than specifying query eligibility at the protocol level.
> >
> > Does anyone have any thoughts about this?
> >
> > From: [email protected]
> > Date: Wednesday, 6 October 2021 at 14:48
> > To: [email protected]
> > Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> > This is a very good point. I forget the reason we settled on consistency
> > levels, I assume it was due to simplicity of the solution, as deploying
> > support for a new protocol-level change is more involved.
> >
> > That’s probably not a good reason here, and I agree that overloading
> > consistency level feels wrong. I hope we will retire user-provided
> > consistency levels over the coming year or two, which is another good
> > reason not to begin enhancing it with new meanings.
> >
> > I will rework the ticket and patches.
> >
> > From: Paulo Motta <[email protected]>
> > Date: Wednesday, 6 October 2021 at 14:37
> > To: Cassandra DEV <[email protected]>
> > Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> > This sounds like a great feature!
> >
> > I wonder if ConsistencyLevel is the best way to expose this to users,
> > though. Can't we implement this via another driver/protocol option? I.e.
> > a "delay_enabled" flag that would be a modifier to an existing CL.
> >
> > If we decide to go the CL route, I wonder if this isn't a good opportunity
> > to introduce pluggable consistency levels (CASSANDRA-8119
> > <https://issues.apache.org/jira/browse/CASSANDRA-8119>) so these would
> > only become available when the feature is enabled.
> >
> > My concern here is adding niche consistency levels to the default CL table,
> > which may create confusion for non-power users.
> >
> > On Wed, Oct 6, 2021 at 10:12, [email protected]
> > <[email protected]> wrote:
> >
> >> Hi Everyone,
> >>
> >> This is a modest user-facing feature that I want to highlight in case
> >> anyone has any input. In order to validate if a real cluster may modify its
> >> topology or consistency level (e.g. from local to global), this ticket
> >> introduces a facility for injecting latency to internode messages. This is
> >> particularly helpful for high-availability topologies, and in particular
> >> for LWTs (where performance may be unpredictable due to contention), so
> >> that real traffic may be modified to experience gradually increasing
> >> latency in order to validate a topology (or the impact of a global
> >> consistency level) before any transition is undertaken.
> >>
> >> The user-visible changes include new config parameters, new JMX end points
> >> for modifying these parameters, and new consistency levels that may be
> >> supplied to mark queries as suitable for latency injection (so that
> >> applications may nominate queries for this mechanism)
>