Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection

Jon Meredith Fri, 19 Nov 2021 09:19:10 -0800

Wouldn't modifying the CQL grammar require updating the
application under test to perform experimentation?
The other thing I was wondering about is extensibility. For example,
you might like to add a percentage chance of dropping messages for
more deterministic overload modeling.


I can see it being a property you set on keyspaces and/or tables, as
that would be less intrusive and could be applied with an ALTER
statement. Perhaps you could extend the ALTER TABLE/KEYSPACE grammar
to allow supplying a JSON blob that contains simulation/testing
parameters, and have a setting in Config that enables their use
(disabled by default) to prevent production issues.

yaml:
allow_simulation: true

cql:
ALTER TABLE keyspace.table WITH simulation = '{
  "drop_percentage": 0.01,
  "additional_latency_millis": { "DC1": 1, "DC2": 80, "DC3": 200 }
}';
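
Here is a minimal sketch, in Python for brevity, of how a node might interpret such a blob. It rests on three assumptions. First, `drop_percentage` is treated as a fraction between 0 and 1, so 0.01 means 1%. Second, `additional_latency_millis` is keyed by destination data center. Third, `SimulationParams`, `parse_simulation` and `MessageSimulator` are hypothetical names and not Cassandra code. Seeding the random generator keeps drop decisions reproducible, which is one way to make the overload modeling more deterministic.

```python
import json
import random
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SimulationParams:
    # Assumption: 0.01 means a 1% chance of dropping each message.
    drop_fraction: float = 0.0
    # Extra latency in milliseconds, keyed by (assumed) destination data center.
    latency_ms_by_dc: Dict[str, int] = field(default_factory=dict)


def parse_simulation(blob: str, allow_simulation: bool = False) -> Optional[SimulationParams]:
    """Parse a table's hypothetical `simulation` option.

    Returns None unless the node-level `allow_simulation` flag is set, mirroring
    the proposed "disabled by default" yaml setting.
    """
    if not allow_simulation:
        return None
    raw = json.loads(blob)
    drop = float(raw.get("drop_percentage", 0.0))
    if not 0.0 <= drop <= 1.0:
        raise ValueError(f"drop_percentage out of range: {drop}")
    latencies = {dc: int(ms) for dc, ms in raw.get("additional_latency_millis", {}).items()}
    return SimulationParams(drop, latencies)


class MessageSimulator:
    """Decides, per outgoing message, whether to drop it and how long to delay it."""

    def __init__(self, params: Optional[SimulationParams], seed: int = 0):
        self.params = params
        # A fixed seed makes the sequence of drop decisions reproducible between runs.
        self.rng = random.Random(seed)

    def should_drop(self) -> bool:
        if self.params is None:
            return False
        return self.rng.random() < self.params.drop_fraction

    def extra_latency_ms(self, destination_dc: str) -> int:
        if self.params is None:
            return 0
        # Data centers missing from the map get no extra latency.
        return self.params.latency_ms_by_dc.get(destination_dc, 0)
```

With the blob above and `allow_simulation=True`, `MessageSimulator(params, seed=42).extra_latency_ms("DC2")` returns 80. If the flag is left at its default, `parse_simulation` returns None and the simulator injects nothing.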

On Fri, Nov 19, 2021 at 9:00 AM Jeremiah D Jordan
<jeremiah.jor...@gmail.com> wrote:
>
> If it is per query, then I would think the protocol level might be an easier
> way to “test” a given application. Rather than having to append “WITH
> ADDITIONAL LATENCY” to all your queries, you just set some option on your
> query object or similar. We already have support at the protocol level for
> arbitrary query options being added; if you are worried about some driver
> needing to add support, it could be done through those. Most of the drivers I
> have looked at provide a method to put data into that metadata.
>
> I guess if you want to be the most flexible you could do both? I could see
> such a setting being applied at multiple levels, implementing one or more of
> the following:
>
> 1. As a new entry in the STARTUP message during the connection handshake,
> introduced in a new native protocol version -> add latency to every response
> over this connection
> 2. A new CQL command that sets the connection-level latency -> all requests
> after this command on this connection get additional latency XYZ (this
> probably still needs some driver support, like USE, as the driver would need
> to know to run the command on every open connection it has)
> 3. A new CQL command that sets the latency for a given IP/user -> all
> requests after this command for a connection from the specified IP/user to
> the current node get additional latency XYZ (this could help get around the
> multiple-connection issue, though unless it got propagated to all nodes in
> the cluster you would still need some driver support to send the command to
> every node the client was connected to)
> 4. As part of the request custom payload side channel -> just affects this 
> query
> 5. As part of a new flag introduced in a new native protocol version -> just 
> affects this query
> 6. As part of the CQL statements themselves -> just affects this query
>
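
If several of these levels were implemented together, they would need a precedence rule. One hypothetical choice, sketched here in Python, is that the narrowest scope wins:

- a per-query setting (4, 5 or 6) overrides a connection command (2),
- which overrides a STARTUP entry (1),
- which overrides a per-IP/user setting (3).

The `Scope` names and this ordering are assumptions for illustration only.

```python
from enum import IntEnum
from typing import Dict, Optional


class Scope(IntEnum):
    """Hypothetical scopes, ordered from broadest (lowest) to narrowest (highest)."""
    CLIENT_IP_OR_USER = 1   # option 3: set for an ip/user on this node
    STARTUP = 2             # option 1: entry in the STARTUP message
    CONNECTION_COMMAND = 3  # option 2: CQL command issued on this connection
    QUERY = 4               # options 4-6: custom payload, protocol flag, or CQL


def effective_latency_ms(settings: Dict[Scope, Optional[int]]) -> int:
    """Return the latency from the narrowest scope that set one, else 0."""
    for scope in sorted(settings, reverse=True):
        value = settings[scope]
        if value is not None:
            return value
    return 0
```

A connection opened with 10ms in STARTUP but issuing a query that asks for 4ms would then see 4ms for that query only.
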
> I can see good uses for most of those. A CQL command to enable it globally
> (2 or 3), plus additional per-query CQL (6), is probably supported by the
> most existing clients without needing any changes. I do think having only a
> new per-statement CQL option is not a great choice. That said, the
> limitations on how 2/3 could be implemented make me think the “per request
> custom payload” (4) may actually be the most useful option, requiring the
> least driver/user code change.
>
> -Jeremiah
>
> > On Nov 19, 2021, at 8:25 AM, bened...@apache.org wrote:
> >
> > To resurrect this discussion briefly, does anyone have a preference for 
> > either CQL Grammar or Protocol support?
> >
> > This originally felt to me like something we might want to support at the
> > native protocol level; however, that creates a dependency on specific
> > clients, and the feature might ultimately be less flexible. It’s not clear
> > why we wouldn’t prefer some kind of CQL change like:
> >
> > SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY
> >
> > With queries being able to supply specific latencies if they so choose:
> >
> > SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY 4ms
> >
> > That might even support some DC->DC map for additional latencies:
> >
> > SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY '{dc1:{dc2: 4ms}}'
> >
> > This would give applications a great deal of flexibility for experimenting
> > with latency impacts, and would make this feature easier to evolve over time
> > than specifying query eligibility at the protocol level.
> >
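
One way to read the nested map in the last example is source DC -> destination DC -> extra latency. Below is a minimal Python sketch of that lookup. It makes the following assumptions:

- The map has already been parsed into nested dictionaries, since the `{dc1:{dc2: 4ms}}` literal is not valid JSON as written.
- The map is not symmetric.
- Unlisted pairs get no extra latency.
- Durations use a `us`/`ms`/`s` suffix.

The function names are hypothetical.

```python
import re
from typing import Dict

# Assumed duration syntax: an integer followed by us, ms or s.
_DURATION = re.compile(r"^\s*(\d+)\s*(us|ms|s)\s*$")
_TO_MICROS = {"us": 1, "ms": 1_000, "s": 1_000_000}


def parse_duration_micros(text: str) -> int:
    """Parse a duration such as '4ms' into microseconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _TO_MICROS[unit]


def extra_latency_micros(latency_map: Dict[str, Dict[str, str]],
                         source_dc: str, destination_dc: str) -> int:
    """Look up the extra latency for a message sent from source_dc to destination_dc.

    Pairs that are not listed get no extra latency; dc1 -> dc2 says nothing
    about dc2 -> dc1.
    """
    duration = latency_map.get(source_dc, {}).get(destination_dc)
    return 0 if duration is None else parse_duration_micros(duration)
```

Under these assumptions, `extra_latency_micros({"dc1": {"dc2": "4ms"}}, "dc1", "dc2")` yields 4000, while the reverse direction yields 0.
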
> > Does anyone have any thoughts about this?
> >
> > From: bened...@apache.org
> > Date: Wednesday, 6 October 2021 at 14:48
> > To: dev@cassandra.apache.org
> > Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> > This is a very good point. I forget why we settled on consistency
> > levels; I assume it was the simplicity of the solution, as deploying
> > support for a new protocol-level change is more involved.
> >
> > That’s probably not a good reason here, and I agree that overloading
> > consistency levels feels wrong. I hope we will retire user-provided
> > consistency levels over the coming year or two, which is another good
> > reason not to start giving them new meanings.
> >
> > I will rework the ticket and patches.
> >
> > From: Paulo Motta <pauloricard...@gmail.com>
> > Date: Wednesday, 6 October 2021 at 14:37
> > To: Cassandra DEV <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> > This sounds like a great feature!
> >
> > I wonder if ConsistencyLevel is the best way to expose this to users,
> > though. Can't we implement this via another driver/protocol option? I.e. a
> > "delay_enabled" flag that would act as a modifier to an existing CL.
> >
> > If we decide to go the CL route, I wonder if this isn't a good opportunity
> > to introduce pluggable consistency levels (CASSANDRA-8119
> > <https://issues.apache.org/jira/browse/CASSANDRA-8119>), so these would
> > only become available when the feature is enabled.
> >
> > My concern here is that adding niche consistency levels to the default CL
> > table may confuse non-power users.
> >
> > On Wed, 6 Oct 2021 at 10:12, bened...@apache.org <bened...@apache.org> wrote:
> >
> >> Hi Everyone,
> >>
> >> This is a modest user-facing feature that I want to highlight in case
> >> anyone has any input. To validate whether a real cluster can change its
> >> topology or consistency level (e.g. from local to global), this ticket
> >> introduces a facility for injecting latency into internode messages. This
> >> is particularly helpful for high-availability topologies, and especially
> >> for LWTs (where performance may be unpredictable due to contention), as
> >> real traffic can be made to experience gradually increasing latency in
> >> order to validate a topology (or the impact of a global consistency level)
> >> before any transition is undertaken.
> >>
> >> The user-visible changes include new config parameters, new JMX endpoints
> >> for modifying these parameters, and new consistency levels that may be
> >> supplied to mark queries as suitable for latency injection (so that
> >> applications may nominate queries for this mechanism).
>
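
The "gradually increasing latency" idea could be driven by a simple step schedule. The Python sketch below is hypothetical. `LatencyRamp` and its parameters are illustrative names, not the config parameters or JMX endpoints the ticket adds. Only queries the application has nominated are delayed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyRamp:
    """Hypothetical schedule: raise injected latency in fixed steps up to a ceiling."""
    start_ms: int
    step_ms: int
    step_interval_s: float
    max_ms: int

    def latency_at(self, elapsed_s: float) -> int:
        if elapsed_s < 0:
            raise ValueError("elapsed time must be non-negative")
        # Number of whole intervals elapsed since the experiment began.
        steps = int(elapsed_s // self.step_interval_s)
        return min(self.max_ms, self.start_ms + steps * self.step_ms)


def injected_latency_ms(ramp: LatencyRamp, elapsed_s: float, query_nominated: bool) -> int:
    """Only queries the application nominated for injection are delayed."""
    return ramp.latency_at(elapsed_s) if query_nominated else 0
```

For example, `LatencyRamp(start_ms=0, step_ms=10, step_interval_s=60, max_ms=100)` adds 10ms per minute and stops at 100ms after ten minutes.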
