[ https://issues.apache.org/jira/browse/CASSANDRA-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Rackliffe reassigned CASSANDRA-17424:
-------------------------------------------

    Assignee: Caleb Rackliffe

> Performance and Semantic Concerns w/ Metrics for Local vs. Remote Requests in StorageProxy
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17424
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17424
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Metrics
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>
> In CASSANDRA-10023, we added two new metrics to both {{ClientRequestMetrics}}
> and {{ClientWriteRequestMetrics}} to represent requests where the driver
> either does or does not make a correct token-aware choice of coordinator.
> (Auditing driver behavior is listed as the primary goal of that Jira.)
> There are, however, a few concerns we should address before this releases in
> 4.1:
> 1.) With paging enabled and a LIMIT < fetch size, {{IN}} queries can hit
> {{fetchRows()}} multiple times, so the number of local + remote requests
> isn’t the same as the number of queries marked in {{ClientRequestMetrics}} in
> {{readRegular()}}.
> 2.) {{IN}} queries will potentially mark a bunch of “remote” requests even if
> one key in the {{IN}} set is “local”.
> 3.) Something similar happens with mutations. If {{StorageProxy#mutate()}}
> receives multiple mutations, we’ll mark against one of these new metrics in
> {{ClientWriteRequestMetrics}} for each mutation, while
> {{ClientWriteRequestMetrics}} will only register the actual client request
> once.
> For cases 2 and 3, we may mark both local and remote requests for the same
> overall client request, which introduces ambiguity if these are intended to
> help audit driver coordinator selection behavior. There are a few options:
> a.) We can accept the ambiguity, but then we haven’t really accomplished the
> goal of CASSANDRA-10023 for some request types.
> b.) We can simply not record any of these metrics for requests where multiple
> partitions/tokens are involved.
> c.) We can be lenient, marking requests as “local” if any of the
> partitions/tokens involved in the client request are, in fact, local.
> “c” feels like the option that preserves as much functionality as possible
> without being ambiguous, but problem #2 above is still tricky, given the way
> IN and GROUP BY queries behave w/ paging. (Perhaps ambiguity in that case is
> acceptable?)
> In addition to the general ambiguity around the above…
> 4.) There is excessive object creation involved (on a hot path) in our
> determination of whether a request is local or remote. We should be able to
> mitigate this by getting rid of
> {{AbstractReadExecutor#getContactedReplicas()}} and relying on
> {{ReplicaPlan#lookup()}} rather than creating strings. (Even for writes, we
> should be able to push down marking into {{performWrite()}}, where the write
> ReplicaPlan is already available.)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
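
Option “c”, combined with marking once per client request instead of once per partition or mutation, might look roughly like the sketch below. It is an illustration only: {{LocalRemoteMarking}}, {{markOnce()}} and the two {{LongAdder}} counters are hypothetical stand-ins and not Cassandra classes, and the plain {{Set}} lookups stand in for whatever the replica plan exposes. The endpoint type is left generic so that no strings need to be built just to compare replicas.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; none of these names exist in Cassandra.
class LocalRemoteMarking {
    // Stand-ins for the "local" and "remote" request metrics.
    static final LongAdder localRequests = new LongAdder();
    static final LongAdder remoteRequests = new LongAdder();

    /**
     * Marks exactly one counter per client request (option "c"):
     * "local" if the coordinator replicates at least one of the
     * partitions involved, otherwise "remote".
     *
     * @param coordinator          the node coordinating the request
     * @param replicasPerPartition for each partition in the request,
     *                             the endpoints that replicate it
     * @return true if the request was marked as local
     */
    static <E> boolean markOnce(E coordinator, List<Set<E>> replicasPerPartition) {
        boolean anyLocal = false;
        for (Set<E> replicas : replicasPerPartition) {
            if (replicas.contains(coordinator)) {
                anyLocal = true;
                break; // one local partition is enough under the lenient rule
            }
        }
        (anyLocal ? localRequests : remoteRequests).increment();
        return anyLocal;
    }

    public static void main(String[] args) {
        // An IN query over two partitions; the coordinator "n1" owns only the first.
        markOnce("n1", List.of(Set.of("n1", "n2"), Set.of("n3", "n4")));
        // A single-partition request coordinated by a non-replica.
        markOnce("n5", List.of(Set.of("n1", "n2")));
        System.out.println("local=" + localRequests.sum()
                + " remote=" + remoteRequests.sum());
    }
}
```

Because the counter is bumped once per call, local + remote stays equal to the number of client requests passed in, which addresses concerns 2 and 3 in this simplified model. It does not address the paging behavior described in concern 1.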