[jira] [Comment Edited] (CASSANDRA-18796) Optionally fail when a non-partition-restricted query is issued against a storage-attached index with a backing table using LCS

Stefan Miklosovic (Jira) Thu, 07 Dec 2023 02:35:04 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794117#comment-17794117
 ]


Stefan Miklosovic edited comment on CASSANDRA-18796 at 12/7/23 10:34 AM:
-------------------------------------------------------------------------

Thank you all for the guidance, I reworked the patch to reflect this new 
approach. I opened the patch for the review (1)

Couple points though:

1) It is done only for SAI, I think that it would be cool to do this for all 
indexes, "old" secondary indexes too. Tell me if this is a must in this patch 
or not.
2) I am not completely sure how to know if underlying query is without 
partition keys in QueryController.maybeTriggerGuardrails method. 
QueryController just knows what is the first and the last PrimaryKey of such 
query. The trick I did is that it is a range query (hence without partition 
keys) if the firstPrimaryKey is equal to lastPrimaryKey. In that case it is 
basically whole ring, that's how I understand it. Maybe there is better way but 
I am not sure about that. Keep in mind that this is also happening on replicas 
only. This is not an issue in SelectStatement where we have more rich API to 
work with. We may completely get rid of this check on replicas if we think that 
check in SelectStatement is enough.
3) There is a test case checking warning path for 3 nodes. Failure path has to 
be still finished.
4) it would be cool to know exactly what the violations were, per violated 
replica. The way Warnings and its Counter is done for now is that as it merges 
responses from replicas, it will always merge it in such a way that we only 
know maximum. So if there are three nodes, one read from 5 referenced indexes, 
the second one from 10 and the third one from 15, we will see all replicas here 
but the maximum showed will be 15 - and we do not know what node it is. This 
would require further refactoring, I think counters would need to be more like 
a meter, per node.
5) Lastly, we debugged this quite in depth with [~maedhroz] yesterday but I 
noticed upon testing that warnings and failures do not produce same message to 
client.

Warning looks like this:

{code}
LogResult{mark=346824, result=[WARN  [node1_isolatedExecutor:1] node1 
2023-12-07 11:18:46,539 All nodes [/127.0.0.1:7012, /127.0.0.2:7012, 
/127.0.0.3:7012] referenced more than allowed number of indexes without 
restrictions on partition key, maximum on a node being 3 indexes, and issued 
warnings for query SELECT * FROM distributed_test_keyspace.t_0 WHERE v1 = 
1701944326319 ALLOW FILTERING (see 
maximum_sstables_with_non_paritition_key_restricted_query_warn_threshold)]}
{code}

However, failure looks like this:

{code}
stefan@cqlsh> select * from cycling.cyclist_semi_pro2 where country = 'ITA';
ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] 
message="Operation failed - received 0 responses and 1 failures: 
READ_TOO_MANY_INDEXES from localhost/127.0.0.1:7000" info={'consistency': 
'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 1, 
'error_code_map': {'127.0.0.1': '0x0007'}}
{code}

The reason why this is happening is not unknown yet, I think it is a bug but we 
do not know where yet. For starters, when this is called upon failure (2), it 
will not trigger this (3) which will not update ClientWarnings and it will go 
with that raw message only which is quite a bummer. I would love if [~dcapwell] 
took a look to have a fresh set of eyes on this as we are not sure what is 
happening there.

(1) https://github.com/apache/cassandra/pull/2961
(2) 
https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/service/reads/ReadCallback.java#L181
(3) 
https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/service/reads/ReadCallback.java#L130-L141


was (Author: smiklosovic):
Thank you all for the guidance, I reworked the patch to reflect this new 
approach. I opened the patch for the review (1)

Couple points though:

1) It is done only for SAI, I think that it would be cool to do this for all 
indexes, "old" secondary indexes too. Tell me if this is a must in this patch 
or not.
2) I am not completely sure how to know if underlying query is without 
partition keys in QueryController.maybeTriggerGuardrails method. 
QueryController just knows what is the first and the last PrimaryKey of such 
query. The trick I did is that it is a range query (hence without partition 
keys) if the firstPrimaryKey is equal to lastPrimaryKey. In that case it is 
basically whole ring, that's how I understand it. Maybe there is better way but 
I am not sure about that. Keep in mind that this is also happening on replicas 
only. This is not an issue in SelectStatement where we have more rich API to 
work with. We may completely get rid of this check on replicas if we think that 
check in SelectStatement is enough.
3) There is a test case checking warning path for 3 nodes. Failure path has to 
be still finished.
4) it would be cool to know exactly what the violations were, per violated 
replica. The way Warnings and its Counter is done for now is that as it merges 
responses from replicas, it will always merge it in such a way that we only 
know maximum. So if there are three nodes, one read from 5 referenced indexes, 
the second one from 10 and the third one from 15, we will see all replicas here 
but the maximum showed will be 15 - and we do not know what node it is. This 
would require further refactoring, I think counters would need to be more like 
a meter, per node.
5) Lastly, we debugged this quite in depth with [~maedhroz] yesterday but I 
noticed upon testing that warnings and failures do not produce same message to 
client.

Warning looks like this:

{code}
LogResult{mark=346824, result=[WARN  [node1_isolatedExecutor:1] node1 
2023-12-07 11:18:46,539 All nodes [/127.0.0.1:7012, /127.0.0.2:7012, 
/127.0.0.3:7012] referenced more than allowed number of indexes without 
restrictions on partition key, maximum on a node being 3 indexes, and issued 
warnings for query SELECT * FROM distributed_test_keyspace.t_0 WHERE v1 = 
1701944326319 ALLOW FILTERING (see 
maximum_sstables_with_non_paritition_key_restricted_query_warn_threshold)]}
{code}

However, failure looks like this:

{code}
stefan@cqlsh> select * from cycling.cyclist_semi_pro2 where country = 'ITA';
ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] 
message="Operation failed - received 0 responses and 1 failures: 
READ_TOO_MANY_INDEXES from localhost/127.0.0.1:7000" info={'consistency': 
'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 1, 
'error_code_map': {'127.0.0.1': '0x0007'}}
{code}

The reason why this is happening is not unknown yet, I think it is a bug but we 
do not know where yet. For starters, when this is called upon failure (1), it 
will not trigger this (2) which will not update ClientWarnings and it will go 
with that raw message only which is quite a bummer. I would love if [~dcapwell] 
took a look to have a fresh set of eyes on this as we are not sure what is 
happening there.

https://github.com/apache/cassandra/pull/2961
(1) 
https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/service/reads/ReadCallback.java#L181
(2) 
https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/service/reads/ReadCallback.java#L130-L141

> Optionally fail when a non-partition-restricted query is issued against a 
> storage-attached index with a backing table using LCS
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18796
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18796
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/2i Index, Feature/SAI, Local/Compaction/LCS
>            Reporter: Caleb Rackliffe
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.0-rc, 5.x
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> With LCS, we will have potentially thousands of SSTables for a given user 
> table. Storage-attached also means SSTable-attached, and searching thousands 
> of attached indexes is not going to scale well at all locally, due to the 
> sheer number of searches and amount of postings list merging involved. We 
> should have a guardrail to prohibit this by default.
> Partition-restricted queries, the use-case SAI is broadly designed for, 
> should be very efficient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-18796) Optionally fail when a non-partition-restricted query is issued against a storage-attached index with a backing table using LCS

Reply via email to