[jira] [Comment Edited] (CASSANDRA-15803) Separate out allow filtering scanning through a partition versus scanning over the table

Stefan Miklosovic (Jira) Thu, 27 Apr 2023 05:16:05 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717153#comment-17717153 ]


Stefan Miklosovic edited comment on CASSANDRA-15803 at 4/27/23 12:15 PM:
-------------------------------------------------------------------------

I see that this example is a corner case, and a rather interesting one. I do 
not mind bending the semantics here if it makes the UX more intuitive. If we 
specify all keys plus some regular column in the restrictions, the query can 
return at most one row in every case anyway.
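
To make the "at most one row" point concrete, here is a minimal Python sketch. It is a toy model, not Cassandra code: the table contents and the `rows_matching` helper are hypothetical. Once every primary key column is restricted by equality, at most one row is addressed, so an extra predicate on a regular column can only keep or drop that single row.

```python
# Toy model: a table stored as a dict keyed by the full primary key (pk, ck).
# The data and layout are hypothetical, not Cassandra's storage format.
table = {
    (1, 1): {"v": 10},
    (1, 2): {"v": 20},
    (2, 1): {"v": 10},
}


def rows_matching(pk, ck, v):
    """Model a SELECT that restricts pk, ck and the regular column v by equality."""
    row = table.get((pk, ck))  # a full primary key addresses at most one row
    if row is not None and row["v"] == v:
        return [((pk, ck), row)]  # the regular-column predicate keeps the row
    return []  # ... or drops it; no other row is ever examined


print(len(rows_matching(1, 2, 20)))
print(len(rows_matching(1, 2, 99)))
```

The filter on `v` never causes additional rows to be read, which is why requiring ALLOW FILTERING feels unintuitive in this case.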

For indexes ... I don't know. Maybe it would be possible to check whether a 
query runs against an index and, if so, keep the current logic?

Completely getting rid of ALLOW FILTERING is a shocker for me. That is 
something to bring to the ML for sure. I would keep it. I would also fix the 
corner case I provided the patch for. I am not sure about the rest. But in 
general, we either get rid of it completely or we go even further and add 
"within partition" or a guardrail ... 

One advantage of ALLOW FILTERING over a guardrail is that the former is 
directly in CQL. If an operator, rather than the users themselves, needs to 
think about how users are going to fetch data, aren't we making this more 
complicated than necessary? We would be delegating some responsibilities from 
CQL to a guardrail which is configurable by an operator. Also, we would need to 
make sure that the guardrail is configurable at runtime and that all nodes are 
configured the very same way, no? If I hit another node which is not configured 
like the current one, I would basically circumvent the guardrail.
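
The concern about inconsistent node configuration can be sketched as follows. This is a simplified Python model, not Cassandra code: the `Node` class, the `allow_filtering_enabled` flag name, and the assumption that the check happens on whichever node coordinates the query are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allow_filtering_enabled: bool  # illustrative per-node guardrail setting


def coordinate(node, query_uses_allow_filtering):
    """Return True if this node, acting as coordinator, accepts the query."""
    if query_uses_allow_filtering and not node.allow_filtering_enabled:
        return False  # the guardrail on this node rejects the query
    return True


# Two nodes of the same cluster whose guardrail settings have drifted apart.
nodes = [Node("node1", False), Node("node2", True)]

# The same query is rejected by node1 but accepted by node2.
accepted_by = [n.name for n in nodes if coordinate(n, True)]
print(accepted_by)
```

Under these assumptions a client only has to send the query to `node2` to get around the restriction, which is the circumvention described above.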


> Separate out allow filtering scanning through a partition versus scanning 
> over the table
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15803
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15803
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL/Syntax
>            Reporter: Jeremy Hanna
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>
> Currently allow filtering can mean two things in the spirit of "avoid 
> operations that don't seek to a specific row or sequential rows of data."  
> First, it can mean scanning across the entire table to meet the criteria of 
> the query.  That's almost always a bad thing and should be discouraged or 
> disabled (see CASSANDRA-8303).  Second, it can mean filtering within a 
> specific partition.  For example, in a query you could specify the full 
> partition key and if you specify a criterion on a non-key field, it requires 
> allow filtering.
> The second case is significantly less work, since it only scans through a 
> partition.  It is still extra work compared with seeking to a specific row 
> and getting N sequential rows, though.  So while an application developer 
> and/or operator needs to be cautious about this second type, it's not 
> necessarily a bad thing, depending on the table and the use case.
> I propose that we separate the way to specify allow filtering across an 
> entire table from specifying allow filtering across a partition in a 
> backwards compatible way.  One idea that was brought up in Slack in the 
> cassandra-dev room was to have allow filtering mean the superset - scanning 
> across the table.  Then if you want to specify that you *only* want to scan 
> within a partition you would use something like
> {{ALLOW FILTERING [WITHIN PARTITION]}}
> So it will succeed if you specify non-key criteria within a single partition, 
> but fail otherwise, with a message saying that the query requires the full 
> allow filtering.  This would keep the full allow filtering backwards 
> compatible while allowing a user to specify that they want to scan only 
> within a partition and to get an error if the query would scan the full table.
> This is potentially also related to the capability limitation framework by 
> which operators could more granularly specify what features are allowed or 
> disallowed per user, discussed in CASSANDRA-8303.  This way an operator could 
> disallow the more general allow filtering while allowing the partition scan 
> (or disallow them both at their discretion).
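
The semantics proposed in the description above can be sketched as a small Python validator. This is a toy model under stated assumptions, not Cassandra's implementation: the `Scan` and `Allow` enums, the `validate` function, and the error strings are hypothetical, and a query is reduced to two booleans (whether the full partition key is restricted, and whether it filters on a non-key column).

```python
from enum import IntEnum


class Scan(IntEnum):
    """How much scanning a query needs (simplified)."""
    NONE = 0
    PARTITION = 1
    TABLE = 2


class Allow(IntEnum):
    """What the query text permits."""
    NOTHING = 0           # no ALLOW FILTERING clause
    WITHIN_PARTITION = 1  # ALLOW FILTERING WITHIN PARTITION (proposed)
    FULL = 2              # plain ALLOW FILTERING, the superset


def required_scan(partition_key_restricted, filters_on_non_key):
    if not filters_on_non_key:
        return Scan.NONE
    return Scan.PARTITION if partition_key_restricted else Scan.TABLE


def validate(partition_key_restricted, filters_on_non_key, allow):
    needed = required_scan(partition_key_restricted, filters_on_non_key)
    if int(needed) <= int(allow):
        return "ok"
    if needed is Scan.TABLE:
        return "error: requires the full ALLOW FILTERING"
    return "error: requires ALLOW FILTERING or ALLOW FILTERING WITHIN PARTITION"


# A partition-restricted filter succeeds with the narrower clause ...
print(validate(True, True, Allow.WITHIN_PARTITION))
# ... while a table-wide filter with the same clause is rejected.
print(validate(False, True, Allow.WITHIN_PARTITION))
```

Because plain ALLOW FILTERING remains the superset in this model, existing queries keep working unchanged, which is the backwards compatibility the description asks for.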



