OK, overall I think the discussion has settled and the feature itself is non-controversial, except for the approach to ALLOW FILTERING. I added a note to the Non-Goals section saying that we don't want to change how ALLOW FILTERING works as part of this CEP; the proposal stays consistent with the current approach. We can always rethink ALLOW FILTERING in a separate CEP, as I can't see any reason why the NOT operator should be special in that respect: ALLOW FILTERING applies to all operators.
I'll start a VOTE.

Thanks,
Piotr

Piotr Kołaczkowski
e. pkola...@datastax.com
w. www.datastax.com

On Fri, Apr 28, 2023 at 11:22 AM Piotr Kołaczkowski <pkola...@datastax.com> wrote:
>
> > It's easy for an inverted index to find matches efficiently, but not so
> > easy for it to find non-matches.
>
> Yes, I agree, it is not easy for an *index* to do that.
> But I think at least in SAI we could do that by using the index to
> find the matches, and, because they are always returned in the row-id
> order, just iterate all row identifiers skipping the ones found in the
> index (so computing the complement of the set of row ids).
> So we could do that at the iterator level, not at the index level,
> which is IMHO a good thing because that wouldn't need any storage
> format changes.
>
> Piotr Kołaczkowski
> e. pkola...@datastax.com
> w. www.datastax.com
>
> On Tue, Apr 11, 2023 at 9:55 PM Caleb Rackliffe <calebrackli...@gmail.com>
> wrote:
> >
> > +1 to the proposal from a CQL perspective
> >
> > However, whether we do this in the context of simple partition restriction,
> > a global index query, or a partition-restricted index query, the NOT
> > operator is most likely to be useful only in a post-filtering capacity.
> > (ex. WHERE indexed_set CONTAINS { 'foo'} AND indexed_set NOT CONTAINS {
> > 'bar' })
> >
> > Using Lucene as an example, you might remember that it doesn't (at least
> > IIRC) allow single predicate NOT queries. (See
> > https://stackoverflow.com/questions/3604771/not-query-in-lucene) It's easy
> > for an inverted index to find matches efficiently, but not so easy for it
> > to find non-matches. This is similar to, but even less-straightforward
> > than, the issue you have w/ boolean queries when you query the less
> > selective of the two possible values. You can create an accompanying
> > "negated" index, but that's not free, of course.
> >
> > Again, not necessarily a problem w/ the CEP, but want to call out the
> > potential complication...
> >
> > On Thu, Apr 6, 2023 at 4:01 PM Jeremy Hanna <jeremy.hanna1...@gmail.com>
> > wrote:
> >>
> >> Considering all of the examples require using ALLOW FILTERING with the
> >> partition key specified, I think it's appropriate to consider separating
> >> out use of ALLOW FILTERING within a partition versus ALLOW FILTERING
> >> across the whole table. A few years back we had a discussion about this
> >> in ASF slack in the context of capability restrictions and it seems
> >> relevant here. That is, we don't want people to get comfortable using
> >> ALLOW FILTERING across the whole table. However, there are times when
> >> ALLOW FILTERING within a partition is reasonable.
> >>
> >> Ticket to discuss separating them out:
> >> https://issues.apache.org/jira/browse/CASSANDRA-15803
> >> Summary: Perhaps add an optional [WITHIN PARTITION] or something similar
> >> to make it backwards compatible and indicate that this is purely within
> >> the specified partition.
> >>
> >> This also gives us the ability to disallow table scan types of ALLOW
> >> FILTERING from a guard rail perspective, because the intent is explicit.
> >> That operators could disallow ALLOW FILTERING but allow ALLOW FILTERING
> >> WITHIN PARTITION, or whatever is decided.
> >>
> >> I do NOT want to hijack a good discussion but I thought this separation
> >> could be useful within this context.
> >>
> >> Jeremy
> >>
> >> On Apr 6, 2023, at 3:00 PM, Patrick McFadin <pmcfa...@gmail.com> wrote:
> >>
> >> I love that this is finally coming to Cassandra. Absolutely hate that,
> >> once again, we'll be endorsing the use of ALLOW FILTERING. This is an
> >> anti-pattern that keeps getting legitimized.
> >>
> >> Hot take: Should we just not do Milestones 1 and 2 and wait for an
> >> index-only Milestone 3?
> >>
> >> Patrick
> >>
> >> On Thu, Apr 6, 2023 at 10:04 AM David Capwell <dcapw...@apple.com> wrote:
> >>>
> >>> Overall I welcome this feature, was trying to use this around 1-2 months
> >>> back and found we didn’t support, so glad to see it coming!
> >>>
> >>> From a testing point of view, I think we would want to have good fuzz
> >>> testing covering complex types (frozen/non-frozen collections, tuples,
> >>> udt, etc.), and reverse ordering; both sections tend to cause the most
> >>> problem for new features (and existing ones)
> >>>
> >>> We also will want a way to disable this feature, and optionally disable
> >>> at different sections (such as m2’s NOT IN for partition keys).
> >>>
> >>>
> >>> > On Apr 4, 2023, at 2:28 AM, Piotr Kołaczkowski <pkola...@datastax.com>
> >>> > wrote:
> >>> >
> >>> > Hi everyone!
> >>> >
> >>> > I created a new CEP for adding NOT support to the query language and
> >>> > want to start discussion around it:
> >>> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
> >>> >
> >>> > Happy to get your feedback.
> >>> > --
> >>> > Piotr
> >>>
> >>
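
For reference, the iterator-level idea from the Apr 28 message quoted above (use the index to find the matches, then walk all row ids in order and skip the matched ones) could look roughly like the sketch below. This is only an illustration, not SAI code: the function name `complement_row_ids`, the `total_rows` parameter, and the assumption that row ids are dense integers from 0 to total_rows - 1 are all hypothetical.

```python
from typing import Iterable, Iterator


def complement_row_ids(matches: Iterable[int], total_rows: int) -> Iterator[int]:
    """Yield every row id in [0, total_rows) that is absent from `matches`.

    Assumption: `matches` is sorted ascending, which is how the thread says
    the index returns row ids. Names and the dense id range are hypothetical.
    """
    it = iter(matches)
    next_match = next(it, None)
    for row_id in range(total_rows):
        # Advance past matches that are behind the cursor (e.g. duplicates).
        while next_match is not None and next_match < row_id:
            next_match = next(it, None)
        if next_match == row_id:
            continue  # the index matched this row, so NOT excludes it
        yield row_id


# Rows 1, 4 and 5 match the positive predicate; the NOT query wants the rest.
matched = [1, 4, 5]
non_matching = list(complement_row_ids(matched, 8))
print(non_matching)
```

The merge is a single forward pass over both sequences and only reads the existing index output, which is the point made in the thread about not needing storage format changes. It still touches every row id, so the cost grows with the number of rows scanned rather than with the number of matches.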