> It's easy for an inverted index to find matches efficiently, but not so easy
> for it to find non-matches.
Yes, I agree, it is not easy for an *index* to do that. But I think at least in SAI we could do it by using the index to find the matches and, because they are always returned in row-id order, iterating over all row identifiers while skipping the ones found in the index (that is, computing the complement of the set of row ids). So we could do that at the iterator level, not at the index level, which is IMHO a good thing because it wouldn't need any storage format changes.

Piotr Kołaczkowski
e. pkola...@datastax.com
w. www.datastax.com

On Tue, Apr 11, 2023 at 21:55 Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>
> +1 to the proposal from a CQL perspective
>
> However, whether we do this in the context of simple partition restriction, a
> global index query, or a partition-restricted index query, the NOT operator
> is most likely to be useful only in a post-filtering capacity. (ex. WHERE
> indexed_set CONTAINS { 'foo' } AND indexed_set NOT CONTAINS { 'bar' })
>
> Using Lucene as an example, you might remember that it doesn't (at least
> IIRC) allow single predicate NOT queries. (See
> https://stackoverflow.com/questions/3604771/not-query-in-lucene) It's easy
> for an inverted index to find matches efficiently, but not so easy for it to
> find non-matches. This is similar to, but even less straightforward than, the
> issue you have w/ boolean queries when you query the less selective of the
> two possible values. You can create an accompanying "negated" index, but
> that's not free, of course.
>
> Again, not necessarily a problem w/ the CEP, but want to call out the
> potential complication...
>
> On Thu, Apr 6, 2023 at 4:01 PM Jeremy Hanna <jeremy.hanna1...@gmail.com>
> wrote:
>>
>> Considering all of the examples require using ALLOW FILTERING with the
>> partition key specified, I think it's appropriate to consider separating out
>> use of ALLOW FILTERING within a partition versus ALLOW FILTERING across the
>> whole table. A few years back we had a discussion about this in ASF slack
>> in the context of capability restrictions and it seems relevant here. That
>> is, we don't want people to get comfortable using ALLOW FILTERING across the
>> whole table. However, there are times when ALLOW FILTERING within a
>> partition is reasonable.
>>
>> Ticket to discuss separating them out:
>> https://issues.apache.org/jira/browse/CASSANDRA-15803
>> Summary: Perhaps add an optional [WITHIN PARTITION] or something similar to
>> make it backwards compatible and indicate that this is purely within the
>> specified partition.
>>
>> This also gives us the ability to disallow table scan types of ALLOW
>> FILTERING from a guard rail perspective, because the intent is explicit.
>> That way operators could disallow ALLOW FILTERING but allow ALLOW FILTERING
>> WITHIN PARTITION, or whatever is decided.
>>
>> I do NOT want to hijack a good discussion but I thought this separation
>> could be useful within this context.
>>
>> Jeremy
>>
>> On Apr 6, 2023, at 3:00 PM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>
>> I love that this is finally coming to Cassandra. Absolutely hate that, once
>> again, we'll be endorsing the use of ALLOW FILTERING. This is an
>> anti-pattern that keeps getting legitimized.
>>
>> Hot take: Should we just not do Milestones 1 and 2 and wait for an
>> index-only Milestone 3?
>>
>> Patrick
>>
>> On Thu, Apr 6, 2023 at 10:04 AM David Capwell <dcapw...@apple.com> wrote:
>>>
>>> Overall I welcome this feature, was trying to use this around 1-2 months
>>> back and found we didn’t support it, so glad to see it coming!
>>>
>>> From a testing point of view, I think we would want to have good fuzz
>>> testing covering complex types (frozen/non-frozen collections, tuples, udt,
>>> etc.), and reverse ordering; both sections tend to cause the most problems
>>> for new features (and existing ones)
>>>
>>> We also will want a way to disable this feature, and optionally disable at
>>> different sections (such as m2’s NOT IN for partition keys).
>>>
>>> > On Apr 4, 2023, at 2:28 AM, Piotr Kołaczkowski <pkola...@datastax.com>
>>> > wrote:
>>> >
>>> > Hi everyone!
>>> >
>>> > I created a new CEP for adding NOT support to the query language and
>>> > want to start discussion around it:
>>> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
>>> >
>>> > Happy to get your feedback.
>>> > --
>>> > Piotr
>>>
>>
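
For reference, the iterator-level complement idea from the reply at the top of this thread could look roughly like the sketch below. This is only an illustration, not SAI code: the function name `complement_row_ids` is hypothetical, and it assumes row ids are dense integers from 0 to `max_row_id` and that the index yields its matches in ascending row-id order, as the reply says.

```python
from typing import Iterable, Iterator


def complement_row_ids(matches: Iterable[int], max_row_id: int) -> Iterator[int]:
    """Yield every row id in [0, max_row_id] that is NOT in `matches`.

    Assumption: `matches` is sorted ascending (duplicates are tolerated).
    The walk is a single pass over both sequences, so no extra index
    structure or storage format change is needed.
    """
    it = iter(matches)
    next_match = next(it, None)
    for row_id in range(max_row_id + 1):
        # Drop matches that fall behind the current position (e.g. duplicates).
        while next_match is not None and next_match < row_id:
            next_match = next(it, None)
        if next_match == row_id:
            # This row matched the positive predicate, so NOT excludes it.
            next_match = next(it, None)
            continue
        yield row_id


if __name__ == "__main__":
    # Rows 1, 3 and 4 match the predicate; the NOT query returns the rest.
    print(list(complement_row_ids([1, 3, 4], 6)))
```

Note that the cost is proportional to the total number of row ids rather than to the number of matches, which is in line with the concern raised above about NOT being most useful in a post-filtering capacity.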