OK, overall I think the discussion has settled and the feature is
non-controversial, except for the approach to ALLOW FILTERING.
I added a note to the non-goals saying that we don't want to change the
approach to ALLOW FILTERING here, and that this proposal stays
consistent with the current approach.
We can always rethink ALLOW FILTERING in a separate CEP; I can't
see any reason why the NOT operator should be special in that case, since AF
applies to all operators.

I'll start a VOTE.

Thanks,
Piotr

Piotr Kołaczkowski
e. pkola...@datastax.com
w. www.datastax.com


On Fri, 28 Apr 2023 at 11:22, Piotr Kołaczkowski <pkola...@datastax.com> wrote:
>
> > It's easy for an inverted index to find matches efficiently, but not so 
> > easy for it to find non-matches.
>
> Yes, I agree, it is not easy for an *index* to do that.
> But I think at least in SAI we could do that by using the index to
> find the matches, and, because they are always returned in the row-id
> order, just iterate all row identifiers skipping the ones found in the
> index (so computing the complement of the set of row ids).
> So we could do that at the iterator level, not at the index level,
> which is IMHO a good thing because that wouldn't need any storage
> format changes.
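
The iterator-level complement described above can be sketched roughly as follows. This is only an illustration, not SAI's actual code: the function name `complement_row_ids` is hypothetical, and it assumes row ids are dense integers from 0 to `total_rows - 1` and that the index returns its matches in ascending row-id order.

```python
from typing import Iterable, Iterator


def complement_row_ids(matches: Iterable[int], total_rows: int) -> Iterator[int]:
    """Yield every row id in [0, total_rows) that is absent from `matches`.

    Assumption: `matches` is sorted in ascending order, as an index
    returning results in row-id order would provide.
    """
    match_iter = iter(matches)
    next_match = next(match_iter, None)
    for row_id in range(total_rows):
        # Advance past any matches that are already behind the current row id.
        while next_match is not None and next_match < row_id:
            next_match = next(match_iter, None)
        if next_match == row_id:
            continue  # the index matched this row, so NOT excludes it
        yield row_id


# Rows 1, 3 and 4 match the predicate; the complement is everything else.
non_matching = list(complement_row_ids([1, 3, 4], total_rows=6))
```

Both sequences are walked once, so the cost is linear in the number of rows, and nothing in the stored index has to change.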
>
> Piotr Kołaczkowski
> e. pkola...@datastax.com
> w. www.datastax.com
>
> On Tue, 11 Apr 2023 at 21:55, Caleb Rackliffe <calebrackli...@gmail.com> 
> wrote:
> >
> > +1 to the proposal from a CQL perspective
> >
> > However, whether we do this in the context of simple partition restriction, 
> > a global index query, or a partition-restricted index query, the NOT 
> > operator is most likely to be useful only in a post-filtering capacity. 
> > (ex. WHERE indexed_set CONTAINS { 'foo'} AND indexed_set NOT CONTAINS { 
> > 'bar' })
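
To illustrate the post-filtering role described above, here is a minimal sketch in which a toy inverted index answers the positive CONTAINS predicate and the NOT CONTAINS predicate is applied afterwards to the candidate rows. The helper names and the in-memory data layout are assumptions made for the example; they do not reflect Cassandra's implementation.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each element to the ascending list of row ids whose set contains it."""
    index = defaultdict(list)
    for row_id, values in enumerate(rows):
        for value in values:
            index[value].append(row_id)
    return index


def contains_and_not_contains(rows, index, wanted, unwanted):
    """Use the index for the positive predicate, then post-filter the negation."""
    candidates = index.get(wanted, [])  # efficient: direct index lookup
    return [row_id for row_id in candidates if unwanted not in rows[row_id]]


rows = [{"foo"}, {"foo", "bar"}, {"bar"}, {"foo", "baz"}]
index = build_inverted_index(rows)
result = contains_and_not_contains(rows, index, "foo", "bar")
```

The negated predicate never touches the index here; it only narrows down rows that the positive predicate has already selected.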
> >
> > Using Lucene as an example, you might remember that it doesn't (at least 
> > IIRC) allow single predicate NOT queries. (See 
> > https://stackoverflow.com/questions/3604771/not-query-in-lucene) It's easy 
> > for an inverted index to find matches efficiently, but not so easy for it 
> > to find non-matches. This is similar to, but even less straightforward 
> > than, the issue you have w/ boolean queries when you query the less 
> > selective of the two possible values. You can create an accompanying 
> > "negated" index, but that's not free, of course.
> >
> > Again, not necessarily a problem w/ the CEP, but want to call out the 
> > potential complication...
> >
> > On Thu, Apr 6, 2023 at 4:01 PM Jeremy Hanna <jeremy.hanna1...@gmail.com> 
> > wrote:
> >>
> >> Considering all of the examples require using ALLOW FILTERING with the 
> >> partition key specified, I think it's appropriate to consider separating 
> >> out use of ALLOW FILTERING within a partition versus ALLOW FILTERING 
> >> across the whole table.  A few years back we had a discussion about this 
> >> in ASF slack in the context of capability restrictions and it seems 
> >> relevant here.  That is, we don't want people to get comfortable using 
> >> ALLOW FILTERING across the whole table.  However, there are times when 
> >> ALLOW FILTERING within a partition is reasonable.
> >>
> >> Ticket to discuss separating them out: 
> >> https://issues.apache.org/jira/browse/CASSANDRA-15803
> >> Summary: Perhaps add an optional [WITHIN PARTITION] or something similar 
> >> to make it backwards compatible and indicate that this is purely within 
> >> the specified partition.
> >>
> >> This also gives us the ability to disallow table scan types of ALLOW 
> >> FILTERING from a guard rail perspective, because the intent is explicit.  
> >> That way, operators could disallow ALLOW FILTERING but allow ALLOW FILTERING 
> >> WITHIN PARTITION, or whatever is decided.
> >>
> >> I do NOT want to hijack a good discussion but I thought this separation 
> >> could be useful within this context.
> >>
> >> Jeremy
> >>
> >> On Apr 6, 2023, at 3:00 PM, Patrick McFadin <pmcfa...@gmail.com> wrote:
> >>
> >> I love that this is finally coming to Cassandra. Absolutely hate that, 
> >> once again, we'll be endorsing the use of ALLOW FILTERING. This is an 
> >> anti-pattern that keeps getting legitimized.
> >>
> >> Hot take: Should we just not do Milestones 1 and 2 and wait for an 
> >> index-only Milestone 3?
> >>
> >> Patrick
> >>
> >> On Thu, Apr 6, 2023 at 10:04 AM David Capwell <dcapw...@apple.com> wrote:
> >>>
> >>> Overall I welcome this feature. I was trying to use this around 1-2 months 
> >>> back and found we didn’t support it, so glad to see it coming!
> >>>
> >>> From a testing point of view, I think we would want to have good fuzz 
> >>> testing covering complex types (frozen/non-frozen collections, tuples, 
> >>> UDTs, etc.) and reverse ordering; both areas tend to cause the most 
> >>> problems for new features (and existing ones).
> >>>
> >>> We will also want a way to disable this feature, and optionally to disable 
> >>> individual parts of it (such as M2’s NOT IN for partition keys).
> >>>
> >>> > On Apr 4, 2023, at 2:28 AM, Piotr Kołaczkowski <pkola...@datastax.com> 
> >>> > wrote:
> >>> >
> >>> > Hi everyone!
> >>> >
> >>> > I created a new CEP for adding NOT support to the query language and
> >>> > want to start discussion around it:
> >>> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
> >>> >
> >>> > Happy to get your feedback.
> >>> > --
> >>> > Piotr
> >>>
> >>
