Wow, you know, J.D., I've never actually heard ALLOW FILTERING described the
way you did. The discussion is generally framed in terms of multiple
partitions, probably because that is the situation in which memory is
exceeded. Thanks for that definition.

Regardless of how this discussion goes, I'll make a ticket to change that
doc.

Lorina

On Thu, Apr 13, 2023 at 4:17 AM J. D. Jordan <jeremiah.jor...@gmail.com>
wrote:

> The documentation is wrong. ALLOW FILTERING has always meant that “rows
> will need to be materialized in memory and accepted or rejected by a column
> filter”, aka the full primary key was not specified and some other column
> was specified. It has never been about multiple partitions.
> Basically: “will the server need to read more data from disk (possibly a
> lot more) than will be returned to the client?”
> Should we change how that works? Maybe. But let's move such discussions to a
> new thread and keep this one about the CEP proposal.
>
> On Apr 13, 2023, at 6:00 AM, Andrés de la Peña <adelap...@apache.org>
> wrote:
>
>
> Indeed requiring AF for "select * from ks.tb where p1 = 1 and c1 = 2 and
> col2 = 1", where p1 and c1 are all the columns in the primary key, sounds
> like a bug.
>
> I think the criterion in the code is that we require AF if there is any
> column restriction that cannot be processed by the primary key or a
> secondary index. The error message indeed seems to reject any kind of
> filtering, regardless of primary key restrictions. We can see this even
> without defined clustering keys:
>
> CREATE TABLE t (k int PRIMARY KEY, v int);
> SELECT * FROM t WHERE k = 1 AND v = 1; -- requires AF
>
> That clashes with the documentation, which says that AF is required for
> filters that require scanning all partitions. If we were to adapt the code
> to the behaviour described in the documentation, we shouldn't require AF if
> there are restrictions specifying a partition key, or possibly a group of
> partition keys if an IN restriction is used. So neither within-row nor
> within-partition filtering would require AF.
>
> Regarding adding a new ALLOW FILTERING WITHIN PARTITION clause, I think we
> could just add a guardrail to directly disallow those queries, without
> needing to add WITHIN PARTITION to the CQL grammar.
>
> On Thu, 13 Apr 2023 at 11:11, Henrik Ingo <henrik.i...@datastax.com>
> wrote:
>
>>
>>
>> On Thu, Apr 13, 2023 at 10:20 AM Miklosovic, Stefan <
>> stefan.mikloso...@netapp.com> wrote:
>>
>>> Somebody correct me if I am wrong, but the partition key by itself is not
>>> enough (primary key = partition key + clustering columns). The query will
>>> still require ALLOW FILTERING when the clustering columns are not specified.
>>>
>>> create table ks.tb (p1 int, c1 int, col1 int, col2 int, primary key (p1, c1));
>>> select * from ks.tb where p1 = 1 and col1 = 2; // this will require allow filtering
>>>
>>> The documentation seems to omit this fact.
>>>
>>
>> It does seem so.
>>
>> That said, personally I was assuming, and would still argue it's the
>> optimal choice, that the documentation was right and reality is wrong.
>>
>> If there is a partition key, then the query can avoid scanning the entire
>> table, across all nodes, potentially petabytes.
>>
>> If a query specifies a partition key but not the full clustering key, of
>> course there will be some scanning needed, but this is marginal compared to
>> the need to scan the entire table. Even in the worst case, a partition with
>> 2 billion cells, we are talking about seconds to filter the result from the
>> single partition.
>>
>> > Aha I get what you all mean:
>>
>> No, I actually think both are unnecessary. But yeah, certainly this
>> latter case is a bug?
>>
>> henrik
>>
>> --
>>
>> Henrik Ingo
>>
>> c. +358 40 569 7354
>>
>> w. www.datastax.com