On Thu, Apr 13, 2023 at 10:20 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Somebody correct me if I am wrong but "partition key" itself is not enough
> (primary keys = partition keys + clustering columns). It will require ALLOW
> FILTERING when clustering columns are not specified either.
>
> create table ks.tb (p1 int, c1 int, col1 int, col2 int, primary key (p1, c1));
> select * from ks.tb where p1 = 1 and col1 = 2;     // this will require allow filtering
>
> The documentation seems to omit this fact.
>

It does seem so.

That said, I had personally assumed the documentation was right, and I
would still argue that what it describes is the better behavior: it is
reality that is wrong here.

If the query specifies a partition key, it can avoid scanning the entire
table, which is spread across all nodes and potentially holds petabytes.

If a query specifies a partition key but not the full clustering key, of
course there will be some scanning needed, but this is marginal compared to
the need to scan the entire table. Even in the worst case, a partition with
2 billion cells, we are talking about seconds to filter the result from the
single partition.
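
To make the cases concrete, here is a rough sketch using Stefan's table
from above. It assumes the keyspace ks already exists. The comments reflect
the behavior as described in this thread plus my own understanding of how
the restrictions work; I have not re-run these against a specific version,
so treat them as illustrative.

```cql
-- Table from Stefan's example: p1 is the partition key, c1 the clustering column.
CREATE TABLE ks.tb (p1 int, c1 int, col1 int, col2 int, PRIMARY KEY (p1, c1));

-- Partition key only: reads a single partition, no ALLOW FILTERING needed.
SELECT * FROM ks.tb WHERE p1 = 1;

-- Partition key plus clustering column: reads a single row, no ALLOW FILTERING needed.
SELECT * FROM ks.tb WHERE p1 = 1 AND c1 = 2;

-- Partition key plus a regular column: rejected unless ALLOW FILTERING is added,
-- even though only the partition p1 = 1 has to be scanned and filtered.
SELECT * FROM ks.tb WHERE p1 = 1 AND col1 = 2 ALLOW FILTERING;

-- No partition key at all: this is the case that may scan the whole table
-- on every node.
SELECT * FROM ks.tb WHERE col1 = 2 ALLOW FILTERING;
```

My argument is that the third query is much closer to the first two than
to the last one in terms of cost.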

> Aha I get what you all mean:

No, I actually think both are unnecessary. But yeah, certainly this latter
case is a bug?

henrik

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

