Re: [DISCUSS] CEP-7 Storage Attached Index

Henrik Ingo Mon, 14 Feb 2022 08:26:25 -0800

On Fri, Feb 11, 2022 at 8:47 PM Caleb Rackliffe <calebrackli...@gmail.com>
wrote:


> Just finished reading the latest version of the CEP. Here are my thoughts:
>
> - We've already talked about OR queries, so I won't rehash that, but
> tokenization support seems like it might be another one of those places
> where we can cut scope if we want to get V1 out the door. It shouldn't be
> that hard to detangle from the rest of the code.
>

The tokenization support is already implemented. It's available in our
public fork, but at least as of the last time I was involved, there wasn't
really any public documentation for it. Lucene comes with dozens of
tokenizers, so the documentation effort will be significant.

So the situation is similar to OR: the community may want to break out a
separate CEP to debate the user-facing syntax. Alternatively, this can
simply happen as part of the PR, which could be submitted as soon as CEP-7
is approved.



> - We mention the JMX metric ecosystem in the CEP, but not the related
> virtual tables. This isn't a big issue, and doesn't mean we need to change
> the CEP, but it might be helpful for those not familiar with the existing
> prototype to know they exist :)
>

Thanks for the callout. Maybe they should indeed be mentioned together.


> - It's probably below the line for CEP discussion, but the text and
> numeric index formats will probably change over time. We don't need a whole
> "codec framework" for V1, but we're still embedding some versioning
> information in the column index on-disk structures, right?
>

On the contrary, this is a very valid question. As you know, SAI has been GA
for over a year in both our DSE and Astra products, and what CEP-7 describes
for inclusion in Cassandra is known within the SAI team as V2. (But to be
clear, it's named V1 in the CEP and in the context of Cassandra!) So the
code does contain facilities to support multiple generations of index
formats. When an sstable of the older version is encountered, the code for
that version is used to read its index files. Upon compaction, the newer
version is written. And there needs to be some kind of global check to
ensure that new features only become available once all sstables cluster
wide are of the required version.
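
To make the three pieces above concrete (version-specific readers, upgrade
on compaction, and a cluster-wide gate for new features), here is a rough
sketch in Python. This is not the SAI code. Every name in it (IndexFormat,
SSTableIndex, open_index, compact, feature_available) is hypothetical and
only illustrates the idea:

```python
from dataclasses import dataclass
from enum import IntEnum


class IndexFormat(IntEnum):
    """Hypothetical on-disk index format generations (illustrative names)."""
    OLD = 1
    NEW = 2


LATEST = IndexFormat.NEW


@dataclass(frozen=True)
class SSTableIndex:
    sstable: str
    fmt: IndexFormat  # version recorded alongside the index files


# One reader per format generation. A real reader would parse index files;
# these stand-ins just report which code path was chosen.
READERS = {
    IndexFormat.OLD: lambda idx: f"old-format reader for {idx.sstable}",
    IndexFormat.NEW: lambda idx: f"new-format reader for {idx.sstable}",
}


def open_index(idx):
    """Dispatch to the reader that matches the index's recorded version."""
    return READERS[idx.fmt](idx)


def compact(inputs, output_name):
    """Compaction writes the newest format, whatever the inputs were."""
    return SSTableIndex(output_name, LATEST)


def feature_available(all_indexes, required):
    """Global gate: true only if every index is at least `required`."""
    return all(idx.fmt >= required for idx in all_indexes)
```

For example, with one old and one new sstable index, feature_available
returns False for a feature that requires IndexFormat.NEW, and returns True
once compact has rewritten the pair into a single new-format index. How the
"all sstables cluster wide" information is actually gathered is left open
here, as it is in the paragraph above.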


> To offset my obvious partiality around this CEP, I've already made an
> effort to raise some of the issues that may come up to challenge us from a
> macro perspective. It seems like the prevailing opinion here is that they
> are either surmountable or simply basic conceptual difficulties w/
> distributed secondary indexing.
>

This might be a good moment to say that we really appreciate your
investment in and support of this CEP!

henrik
