Re: [DISCUSS] New data type for vector search
If we are going to use FLOAT[N] as sugar for another CQL data type, maybe tuples are more convenient than lists. So FLOAT[N] could be equivalent to TUPLE.

Unlike collections, tuples have a fixed size, they are always frozen, and I think they don't support random access. These properties seem desirable for vectors.

Tuples, however, support null values, whereas collections don't. I mean, you can remove elements from a collection, but I think you are never going to see an explicit null in the collection. Tuples don't allow removing a value, but the entire tuple can be written with null values, as in:

  INSERT INTO t (key, tuple) VALUES (0, (1, null, 3))

On Wed, 26 Apr 2023 at 21:53, Mick Semb Wever wrote:

>> My inclination then would be to say you declare an ARRAY (which
>> is semantic sugar for FROZEN<LIST<FLOAT>>). This is very consistent with
>> our existing style. We then simply permit such columns to define ANN
>> indexes.
>
> So long as nulls aren't a problem as David questions, an alternative is:
>
> FLOAT[N] as semantic sugar for LIST
> And ANN requiring FROZEN
>
> Maybe taking a poll in a few days will be positive to keep this
> moving forward.
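The two properties discussed in this message (a fixed size, and no explicit nulls among the elements) can be illustrated with a small validation step. This is a minimal sketch in Python rather than anything taken from the patch: `validate_vector` is a hypothetical helper name, and the choice to reject nulls outright is an assumption about how the question raised in the thread might be resolved.

```python
from typing import Optional, Sequence, Tuple


def validate_vector(values: Sequence[Optional[float]], dimensions: int) -> Tuple[float, ...]:
    """Return the values as an immutable tuple of floats, or raise ValueError.

    Hypothetical helper: it models the constraints discussed in the thread
    (fixed size, no nulls) and is not an actual Cassandra code path.
    """
    # Fixed size: the declared dimension count must match exactly.
    if len(values) != dimensions:
        raise ValueError(f"expected {dimensions} elements, got {len(values)}")
    # No nulls: unlike a CQL tuple, an element may not be missing.
    for index, value in enumerate(values):
        if value is None:
            raise ValueError(f"element {index} is null")
    # Returning a tuple mirrors the "always frozen" property: no in-place edits.
    return tuple(float(v) for v in values)
```

Under these assumptions, `validate_vector([1, 2, 3], 3)` is accepted, while the tuple-style value `(1, null, 3)` from the INSERT above would be rejected.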
Re: [DISCUSS] New data type for vector search
> My inclination then would be to say you declare an ARRAY (which
> is semantic sugar for FROZEN<LIST<FLOAT>>). This is very consistent with
> our existing style. We then simply permit such columns to define ANN
> indexes.

So long as nulls aren't a problem as David questions, an alternative is:

FLOAT[N] as semantic sugar for LIST
And ANN requiring FROZEN

Maybe taking a poll in a few days will be positive to keep this moving forward.
Re: Adding vector search to SAI with hierarchical navigable small world graph index
If we look to PostgreSQL, it allows defining arrays using FLOAT[N] or FLOAT ARRAY[N]. So that is an extra point for me in favor of just using FLOAT[N].

From my quick search, neither Oracle* nor MySQL directly supports arrays in columns.

* Oracle supports declaring a custom type using VARRAY and then using that type for a column.

  CREATE TYPE float_array AS VARRAY(100) OF FLOAT;

> On Apr 26, 2023, at 12:17 PM, David Capwell wrote:
>
>> DENSE seems to just be an array? So very similar to a frozen list, but with
>> a fixed size?
>
> How I read the doc, DENSE = ARRAY, but knew that couldn’t be the case, so
> when I read the code its fixed size array…. So the real syntax was “DENSE
> FLOAT32[42]”
>
> Not a fan of the type naming, and feel that a fixed size array could be
> useful for other cases as well, so think we can improve here (personally
> prefer float[42], text[42], etc… vector maybe closer to our
> existing syntax but not a fan).
>
>> I guess this is an excellent example to explore the minima of what
>> constitutes a CEP
>
> The ANN change itself feels like a CEP makes sense. Are we going to depend
> on Lucene’s HNSW or build our own? How do we validate this for correctness?
> What does correctness mean in a distributed context? Is this going to be
> pluggable (big push recently to offer pluggability)?
>
>> On Apr 26, 2023, at 7:37 AM, Patrick McFadin wrote:
>>
>> I guess this is an excellent example to explore the minima of what
>> constitutes a CEP. So far, CEPs have been some large changes, so where does
>> something like this fit? (Wait. Did I beat Benedict to a Bike Shed? I think
>> I did.)
Re: [DISCUSS] New data type for vector search
Benedict’s comments also make me wonder: can any of the values in the vector be null? The patch sent works with float arrays, so null isn’t possible… is null not valid for a vector type? If so, this would help justify why a vector is not an array or a list (both allow null).

> On Apr 26, 2023, at 10:50 AM, David Capwell wrote:
>
> Thanks for starting this thread!
>
>> In the initial commits and thread, this was DENSE FLOAT32. Nobody really
>> loved that, so we considered a bunch of alternatives, including
>>
>> - `FLOAT[N]`: This minimal option resembles C and Java array syntax, which
>> would make it familiar for many users. However, this syntax raises the
>> question of why arrays cannot be created for other types. Additionally, the
>> expectation for an array is to provide random access to its contents, which
>> is not supported for vectors.
>> - `DENSE FLOAT[N]`: This option clarifies that we are supporting dense
>> vectors, not sparse ones. However, since Lucene had sparse vector support in
>> the past but removed it for lack of compelling use cases, it is unlikely
>> that it will be added back, making the "DENSE" qualifier less relevant.
>> - `DENSE FLOAT VECTOR[N]`: This is the most verbose option and aligns with
>> the CQL/SQL spirit. However, the "DENSE" qualifier is unnecessary for the
>> reasons mentioned above.
>> - `VECTOR FLOAT[N]`: This option omits the "DENSE" qualifier, but has a less
>> natural word order.
>> - `VECTOR`: This follows the syntax of our Collections, but again
>> this would imply that random access is supported, which we want to avoid
>> doing.
>> - `VECTOR[N]`: This syntax is not very clear about the vector's contents and
>> could make it difficult to add other vector types, such as byte vectors
>> (already supported by Lucene), in the future.
>
> I didn’t look close enough when I saw your patch, is this type multicell or
> not? Aka is this acting like a frozen<list<float>> of fixed size? I had
> assumed its non-multicell….
Re: [DISCUSS] New data type for vector search
Thanks for starting this thread!

> In the initial commits and thread, this was DENSE FLOAT32. Nobody really
> loved that, so we considered a bunch of alternatives, including
>
> - `FLOAT[N]`: This minimal option resembles C and Java array syntax, which
> would make it familiar for many users. However, this syntax raises the
> question of why arrays cannot be created for other types. Additionally, the
> expectation for an array is to provide random access to its contents, which
> is not supported for vectors.
> - `DENSE FLOAT[N]`: This option clarifies that we are supporting dense
> vectors, not sparse ones. However, since Lucene had sparse vector support in
> the past but removed it for lack of compelling use cases, it is unlikely that
> it will be added back, making the "DENSE" qualifier less relevant.
> - `DENSE FLOAT VECTOR[N]`: This is the most verbose option and aligns with
> the CQL/SQL spirit. However, the "DENSE" qualifier is unnecessary for the
> reasons mentioned above.
> - `VECTOR FLOAT[N]`: This option omits the "DENSE" qualifier, but has a less
> natural word order.
> - `VECTOR`: This follows the syntax of our Collections, but again
> this would imply that random access is supported, which we want to avoid
> doing.
> - `VECTOR[N]`: This syntax is not very clear about the vector's contents and
> could make it difficult to add other vector types, such as byte vectors
> (already supported by Lucene), in the future.

I didn’t look closely enough when I saw your patch: is this type multicell or not? Aka, is this acting like a frozen<list<float>> of fixed size? I had assumed it’s non-multicell….

The main reason I ask this now is this pushback on random access…. Let’s say I have the following table:

CREATE TABLE fluffy_kittens (
  pk int PRIMARY KEY,
  vector FLOAT[42] -- don’t ask why fluffy kittens need a vector, they just do!
)

If I do the following query, I would expect it to work:

SELECT vector[7] FROM fluffy_kittens WHERE pk=0; -- 7 is less than 42

While working on Accord’s CQL integration, Caleb and I kept getting bitten by frozen vs. non-frozen behavior: so many cases just stopped working on frozen collections and should be easy to add (we force the user to load the full value already, so why can we not touch it?).

Now, back to the random access comment: assuming this is not multicell, why would random access be blocked? If the type’s isValueLengthFixed() == true, then random access should be simple (else it does require walking the array in order, or fully deserializing the BB (if working with Lucene I assume we already deserialized out of BB)). I am just trying to flesh out whether there is a limitation not being brought up, or whether this is trying to limit the scope of access for easier testing.

> However, this syntax raises the question of why arrays cannot be created for
> other types

Left this comment in the other thread: why not? This could be useful outside the float use case, so having a new “VectorType(AbstractType elements, int size)” is easier/better than a float-only version. I also did a lot of work to fuzz test our type system, so just adding that into the existing generator would get good coverage right off the bat (I have another fuzz tester I have not contributed yet; it was done for Accord… it fuzz tests the AST, so it would be easy to add this there, and that would test type-specific access, which the existing tests don’t).

> Finally, the original qualifier of 32 in `FLOAT32` was intended to allow
> consistency if we add other float types like FLOAT16 or FLOAT64

I do not think we should add a new FLOAT32 type, but I am cool with an alias that has FLOAT32 point to FLOAT. One negative of this is that the code paths where we return schema back to users would do FLOAT even if the user wrote FLOAT32… other than that negative I don’t see any other problems.
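The random access argument above can be made concrete. The following is a minimal sketch in Python, not Cassandra's actual serialization: it assumes each element is a 4-byte big-endian IEEE 754 float, that elements are stored back to back, and that there is no length prefix. The names `serialize` and `element_at` are hypothetical. Under those assumptions, element i sits at byte offset 4 * i, so it can be read without decoding the other elements.

```python
import struct

FLOAT_SIZE = 4  # bytes per element; assumes 32-bit IEEE 754 floats


def serialize(values):
    """Pack floats back to back, big-endian, with no length prefix (assumed layout)."""
    return struct.pack(f">{len(values)}f", *values)


def element_at(buffer, index):
    """Read one element directly from its offset, without decoding the rest."""
    count = len(buffer) // FLOAT_SIZE
    if not 0 <= index < count:
        raise IndexError(f"index {index} out of range for {count} elements")
    # Fixed-length values make the offset a simple multiplication.
    (value,) = struct.unpack_from(">f", buffer, index * FLOAT_SIZE)
    return value
```

For example, `element_at(serialize([0.5, 1.5, 2.5]), 2)` reads four bytes at offset 8. With variable-length elements the offset would not be computable this way, which is the "walking the array in order" case mentioned above.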
> Thus, we believe that `FLOAT VECTOR[N_DIMENSIONS]` provides the best balance
> of clarity, conciseness, and extensibility. It is more natural in its word
> order than the original proposal and avoids unnecessary qualifiers, while
> still being clear about the data type it represents. Finally, this syntax is
> straightforwardly extensible should we choose to support other vector types in
> the future.

My preference is TYPE[n_dimension] but I am ok with this syntax if others prefer it. I don’t agree that this extra verbosity adds more clarity; there seems to be an assumption that this will tell users that random access isn’t allowed and only blessed types are allowed… both points I feel are not valid (or I have not seen anything published on why they should be valid). There is a difference between what a type “could” do and what we implement day 1; I wouldn’t want to add more verbosity because of intentions of the day 1 implementation.

> On Apr 26, 2023, at 7:31 AM, Jonathan Ellis wrote:
>
> Hi all,
>
> Splitting this out per the suggestion in the initial VS thread so we can work
> on driver support in parallel with the
Re: [DISCUSS] New data type for vector search
I think we need to briefly step back and think about what the syntax means and how it fits into existing syntax.

It seems that the dimensionality verbiage assumes we’re logically introducing N vector fields, so that each row adopts a value for all of the vector fields or none. But in practice we are actually introducing a fixed-length frozen list in Cassandra terms, and our API treats this as a per-row array/vector rather than a number of column vectors.

My inclination then would be to say you declare an ARRAY (which is semantic sugar for FROZEN<LIST<FLOAT>>). This is very consistent with our existing style. We then simply permit such columns to define ANN indexes.

Otherwise, I think we should lean into the idea that this is a set of N vectors, as “dimensions” makes limited sense when discussing an array length. In this case I would lean towards declaring e.g. 1500 FLOAT VECTORS, maybe. But then I think we should reconsider our presentation a little, and perhaps the result set should treat each vector as a separate field (or something like this).

On 26 Apr 2023, at 15:31, Jonathan Ellis wrote:

> Hi all,
>
> Splitting this out per the suggestion in the initial VS thread so we can work on driver support in parallel with the server-side changes.
>
> I propose adding a new data type for vector search indexes:
>
> FLOAT VECTOR[N_DIMENSIONS]
>
> In the initial commits and thread, this was DENSE FLOAT32. Nobody really loved that, so we considered a bunch of alternatives, including
>
> - `FLOAT[N]`: This minimal option resembles C and Java array syntax, which would make it familiar for many users. However, this syntax raises the question of why arrays cannot be created for other types. Additionally, the expectation for an array is to provide random access to its contents, which is not supported for vectors.
> - `DENSE FLOAT[N]`: This option clarifies that we are supporting dense vectors, not sparse ones.
Re: Adding vector search to SAI with hierarchical navigable small world graph index
> DENSE seems to just be an array? So very similar to a frozen list, but with a
> fixed size?

How I read the doc, DENSE = ARRAY, but I knew that couldn’t be the case, so when I read the code it’s a fixed-size array…. So the real syntax was “DENSE FLOAT32[42]”.

Not a fan of the type naming, and I feel that a fixed-size array could be useful for other cases as well, so I think we can improve here (personally I prefer float[42], text[42], etc… vector maybe closer to our existing syntax but not a fan).

> I guess this is an excellent example to explore the minima of what
> constitutes a CEP

The ANN change itself feels like one where a CEP makes sense. Are we going to depend on Lucene’s HNSW or build our own? How do we validate this for correctness? What does correctness mean in a distributed context? Is this going to be pluggable (there has been a big push recently to offer pluggability)?

> On Apr 26, 2023, at 7:37 AM, Patrick McFadin wrote:
>
> I guess this is an excellent example to explore the minima of what
> constitutes a CEP. So far, CEPs have been some large changes, so where does
> something like this fit? (Wait. Did I beat Benedict to a Bike Shed? I think I
> did.)
>
> This is a list of everything needed for a CEP:
>
> Status
> Scope
> Goals
> Approach
> Timeline
> Mailing list / Slack channels
> Related JIRA tickets
> Motivation
> Audience
> Proposed Changes
> New or Changed Public Interfaces
> Compatibility, Deprecation, and Migration Plan
> Test Plan
> Rejected Alternatives
>
> This is a big enough change to provide information for each element. Going
> back to the spirit of why we started CEPs, we wanted to avoid a mega-commit
> without some shaping and agreement before code goes into trunk. I don't have
> a clear indication of where that line lies. From our own wiki: "It is highly
> recommended to pursue a CEP for significant user-facing or changes that cut
> across multiple subsystems." That seems to fit here.
Re: Adding vector search to SAI with hierarchical navigable small world graph index
I guess this is an excellent example to explore the minima of what constitutes a CEP. So far, CEPs have been some large changes, so where does something like this fit? (Wait. Did I beat Benedict to a Bike Shed? I think I did.)

This is a list of everything needed for a CEP:

Status
Scope
Goals
Approach
Timeline
Mailing list / Slack channels
Related JIRA tickets
Motivation
Audience
Proposed Changes
New or Changed Public Interfaces
Compatibility, Deprecation, and Migration Plan
Test Plan
Rejected Alternatives

This is a big enough change to provide information for each element. Going back to the spirit of why we started CEPs, we wanted to avoid a mega-commit without some shaping and agreement before code goes into trunk. I don't have a clear indication of where that line lies. From our own wiki: "It is highly recommended to pursue a CEP for significant user-facing or changes that cut across multiple subsystems." That seems to fit here. Part of my motivation is being clear with potential new contributors by example and encouraging more awesomeness.

The changes for operators:
- New drivers
- New guardrails?
- Indexing == storage requirements

Patrick

On Tue, Apr 25, 2023 at 10:53 PM Mick Semb Wever wrote:

> I was so happy when I saw this, I know many users are going to be
> thrilled about it.
>
> On Wed, 26 Apr 2023 at 05:15, Patrick McFadin wrote:
>
>> Not sure if this is what you are saying, Josh, but I believe this needs
>> to be its own CEP. It's a change in CQL syntax and changes how clusters
>> operate. The change needs to be documented and voted on. Jonathan, you know
>> how to find me if you want me to help write it. :)
>
> I'd be fine with just a DISCUSS thread to agree to the CQL change, since
> it (`DENSE FLOAT32`) appears to be minimal, with the overall patch
> building on SAI. As Henrik mentioned, there are other SAI extensions being
> added too without CEPs. Can you elaborate on how you see this changing how
> the cluster operates?
>
> This will be easier to decide once we have a patch to look at, but that
> depends on a CEP-7 base (e.g. no feature branch exists). If we do want a
> CEP we need to allow a few weeks to get it through, but that can happen in
> parallel and maybe drafting up something now will be valuable anyway for an
> eventual CEP that proposes the more complete features (e.g.
> cosine_similarity(…)).
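For readers unfamiliar with the measure named at the end of the quoted message, here is a minimal sketch of cosine similarity in Python. It is the textbook formula only: the thread does not give the signature or semantics of the proposed CQL function, so the function name, argument types, and the decision to reject zero-length vectors below are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (textbook definition)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # The measure is undefined when either vector has zero length.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, which is why both operands need the same fixed number of dimensions.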
[DISCUSS] New data type for vector search
Hi all,

Splitting this out per the suggestion in the initial VS thread so we can work on driver support in parallel with the server-side changes.

I propose adding a new data type for vector search indexes:

FLOAT VECTOR[N_DIMENSIONS]

In the initial commits and thread, this was DENSE FLOAT32. Nobody really loved that, so we considered a bunch of alternatives, including

- `FLOAT[N]`: This minimal option resembles C and Java array syntax, which would make it familiar for many users. However, this syntax raises the question of why arrays cannot be created for other types. Additionally, the expectation for an array is to provide random access to its contents, which is not supported for vectors.
- `DENSE FLOAT[N]`: This option clarifies that we are supporting dense vectors, not sparse ones. However, since Lucene had sparse vector support in the past but removed it for lack of compelling use cases, it is unlikely that it will be added back, making the "DENSE" qualifier less relevant.
- `DENSE FLOAT VECTOR[N]`: This is the most verbose option and aligns with the CQL/SQL spirit. However, the "DENSE" qualifier is unnecessary for the reasons mentioned above.
- `VECTOR FLOAT[N]`: This option omits the "DENSE" qualifier, but has a less natural word order.
- `VECTOR`: This follows the syntax of our Collections, but again this would imply that random access is supported, which we want to avoid doing.
- `VECTOR[N]`: This syntax is not very clear about the vector's contents and could make it difficult to add other vector types, such as byte vectors (already supported by Lucene), in the future.

Finally, the original qualifier of 32 in `FLOAT32` was intended to allow consistency if we add other float types like FLOAT16 or FLOAT64, both of which are sometimes used in ML. However, we already have a CQL data type for a 64-bit float (`DOUBLE`), so it would make more sense to add future variants (which remain hypothetical at this point) along that line instead.

Thus, we believe that `FLOAT VECTOR[N_DIMENSIONS]` provides the best balance of clarity, conciseness, and extensibility. It is more natural in its word order than the original proposal and avoids unnecessary qualifiers, while still being clear about the data type it represents. Finally, this syntax is straightforwardly extensible should we choose to support other vector types in the future.

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
Re: [EXTERNAL] Re: (CVE only) support for 3.11 beyond published EOL
On Sat, 15 Apr 2023 at 03:17, C. Scott Andreas wrote:

> If there’s lack of clarity around EOL policy and dates, we should
> absolutely make this clear.

Fix is here:
https://github.com/thelastpickle/cassandra-website/tree/mck/update-5-0_dates_download_page

w/ html generated here:
https://raw.githack.com/thelastpickle/cassandra-website/mck/update-5-0_dates_download_page_generated/content/_/download.html

I'll merge this tomorrow if there's no further input.
Re: Adding vector search to SAI with hierarchical navigable small world graph index
We probably at least need to bike shed naming as we already have FLOAT, DOUBLE, and LIST - which are similar/overlapping types, and we should be consistent.

If we introduce FLOAT32 we probably need that to be an alias of FLOAT and introduce FLOAT64 to alias DOUBLE for consistency.

DENSE seems to just be an array? So very similar to a frozen list, but with a fixed size?

On 26 Apr 2023, at 06:53, Mick Semb Wever wrote:

> I was so happy when I saw this, I know many users are going to be thrilled about it.
>
> On Wed, 26 Apr 2023 at 05:15, Patrick McFadin wrote:
>
>> Not sure if this is what you are saying, Josh, but I believe this needs to be its own CEP. It's a change in CQL syntax and changes how clusters operate. The change needs to be documented and voted on. Jonathan, you know how to find me if you want me to help write it. :)
>
> I'd be fine with just a DISCUSS thread to agree to the CQL change, since it (`DENSE FLOAT32`) appears to be minimal, with the overall patch building on SAI. As Henrik mentioned, there are other SAI extensions being added too without CEPs. Can you elaborate on how you see this changing how the cluster operates?
>
> This will be easier to decide once we have a patch to look at, but that depends on a CEP-7 base (e.g. no feature branch exists). If we do want a CEP we need to allow a few weeks to get it through, but that can happen in parallel and maybe drafting up something now will be valuable anyway for an eventual CEP that proposes the more complete features (e.g. cosine_similarity(…)).
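The aliasing idea in this message (FLOAT32 for FLOAT, FLOAT64 for DOUBLE) can be sketched as a simple name-resolution step. This is a minimal sketch in Python under stated assumptions: FLOAT32 and FLOAT64 are only proposed names in this thread, and `ALIASES` and `canonical_type` are hypothetical, not part of any Cassandra API.

```python
# Hypothetical alias table; FLOAT32 and FLOAT64 are names proposed in the
# thread, not existing CQL types.
ALIASES = {"FLOAT32": "FLOAT", "FLOAT64": "DOUBLE"}


def canonical_type(name: str) -> str:
    """Map an alias to its canonical type name; other names pass through unchanged."""
    upper = name.strip().upper()
    return ALIASES.get(upper, upper)
```

One consequence of resolving aliases this early is that the original spelling is lost: `canonical_type("float32")` yields `FLOAT`, so a schema echoed back to the user would show FLOAT even if FLOAT32 was written, which is the drawback noted elsewhere in the discussion.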