A > B > C

I don't think that ML is such a niche application that it can't have its own CQL data type. Also, vectors are mathematical elements that have more applications than ML.
On Tue, 2 May 2023 at 19:15, Mick Semb Wever <m...@apache.org> wrote:

> On Tue, 2 May 2023 at 17:14, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> Should we add a vector type to Cassandra designed to meet the needs of
>> machine learning use cases, specifically feature and embedding vectors for
>> training, inference, and vector search?
>>
>> ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
>> with no nulls allowed, and with no need for random access. The ML industry
>> overwhelmingly uses float32 vectors, to the point that the industry-leading
>> special-purpose vector database ONLY supports that data type.
>>
>> This poll is to gauge consensus subsequent to the recent discussion
>> thread at
>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>
>> Please rank the discussed options from most preferred option to least,
>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
>> = A (C is my preference, followed by B or A approximately equally.)
>>
>> (A) I am in favor of adding a vector type for floats; I do not believe we
>> need to tie it to any particular implementation details.
>>
>> (B) I am okay with adding a vector type but I believe we must add array
>> types that compose with all Cassandra types first, and make vectors a
>> special case of arrays-without-null-elements.
>>
>> (C) I am not in favor of adding a built-in vector type.
>>
>
> A > B > C
>
> B is stated as "must add array types…". I think this is a bit loaded. If
> B was the (A + the implementation needs to be a non-null frozen float32
> array, serialisation forward compatible with other frozen arrays later
> implemented) I would put this before (A). Especially because it's been
> shown already this is easy to implement.