This is not the poll I thought we would be conducting, and I don’t really support its framing. There are two parallel questions: what the functionality should be, and how it should be exposed. This poll collapses those two questions into one and loses options in the process.

Whether or not we support a “vector” concept (or something isomorphic with it), the first question this poll wants to answer is:

A) Should we introduce a new CQL collection type that is unique to ML and *only* supports float32?
B) Should we introduce a type that is general purpose and supports all Cassandra types, so that it may be used to support ML (and perhaps other) workloads?
C) Should we not introduce new types to CQL at all?

For this question, I vote B only.

Once this question is answered, it makes sense to decide how the type will be exposed semantically and syntactically.


On 2 May 2023, at 16:43, Jonathan Ellis <jbel...@gmail.com> wrote:


My preference: A > B > C.  Vectors are distinct enough from arrays that we should not make adding the latter a prerequisite for adding the former.

On Tue, May 2, 2023 at 10:13 AM Jonathan Ellis <jbel...@gmail.com> wrote:
Should we add a vector type to Cassandra designed to meet the needs of machine learning use cases, specifically feature and embedding vectors for training, inference, and vector search?  

ML vectors are fixed-dimension (fixed-length) sequences of numeric types, with no nulls allowed, and with no need for random access. The ML industry overwhelmingly uses float32 vectors, to the point that the industry-leading special-purpose vector database ONLY supports that data type.

This poll is to gauge consensus subsequent to the recent discussion thread at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.

Please rank the discussed options from most preferred option to least, e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B = A (C is my preference, followed by B or A approximately equally.)

(A) I am in favor of adding a vector type for floats; I do not believe we need to tie it to any particular implementation details.

(B) I am okay with adding a vector type but I believe we must add array types that compose with all Cassandra types first, and make vectors a special case of arrays-without-null-elements.

(C) I am not in favor of adding a built-in vector type.

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


