Re: [DISCUSS] New data type for vector search

David Capwell Thu, 27 Apr 2023 09:40:15 -0700

> but as you point out it has the problem of allowing nulls.

If nulls are not allowed for the elements, then we need either a) a new type, 
or b) some way to say elements may not be null. As much as I like b, I am 
leaning towards a new type for this use case.

So, to flesh out the type requirements I have seen so far:

1) represents a fixed-size array of a single element type
* on write path we will need to validate this
2) element may not be null
* on write path we will need to validate this
3) “frozen” (is this really a requirement for the type or is this just simpler 
for the ANN work?  I feel that this shouldn’t be a requirement)
4) works for all types (my requirement; original proposal is float only, but 
could logically expand to primitive types)

Anything else?
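
As a rough illustration of requirements 1 and 2, the write-path check could 
look something like the sketch below. It is written in Python for brevity 
rather than in Cassandra's own Java, and the function name 
validate_fixed_array and its parameters are hypothetical, not part of any 
proposal or patch.

```python
from typing import Any, Sequence


def validate_fixed_array(value: Sequence[Any], size: int, element_type: type) -> None:
    """Reject a write unless `value` holds exactly `size` non-null elements.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    if value is None:
        raise ValueError("value must not be null")
    # Requirement 1: the array has a fixed, declared size.
    if len(value) != size:
        raise ValueError(f"expected {size} elements, got {len(value)}")
    for i, element in enumerate(value):
        # Requirement 2: no element may be null.
        if element is None:
            raise ValueError(f"element {i} is null")
        # Requirement 4: any element type is allowed, but all must match it.
        if not isinstance(element, element_type):
            raise ValueError(f"element {i} is not a {element_type.__name__}")
```

Nothing here depends on the value being frozen, which is part of why 
requirement 3 looks separable from the type itself.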

> The key thing about a vector is that unlike lists or tuples you really don't 
> care about individual elements, you care about doing vector and matrix 
> multiplications with the thing as a unit. 

That may be true for this use case, but “should” this be true for the type 
itself?  I feel like no. If a user wants the Nth element of a vector, why would 
we block them?  I am not saying the first patch, or even 5.0, adds support for 
index access; I am just pushing back to say that the type should not block 
this.

> (Maybe this is making the case for VECTOR FLOAT[N] rather than FLOAT 
> VECTOR[N].)

Now that nulls are not allowed, I have mixed feelings about FLOAT[N]. I prefer 
this syntax, but that limitation may not be desired for all use cases. We could 
always add LIST<TYPE, N> and ARRAY<TYPE, N> later to address that case.

In terms of syntax I have seen, here is my ordered preference:

1) TYPE[size] - have mixed feelings due to non-null, but still prefer it
2) QUALIFIER TYPE[size] - QUALIFIER is just a term we use to denote these 
semantics. It could even be NON NULL TYPE[size]
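
To make the two candidate syntaxes concrete, here is a minimal sketch of 
parsing them. The grammar, the regular expression, and the names ArrayDecl and 
parse_decl are assumptions made for illustration only; this is not CQL's 
parser, and neither syntax has been agreed on.

```python
import re
from typing import NamedTuple, Optional

# Optional qualifier words, then an element type name, then [size].
_DECL = re.compile(
    r"(?:(?P<qualifier>[A-Za-z]+(?:\s+[A-Za-z]+)*)\s+)?"
    r"(?P<element>[A-Za-z]+)\s*\[\s*(?P<size>\d+)\s*\]"
)


class ArrayDecl(NamedTuple):
    qualifier: Optional[str]  # e.g. "NON NULL" or "VECTOR"; None for plain TYPE[size]
    element: str              # element type name, upper-cased
    size: int                 # fixed number of elements


def parse_decl(text: str) -> ArrayDecl:
    """Parse 'TYPE[size]' or 'QUALIFIER TYPE[size]' (hypothetical grammar)."""
    match = _DECL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a fixed-size array declaration: {text!r}")
    size = int(match["size"])
    if size < 1:
        raise ValueError("size must be at least 1")
    qualifier = match["qualifier"]
    if qualifier is not None:
        # Normalise case and internal whitespace in the qualifier.
        qualifier = " ".join(qualifier.upper().split())
    return ArrayDecl(qualifier, match["element"].upper(), size)
```

With this sketch, parse_decl("FLOAT[3]") yields no qualifier, while 
parse_decl("NON NULL FLOAT[3]") yields the qualifier "NON NULL"; both carry the 
same element type and size.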

> On Apr 27, 2023, at 9:00 AM, Benedict <bened...@apache.org> wrote:
> 
> That’s a bounded ring buffer, not a fixed length array.
> 
> This definitely isn’t a tuple because the types are all the same, which is 
> pretty crucial for matrix operations. Matrix libraries generally work on 
> arrays of known dimensionality, or sparse representations.
> 
> Whether we draw any semantic link between the frozen list and whatever we do 
> here, it is fundamentally a frozen list with a restriction on its size. What 
> we’re defining here are “statically” sized arrays, whereas a frozen list is 
> essentially a dynamically sized array.
> 
> I do not think vector is a good name because vector is used in some other 
> popular languages to mean a (dynamic) list, which is confusing when we also 
> have a list concept.
> 
> I’m fine with just using the FLOAT[N] syntax, and drawing no direct link with 
> list. Though it is a bit strange that this particular type declaration looks 
> so different to other collection types.
> 
>> On 27 Apr 2023, at 16:48, Jeff Jirsa <jji...@gmail.com> wrote:
>> 
>> On Thu, Apr 27, 2023 at 7:39 AM Jonathan Ellis <jbel...@gmail.com> wrote:
>>> It's been a while, so I may be missing something, but do we already have 
>>> fixed-size lists?  If not, I don't see why we'd try to make this fit into a 
>>> List-shaped problem.
>> 
>> We do not. The proposal got closed as wont-fix  
>> https://issues.apache.org/jira/browse/CASSANDRA-9110
>> 
>>
