+1

On Thursday, April 27, 2023 at 07:36:19 PM PDT, Caleb Rackliffe 
<calebrackli...@gmail.com> wrote:
 
 I don’t have a lot to add here, other than to say I’m broadly in agreement w/ 
David on syntax preference, element selectability, and making this a new type 
that roughly corresponds to a primitive (non-null-allowing) array.


On Apr 27, 2023, at 9:18 PM, Anthony Grasso <anth...@apache.org> wrote:



It would be strange for this declaration to look different from other 
collection types. We may want to reconsider using the collection syntax. I also 
like the idea of the vector dimensions being declared with the VECTOR keyword. 
An alternative syntax option to explore is:
VECTOR[size]<TYPE>

On Fri, 28 Apr 2023 at 10:49, Josh McKenzie <jmcken...@apache.org> wrote:

>From a machine learning perspective, vectors are a well-known concept: 
>effectively immutable, fixed-length, n-dimensional values that are later 
>used either as part of a model or in conjunction with a model after the fact.

While we could have this be non-frozen and not call it a vector, I'd be 
inclined to still make the argument for a layer of syntactic sugar on top that 
met ML users where they were with concepts they understood rather than forcing 
them through the cognitive lift of figuring out the Cassandra-specific 
contortions to replicate something that's ubiquitous in their space. We did the 
same "Cassandra-first" approach with our JSON support and that didn't do us any 
favors in terms of adoption and usage as far as I know.

So is the goal here to provide something specific and idiomatic for the ML 
community or is the goal to make a primitive that's C*-centric that then 
another layer can write to? I personally argue for the former; I don't see this 
specific data type going away any time soon.

On Thu, Apr 27, 2023, at 12:39 PM, David Capwell wrote:


but as you point out it has the problem of allowing nulls.


If nulls are not allowed for the elements, then we need either a) a new type 
or b) some way to say that elements may not be null. As much as I like b, I 
am leaning towards a new type for this use case.

So, to flesh out the type requirements I have seen so far:

1) represents a fixed size array of element type
* on write path we will need to validate this
2) element may not be null
* on write path we will need to validate this
3) “frozen” (is this really a requirement for the type, or is it just simpler 
for the ANN work? I feel that this shouldn’t be a requirement)
4) works for all types (my requirement; the original proposal is float only, 
but could logically expand to primitive types)

Anything else?
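
To make requirements 1 and 2 concrete, here is a minimal Python sketch of the kind of write-path check they imply. It is only an illustration, not Cassandra code: the function name `validate_vector` and its parameters are hypothetical, and returning a tuple is a convenience of the sketch rather than a statement that the type must be frozen.

```python
from typing import Sequence


def validate_vector(values: Sequence[object], dimension: int,
                    element_type: type = float) -> tuple:
    """Hypothetical write-path validation for a fixed-size, non-null vector."""
    if values is None:
        raise ValueError("vector value may not be null")
    # Requirement 1: the value must have exactly the declared size.
    if len(values) != dimension:
        raise ValueError(
            f"expected {dimension} elements, got {len(values)}")
    for index, value in enumerate(values):
        # Requirement 2: no element may be null.
        if value is None:
            raise ValueError(f"element {index} may not be null")
        # Requirement 4: every element shares the declared element type.
        if not isinstance(value, element_type):
            raise TypeError(
                f"element {index} is not of type {element_type.__name__}")
    return tuple(values)
```

A call such as `validate_vector([1.0, None, 3.0], 3)` would be rejected at write time, as would a value with two or four elements.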


The key thing about a vector is that unlike lists or tuples you really don't 
care about individual elements, you care about doing vector and matrix 
multiplications with the thing as a unit. 


That may be true for this use case, but “should” this be true for the type 
itself? I feel like no. If a user wants the Nth element of a vector, why would 
we block them? I am not saying the first patch, or even 5.0, adds support for 
index access; I am just pushing back to say that the type should not 
block this.
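
As a rough illustration of the two views, the Python sketch below models a fixed-size vector that is used as a unit (a dot product) while still allowing access to the Nth element. The class `FixedVector` and its methods are hypothetical names for this sketch and say nothing about how Cassandra would implement either operation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FixedVector:
    """Hypothetical fixed-size vector of floats."""
    elements: Tuple[float, ...]

    def dot(self, other: "FixedVector") -> float:
        # Treat the vector as a unit: both operands must share a dimension.
        if len(self.elements) != len(other.elements):
            raise ValueError("dimension mismatch")
        return sum(a * b for a, b in zip(self.elements, other.elements))

    def __getitem__(self, index: int) -> float:
        # Element access is still possible; the type does not block it.
        return self.elements[index]
```

With `a = FixedVector((1.0, 2.0, 3.0))` and `b = FixedVector((4.0, 5.0, 6.0))`, both `a.dot(b)` and `a[1]` are valid, which is the point being argued: supporting the unit-level operation does not require forbidding index access.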


(Maybe this is making the case for VECTOR FLOAT[N] rather than FLOAT VECTOR[N].)


Now that nulls are not allowed, I have mixed feelings about FLOAT[N]. I prefer 
this syntax, but that limitation may not be desired for all use cases. We could 
always add LIST<TYPE, N> and ARRAY<TYPE, N> later to address that case.

In terms of syntax I have seen, here is my ordered preference:

1) TYPE[size] - I have mixed feelings due to non-null, but still prefer it
2) QUALIFIER TYPE[size] - QUALIFIER is just a term we use to denote this 
semantic. It could even be NON NULL TYPE[size]
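
To compare the two options side by side, here is a small Python sketch that parses both TYPE[size] and QUALIFIER TYPE[size] declarations. It is a toy regular-expression parser written for this discussion, not the CQL grammar; the function name `parse_declaration` and the returned tuple layout are assumptions of the sketch.

```python
import re
from typing import Optional, Tuple

# Optional qualifier words, then an element type, then a size in brackets.
_DECLARATION = re.compile(
    r"^\s*(?:(?P<qualifier>[A-Za-z]+(?:\s+[A-Za-z]+)*)\s+)?"
    r"(?P<type>[A-Za-z]+)\s*\[\s*(?P<size>\d+)\s*\]\s*$"
)


def parse_declaration(text: str) -> Tuple[Optional[str], str, int]:
    """Return (qualifier, element type, size) for a hypothetical declaration."""
    match = _DECLARATION.match(text)
    if match is None:
        raise ValueError(f"not a fixed-size declaration: {text!r}")
    size = int(match.group("size"))
    if size <= 0:
        raise ValueError("size must be positive")
    qualifier = match.group("qualifier")
    if qualifier is not None:
        # Collapse internal whitespace so "NON  NULL" and "NON NULL" compare equal.
        qualifier = " ".join(qualifier.split()).upper()
    return qualifier, match.group("type").upper(), size
```

Under this sketch, `FLOAT[3]` parses with no qualifier, while `NON NULL FLOAT[3]` and `VECTOR FLOAT[3]` both parse with the leading words as the qualifier, so the second option is a strict extension of the first.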


On Apr 27, 2023, at 9:00 AM, Benedict <bened...@apache.org> wrote:


That’s a bounded ring buffer, not a fixed length array.

This definitely isn’t a tuple because the types are all the same, which is 
pretty crucial for matrix operations. Matrix libraries generally work on arrays 
of known dimensionality, or sparse representations.

Whether we draw any semantic link between the frozen list and whatever we do 
here, it is fundamentally a frozen list with a restriction on its size. What 
we’re defining here are “statically” sized arrays, whereas a frozen list is 
essentially a dynamically sized array.

I do not think vector is a good name because vector is used in some other 
popular languages to mean a (dynamic) list, which is confusing when we also 
have a list concept.

I’m fine with just using the FLOAT[N] syntax, and drawing no direct link with 
list. Though it is a bit strange that this particular type declaration looks so 
different to other collection types.


On 27 Apr 2023, at 16:48, Jeff Jirsa <jji...@gmail.com> wrote:





On Thu, Apr 27, 2023 at 7:39 AM Jonathan Ellis <jbel...@gmail.com> wrote:

It's been a while, so I may be missing something, but do we already have 
fixed-size lists?  If not, I don't see why we'd try to make this fit into a 
List-shaped problem.


We do not. The proposal got closed as won't-fix: 
https://issues.apache.org/jira/browse/CASSANDRA-9110








  
