It would be strange for this declaration to look different from other collection types. We may want to reconsider using the collection syntax. I also like the idea of the vector dimensions being declared with the VECTOR keyword. An alternative syntax option to explore is:
VECTOR[size]<TYPE>

On Fri, 28 Apr 2023 at 10:49, Josh McKenzie <jmcken...@apache.org> wrote:

> From a machine learning perspective, vectors are a well-known concept that
> are effectively immutable fixed-length n-dimensional values that are then
> later used either as part of a model or in conjunction with a model after
> the fact.
>
> While we could have this be non-frozen and not call it a vector, I'd be
> inclined to still make the argument for a layer of syntactic sugar on top
> that met ML users where they were with concepts they understood rather than
> forcing them through the cognitive lift of figuring out the Cassandra
> specific contortions to replicate something that's ubiquitous in their
> space. We did the same "Cassandra-first" approach with our JSON support and
> that didn't do us any favors in terms of adoption and usage as far as I
> know.
>
> So is the goal here to provide something specific and idiomatic for the ML
> community or is the goal to make a primitive that's C*-centric that then
> another layer can write to? I personally argue for the former; I don't see
> this specific data type going away any time soon.
>
> On Thu, Apr 27, 2023, at 12:39 PM, David Capwell wrote:
>
> but as you point out it has the problem of allowing nulls.
>
> If nulls are not allowed for the elements, then either we need a) a new
> type, or b) add some way to say elements may not be null…. As much as I do
> like b, I am leaning towards new type for this use case.
>
> So, to flesh out the type requirements I have seen so far
>
> 1) represents a fixed size array of element type
>   * on write path we will need to validate this
> 2) element may not be null
>   * on write path we will need to validate this
> 3) “frozen” (is this really a requirement for the type or is this
> just simpler for the ANN work? I feel that this shouldn’t be a requirement)
> 4) works for all types (my requirement; original proposal is float only,
> but could logically expand to primitive types)
>
> Anything else?
>
> The key thing about a vector is that unlike lists or tuples you really
> don't care about individual elements, you care about doing vector and
> matrix multiplications with the thing as a unit.
>
> That maybe true for this use case, but “should” this be true for the type
> itself? I feel like no… if a user wants the Nth element of a vector why
> would we block them? I am not saying the first patch, or even 5.0 adds
> support for index access, I am just trying to push back saying that the
> type should not block this.
>
> (Maybe this is making the case for VECTOR FLOAT[N] rather than FLOAT
> VECTOR[N].)
>
> Now that nulls are not allowed, I have mixed feelings about FLOAT[N], I
> prefer this syntax but that limitation may not be desired for all use
> cases… we could always add LIST<TYPE, N> and ARRAY<TYPE, N> later
> to address that case.
>
> In terms of syntax I have seen, here is my ordered preference:
>
> 1) TYPE[size] - have mixed feelings due to non-null, but still prefer it
> 2) QUALIFIER TYPE[size] - QUALIFIER is just a Term we use to denote this
> semantic…. Could even be NON NULL TYPE[size]
>
> On Apr 27, 2023, at 9:00 AM, Benedict <bened...@apache.org> wrote:
>
> That’s a bounded ring buffer, not a fixed length array.
>
> This definitely isn’t a tuple because the types are all the same, which is
> pretty crucial for matrix operations. Matrix libraries generally work on
> arrays of known dimensionality, or sparse representations.
>
> Whether we draw any semantic link between the frozen list and whatever we
> do here, it is fundamentally a frozen list with a restriction on its size.
> What we’re defining here are “statically” sized arrays, whereas a frozen
> list is essentially a dynamically sized array.
>
> I do not think vector is a good name because vector is used in some other
> popular languages to mean a (dynamic) list, which is confusing when we also
> have a list concept.
>
> I’m fine with just using the FLOAT[N] syntax, and drawing no direct link
> with list. Though it is a bit strange that this particular type declaration
> looks so different to other collection types.
>
> On 27 Apr 2023, at 16:48, Jeff Jirsa <jji...@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 7:39 AM Jonathan Ellis <jbel...@gmail.com> wrote:
>
> It's been a while, so I may be missing something, but do we already have
> fixed-size lists? If not, I don't see why we'd try to make this fit into a
> List-shaped problem.
>
> We do not. The proposal got closed as wont-fix
> https://issues.apache.org/jira/browse/CASSANDRA-9110
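
The type requirements David lists above (a fixed number of elements, no null elements, both checked on the write path) can be sketched roughly as follows. This is a minimal Python illustration, not Cassandra code: `make_vector_validator` and `InvalidVectorError` are hypothetical names invented for the example, and returning a tuple only loosely stands in for the "frozen" behavior that is still under discussion.

```python
from typing import Callable, Sequence


class InvalidVectorError(ValueError):
    """Raised when a value does not satisfy the declared vector type (hypothetical)."""


def make_vector_validator(element_type: type, size: int) -> Callable[[Sequence], tuple]:
    """Build a write-path check for a hypothetical TYPE[size] declaration."""
    if size <= 0:
        raise ValueError("size must be positive")

    def validate(value: Sequence) -> tuple:
        # Requirement 1: exactly the declared number of elements.
        if len(value) != size:
            raise InvalidVectorError(f"expected {size} elements, got {len(value)}")
        for i, element in enumerate(value):
            # Requirement 2: no element may be null.
            if element is None:
                raise InvalidVectorError(f"element {i} is null")
            # Requirement 4: any element type is allowed, but it must be uniform.
            if not isinstance(element, element_type):
                raise InvalidVectorError(
                    f"element {i} is {type(element).__name__}, "
                    f"expected {element_type.__name__}"
                )
        # Assumption: an immutable tuple stands in for a "frozen" value.
        return tuple(value)

    return validate


# Usage: a validator for something like FLOAT[3].
validate_float3 = make_vector_validator(float, 3)
stored = validate_float3([0.1, 0.2, 0.3])  # returns (0.1, 0.2, 0.3)
```

Index access such as `stored[1]` still works on the result, which matches the point that the type itself need not block reading the Nth element even if the first use case never does so.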