[ 
https://issues.apache.org/jira/browse/CASSANDRA-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728506#comment-17728506
 ] 

David Capwell commented on CASSANDRA-18504:
-------------------------------------------

bq. But it did seem worthwhile to highlight the difference and make sure that 
this difference in serialization formats represents an explicit choice.

Yes, this was something I explicitly did.  My argument was that the common case 
is vectors of numbers, so by optimizing for this case we save a lot of space 
for these vectors (vector<byte, 1024> is 1,024 bytes with this format, but 
would have been 5,120 if we included a 4-byte size per element).  This gets 
even worse if you move from a vector to a matrix (vector<vector<byte, 1024>, 
1024> would be 1,048,576 bytes without the header and 20,971,520 with the 
header); notice that in this case a vector is fixed length if-and-only-if its 
element type is fixed length!
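
To make the arithmetic above easy to check, here is a minimal sketch in Python 
(not Cassandra code).  The helper name vector_bytes and the constant 
INT_HEADER_BYTES are made up for illustration; the only assumption taken from 
the comment is that the size header is a 4-byte int written before each element.

```python
INT_HEADER_BYTES = 4  # assumption: per-element length prefix is a 4-byte int


def vector_bytes(element_bytes: int, dimension: int, with_header: bool) -> int:
    """Serialized size of a vector whose elements are all element_bytes long.

    Hypothetical helper for illustration only; it is not part of Cassandra.
    """
    per_element = element_bytes + (INT_HEADER_BYTES if with_header else 0)
    return per_element * dimension


# vector<byte, 1024>, elements written back to back with no size header
flat = vector_bytes(1, 1024, with_header=False)

# the same vector if every element carried a 4-byte size header
prefixed = vector_bytes(1, 1024, with_header=True)

# vector<vector<byte, 1024>, 1024> with no headers: the inner vector is itself
# fixed length, so it can be treated as a 1,024-byte element
matrix = vector_bytes(flat, 1024, with_header=False)

print(flat, prefixed, matrix)  # 1024 5120 1048576
```

The nested call only works because the inner vector has a known size, which is 
the "fixed length if-and-only-if the element type is fixed length" point.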

One added change I have been thinking about is "fixing" ShortType to be fixed 
length in this code path without changing existing code paths... right now 
ShortType is serialized as an int header + 2 byte short in the vector type, but 
also in the normal SSTable format!  It's actually cheaper for users to store a 
short as an int, as that is stored as 4 bytes only... Given this is a new type, 
I could add and use a new method "valueLengthIfFixedNoForRealThisTime" and only 
fix ShortType to return 2, whereas valueLengthIfFixed currently returns -1 (aka 
not fixed length)...
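
A rough sketch of the cost difference described above, in Python rather than 
Cassandra's Java.  The function serialized_bytes and the constants are 
hypothetical; the assumptions are that -1 means "not fixed length" and that a 
non-fixed value is written as a 4-byte int header followed by its bytes.

```python
NOT_FIXED = -1        # value the comment says valueLengthIfFixed returns for ShortType
INT_HEADER_BYTES = 4  # assumption: length header is a 4-byte int


def serialized_bytes(value_bytes: int, fixed_length: int) -> int:
    """Bytes written for one value, given what the type reports as its fixed length.

    Hypothetical helper for illustration only; it is not Cassandra's API.
    """
    if fixed_length == NOT_FIXED:
        # variable-length path: length header, then the value itself
        return INT_HEADER_BYTES + value_bytes
    # fixed-length path: the value alone, no header
    return fixed_length


short_today = serialized_bytes(2, NOT_FIXED)  # short reported as not fixed length
int_today = serialized_bytes(4, 4)            # int reported as 4 bytes fixed
short_fixed = serialized_bytes(2, 2)          # short if a new method returned 2

print(short_today, int_today, short_fixed)  # 6 4 2
```

Under these assumptions a short costs 6 bytes today versus 4 for an int, and 
would drop to 2 if the type reported a fixed length of 2.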

> Added support for type VECTOR<type, dimension>
> ----------------------------------------------
>
>                 Key: CASSANDRA-18504
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18504
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Schema, CQL/Syntax
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 7h
>  Remaining Estimate: 0h
>
> Based on several mailing list threads (see "[POLL] Vector type for ML", 
> "[DISCUSS] New data type for vector search", and "Adding vector search to SAI 
> with heirarchical navigable small world graph index"), it's desirable to add a 
> new type "VECTOR" that has the following properties:
> 1) fixed-length array
> 2) elements may not be null
> 3) flattened array (aka multi-cell = false)



