[ https://issues.apache.org/jira/browse/CASSANDRA-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728506#comment-17728506 ]
David Capwell commented on CASSANDRA-18504:
-------------------------------------------

bq. But it did seem worthwhile to highlight the difference and make sure that this difference in serialization formats represents an explicit choice.

Yes, this was something I explicitly did. My argument was that the common case is vectors of numbers, so by optimizing for this case we save a lot of space for these vectors (vector<byte, 1024> is 1,024 bytes with this format, but would have been 5,120 if we included the size). This gets even worse if you move from a vector to a matrix (vector<vector<byte, 1024>, 1024> would be 1,048,576 bytes without the header and 20,971,520 with the header); notice that in this case the vector is fixed length if and only if the element type is fixed length!

One added change I have been thinking about is "fixing" ShortType to be fixed length in this code path without changing existing code paths... Right now ShortType is serialized as an int header + a 2-byte short in the vector type, but also in the normal SSTable format! It's actually cheaper for users to store a short as an int, as that is stored as 4 bytes only... Given this is a new type, I could add and use a new method "valueLengthIfFixedNoForRealThisTime" and only fix ShortType to return 2, whereas valueLengthIfFixed currently returns -1 (aka not fixed length)...
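To make the size arithmetic above concrete, here is a minimal sketch in Java. It is not Cassandra code: the class VectorSizeSketch, its methods, and the constant names are hypothetical, and it assumes the per-element size header is a 4-byte int and that -1 means "not fixed length", as described in the comment.

```java
// Illustrative model only; none of these names are Cassandra's API.
final class VectorSizeSketch {
    // Assumption: -1 marks a variable-length type, as valueLengthIfFixed does in the comment.
    static final int NOT_FIXED = -1;
    // Assumption: each variable-length element is prefixed by a 4-byte int size header.
    static final int SIZE_HEADER_BYTES = 4;

    /**
     * Bytes needed for a vector of `dimension` elements.
     * Fixed-length elements are written back to back; otherwise every
     * element carries a size header followed by its value bytes.
     */
    static long serializedSize(int dimension, int valueLengthIfFixed, int actualValueLength) {
        if (valueLengthIfFixed != NOT_FIXED) {
            return (long) dimension * valueLengthIfFixed;
        }
        return (long) dimension * (SIZE_HEADER_BYTES + actualValueLength);
    }

    /** A vector is fixed length if and only if its element type is fixed length. */
    static int vectorValueLengthIfFixed(int dimension, int elementValueLengthIfFixed) {
        return elementValueLengthIfFixed == NOT_FIXED
                ? NOT_FIXED
                : dimension * elementValueLengthIfFixed;
    }

    public static void main(String[] args) {
        // vector<byte, 1024>: fixed-length elements vs. a size header per element
        System.out.println(serializedSize(1024, 1, 1));         // 1024
        System.out.println(serializedSize(1024, NOT_FIXED, 1)); // 5120

        // vector<vector<byte, 1024>, 1024> with no headers at either level
        int row = vectorValueLengthIfFixed(1024, 1);
        System.out.println(serializedSize(1024, row, row));     // 1048576

        // vector<short, 1024>: ShortType reported as not fixed vs. fixed at 2 bytes
        System.out.println(serializedSize(1024, NOT_FIXED, 2)); // 6144
        System.out.println(serializedSize(1024, 2, 2));         // 2048
    }
}
```

The last two lines model the ShortType proposal: while the type reports -1, each short costs 6 bytes (4-byte header + 2-byte value), which is more than the 4 bytes of a fixed-length int; reporting 2 brings it down to 2 bytes per element.
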
> Added support for type VECTOR<type, dimension>
> ----------------------------------------------
>
>                 Key: CASSANDRA-18504
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18504
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Schema, CQL/Syntax
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 7h
>  Remaining Estimate: 0h
>
> Based off several mailing list threads (see "[POLL] Vector type for ML",
> "[DISCUSS] New data type for vector search", and "Adding vector search to SAI
> with heirarchical navigable small world graph index"), it's desirable to add a
> new type "VECTOR" that has the following properties:
> 1) fixed length array
> 2) elements may not be null
> 3) flattened array (aka multi-cell = false)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)