Wail Y. Alkowaileet created ASTERIXDB-3212:
----------------------------------------------

             Summary: Account for offset sizes for variable-length PKs in columnar datasets
                 Key: ASTERIXDB-3212
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3212
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: STO - Storage
    Affects Versions: 0.9.9
            Reporter: Wail Y. Alkowaileet
            Assignee: Wail Y. Alkowaileet
             Fix For: 0.9.9


After we moved from encoded PKs to plain PKs, fixed-length and variable-length
values are stored differently. Fixed-length PKs (e.g., BIGINT) are stored one
after the other. Variable-length PKs (e.g., STRING) are stored as two
sub-vectors: one holds the offset of each PK, and the other holds the PK values
themselves.

When bulk-loading a ColumnBTree, we rely on
*AbstractColumnTupleWriter#bytesRequired(ITupleReference)* to estimate the
number of bytes needed to write the tuple's PKs. For variable-length values,
this estimate does not include the size of the offsets sub-vector, so it falls
short of the space the PKs actually occupy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
