Wail Y. Alkowaileet created ASTERIXDB-3212:
----------------------------------------------
Summary: Account for offset sizes for variable-length PKs in
columnar datasets
Key: ASTERIXDB-3212
URL: https://issues.apache.org/jira/browse/ASTERIXDB-3212
Project: Apache AsterixDB
Issue Type: Bug
Components: STO - Storage
Affects Versions: 0.9.9
Reporter: Wail Y. Alkowaileet
Assignee: Wail Y. Alkowaileet
Fix For: 0.9.9
After the move from encoded PKs to plain PKs, fixed-length and variable-length
values are stored differently. Fixed-length PKs (e.g., BIGINT) are stored
contiguously, one after the other. Variable-length PKs (e.g., STRING) are
stored as two sub-vectors: one holds the offset of each PK, and the other holds
the PK values themselves.
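
The two layouts can be illustrated with a minimal sketch. This is not the
AsterixDB on-disk format: the class and method names are hypothetical, and the
4-byte offset width, the use of start offsets, and the placement of the offsets
sub-vector before the values sub-vector are all assumptions made for
illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Illustrative only: a hypothetical layout, not AsterixDB's actual format.
final class PkLayoutSketch {
    // Assumption: each offset entry is a 4-byte int.
    static final int OFFSET_SIZE = Integer.BYTES;

    // Fixed-length PKs: the values are written back to back.
    static byte[] encodeFixed(long[] keys) {
        ByteBuffer buf = ByteBuffer.allocate(keys.length * Long.BYTES);
        for (long key : keys) {
            buf.putLong(key);
        }
        return buf.array();
    }

    // Variable-length PKs: an offsets sub-vector followed by a values sub-vector.
    static byte[] encodeVariable(List<String> keys) {
        byte[][] encoded = new byte[keys.size()][];
        int valueBytes = 0;
        for (int i = 0; i < keys.size(); i++) {
            encoded[i] = keys.get(i).getBytes(StandardCharsets.UTF_8);
            valueBytes += encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(keys.size() * OFFSET_SIZE + valueBytes);
        int offset = 0;
        for (byte[] value : encoded) {
            buf.putInt(offset); // start of this value within the values sub-vector
            offset += value.length;
        }
        for (byte[] value : encoded) {
            buf.put(value);
        }
        return buf.array();
    }
}
```

Under these assumptions, every variable-length PK costs OFFSET_SIZE bytes on
top of its value bytes, while a fixed-length PK costs only its value bytes.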
When bulk-loading a ColumnBTree, we rely on
*AbstractColumnTupleWriter#bytesRequired(ITupleReference)* to estimate the
number of bytes needed to write the tuple's PKs. For variable-length PKs, this
estimate does not include the size of the offsets sub-vector, so it
underestimates the space the PKs actually occupy.
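
The gap in the estimate can be sketched as follows. This is not the code of
AbstractColumnTupleWriter: the class, the method names, and the plain-array
inputs are hypothetical stand-ins for ITupleReference, and the 4-byte offset
width is an assumption.

```java
// Illustrative only: hypothetical helpers, not AsterixDB's implementation.
final class PkSizeEstimateSketch {
    // Assumption: each offset entry is a 4-byte int.
    static final int OFFSET_SIZE = Integer.BYTES;

    // Counts only the PK value bytes, which is the behavior this issue describes.
    static int valuesOnly(int[] pkFieldLengths) {
        int total = 0;
        for (int length : pkFieldLengths) {
            total += length;
        }
        return total;
    }

    // Also counts one offset entry for each variable-length PK field.
    static int withOffsets(int[] pkFieldLengths, boolean[] isVariableLength) {
        int total = 0;
        for (int i = 0; i < pkFieldLengths.length; i++) {
            total += pkFieldLengths[i];
            if (isVariableLength[i]) {
                total += OFFSET_SIZE;
            }
        }
        return total;
    }
}
```

For a composite key made of an 8-byte BIGINT and a 5-byte STRING value, the
first estimate is 13 bytes and the second is 17 bytes under these assumptions.
The difference grows by one offset entry per variable-length PK for every tuple
written.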