Kadir Ozdemir created PHOENIX-7710:
--------------------------------------
Summary: Extending single cell storage format for mutable tables
Key: PHOENIX-7710
URL: https://issues.apache.org/jira/browse/PHOENIX-7710
Project: Phoenix
Issue Type: Improvement
Reporter: Kadir Ozdemir

Phoenix uses two storage formats for immutable tables: one cell per column
(ONE_CELL_PER_COLUMN) and a single cell per column family
(SINGLE_CELL_ARRAY_WITH_OFFSETS). Packing all columns of a row (within a
specific column family) into a single cell reduces storage, network, and memory
usage, generally improving performance for many use cases.

The single cell storage format is currently supported only for immutable
tables. A direct extension to mutable tables would require Phoenix to read
existing rows and merge them with new, potentially partial, mutations to
generate full row mutations. While this might be acceptable for tables with
covered indexes (as rows are already read to generate index mutations), it
would be costly for other tables.

Phoenix has added more server-side functionality by leveraging the HBase
coprocessor architecture to better optimize HBase for Phoenix use cases. A
recent customization of this kind was made for HBase compaction. It was
required to eliminate data integrity issues when TTL is configured. HBase TTL
operates at
the cell level and leads to partial row expiration. Partial row expiration may
result in data loss in Phoenix. To fix this, Phoenix introduced a compaction
scanner that preserves row integrity during TTL processing (see PHOENIX-6888).
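
To make the partial row expiration problem concrete, here is a minimal sketch
that contrasts cell-level expiry with row-level expiry. Everything in it is
hypothetical (the Cell class, the method names, and the rule that a row stays
whole while its newest cell is within the TTL). It is not the PHOENIX-6888
code, only an illustration of the difference in behavior.

```java
import java.util.ArrayList;
import java.util.List;

final class TtlExpirySketch {
    /** Hypothetical stand-in for a cell: a column name and a write timestamp. */
    static final class Cell {
        final String column;
        final long timestamp;

        Cell(String column, long timestamp) {
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    /** Cell-level TTL: every cell is judged on its own timestamp. */
    static List<Cell> expirePerCell(List<Cell> row, long now, long ttl) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : row) {
            if (now - c.timestamp <= ttl) {
                kept.add(c);
            }
        }
        return kept;
    }

    /**
     * Row-level TTL (assumed rule): the whole row survives while its newest
     * cell is within the TTL, and is dropped as a unit otherwise.
     */
    static List<Cell> expirePerRow(List<Cell> row, long now, long ttl) {
        if (row.isEmpty()) {
            return new ArrayList<>();
        }
        long newest = row.get(0).timestamp;
        for (Cell c : row) {
            newest = Math.max(newest, c.timestamp);
        }
        return (now - newest <= ttl) ? new ArrayList<>(row) : new ArrayList<>();
    }

    public static void main(String[] args) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("NAME", 0L));   // written once, never updated
        row.add(new Cell("CITY", 90L));  // updated recently

        // With now = 100 and ttl = 50, cell-level expiry drops NAME only,
        // leaving a partial row; row-level expiry keeps both cells.
        System.out.println(expirePerCell(row, 100L, 50L).size()); // prints 1
        System.out.println(expirePerRow(row, 100L, 50L).size());  // prints 2
    }
}
```
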

This Phoenix-level compaction can be leveraged to support the single cell
format for mutable tables without requiring row reads during mutations. To
achieve this, each mutation can be represented as a separate cell (per column
family), with each cell (within a column family) having a different dynamically
generated column qualifier. During flushes and compaction, these cells can be
merged into a single cell. During scans, HBase region scanners will return all
these cells (each with its own column qualifier), and Phoenix custom filters
can merge them into one cell before applying filtering. These changes allow
Phoenix to pack all columns of a column family for a given row into a single
cell.
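
The merge step described above can be sketched as follows. This is only an
illustration under stated assumptions: MutationCell, its fields, and merge()
are hypothetical names, a cell is modeled as a timestamp plus the columns that
one mutation wrote, and the newest write to a column wins. Deletes and the
actual encoding of the packed cell are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartialCellMerge {
    /**
     * Hypothetical per-mutation cell: the timestamp of the mutation and the
     * (possibly partial) set of column values it wrote for one column family.
     */
    static final class MutationCell {
        final long timestamp;
        final Map<String, String> columns;

        MutationCell(long timestamp, Map<String, String> columns) {
            this.timestamp = timestamp;
            this.columns = columns;
        }
    }

    /**
     * Folds all per-mutation cells of one row and column family into a single
     * row image. Cells are applied oldest first, so a later write to the same
     * column overwrites an earlier one.
     */
    static Map<String, String> merge(List<MutationCell> cells) {
        List<MutationCell> sorted = new ArrayList<>(cells);
        sorted.sort(Comparator.comparingLong((MutationCell c) -> c.timestamp));
        Map<String, String> row = new TreeMap<>();
        for (MutationCell cell : sorted) {
            row.putAll(cell.columns);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> first = new HashMap<>();
        first.put("A", "1");
        first.put("B", "2");
        Map<String, String> second = new HashMap<>();
        second.put("B", "20");
        Map<String, String> third = new HashMap<>();
        third.put("C", "3");

        List<MutationCell> cells = new ArrayList<>();
        cells.add(new MutationCell(10L, first));   // full upsert
        cells.add(new MutationCell(20L, second));  // partial update of B
        cells.add(new MutationCell(15L, third));   // partial update of C

        System.out.println(merge(cells)); // prints {A=1, B=20, C=3}
    }
}
```

In this sketch the same merge() could serve flush, compaction, and scan-time
filtering, since all three only need the cells of one row and column family as
input. Whether the real implementation shares one code path is not stated in
this issue.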

The single cell storage format for mutable tables does not have to follow
exactly the same implementation as the one for immutable tables. For example, the
empty column can also be packed together with other columns in the new format
for mutable tables, which is not the case for the immutable format.