Kadir Ozdemir created PHOENIX-7710:
--------------------------------------

             Summary: Extending single cell storage format for mutable tables
                 Key: PHOENIX-7710
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7710
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Kadir Ozdemir


Phoenix uses two storage formats for immutable tables: one cell per column
(ONE_CELL_PER_COLUMN) and one cell per column family
(SINGLE_CELL_ARRAY_WITH_OFFSETS). Packing all columns of a row within a
column family into a single cell reduces storage, network, and memory usage,
which improves performance for many use cases.

The single cell storage format is currently supported only for immutable
tables. Extending it to mutable tables would require Phoenix to read existing
rows and merge them with new, potentially partial, mutations to produce
full-row mutations. This might be acceptable for tables with covered indexes,
since their rows are already read to generate index mutations, but it would be
costly for other tables.

Phoenix has moved more functionality to the server side by leveraging the
HBase coprocessor framework, tailoring HBase to Phoenix use cases. A recent
example is a customization of HBase compaction, which was required to eliminate
data integrity issues when TTL is configured. HBase TTL operates at the cell
level, so a row can expire partially, and partial row expiration may result in
data loss in Phoenix. To fix this, Phoenix introduced a compaction scanner that
preserves row integrity during TTL processing (see PHOENIX-6888).
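To make the partial-expiration problem concrete, here is a minimal Python
sketch contrasting cell-level TTL (how HBase expires cells) with a row-level
rule. The `Cell` type and both functions are hypothetical, and the row-level
rule used here (a row survives as long as its newest cell is within the TTL) is
an assumption for illustration, not necessarily the exact semantics implemented
in PHOENIX-6888.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    ts: int  # write timestamp in milliseconds


def cell_level_ttl(cells: list[Cell], now: int, ttl: int) -> list[Cell]:
    """HBase-style TTL: every cell expires independently, based on its own timestamp."""
    return [c for c in cells if now - c.ts <= ttl]


def row_level_ttl(cells: list[Cell], now: int, ttl: int) -> list[Cell]:
    """Row-level TTL: a row is kept or dropped as a whole.

    Assumed rule (illustrative only): the newest cell of a row decides whether
    the whole row is still live.
    """
    newest: dict[str, int] = {}
    for c in cells:
        newest[c.row] = max(newest.get(c.row, c.ts), c.ts)
    return [c for c in cells if now - newest[c.row] <= ttl]


if __name__ == "__main__":
    # Column A was written long ago; column B was updated recently.
    row = [Cell("r1", "A", ts=0), Cell("r1", "B", ts=90)]
    print([c.column for c in cell_level_ttl(row, now=100, ttl=50)])  # ['B']
    print([c.column for c in row_level_ttl(row, now=100, ttl=50)])   # ['A', 'B']
```

With cell-level TTL, row r1 loses column A but keeps column B, which is exactly
the partial row that Phoenix must avoid; the row-level rule keeps or drops r1
as a unit.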

This Phoenix-level compaction can be leveraged to support the single cell
format for mutable tables without reading rows during mutations. To achieve
this, each mutation can be written as a separate cell per column family, and
each cell within a column family is given its own dynamically generated column
qualifier. During flushes and compactions, these cells can be merged into a
single cell. During scans, HBase region scanners will return all of these
cells (each with its own column qualifier), and Phoenix custom filters can
merge them into one cell before applying filtering. Together, these changes
allow Phoenix to pack all columns of a column family for a given row into a
single cell.

The single cell storage format for mutable tables need not mirror the
implementation used for immutable tables exactly. For example, in the new
format for mutable tables the empty column can be packed together with the
other columns, which is not the case in the immutable format.
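As a rough illustration of why packing reduces storage, and why also packing
the empty column saves one more cell per row, the sketch below estimates
per-row bytes from the HBase KeyValue layout (key length, value length, row
length, row, family length, family, qualifier, timestamp, key type).
Assumptions: no tags or block encoding, the same qualifier length for every
cell, a 1-byte empty-column value, and packed values modeled as a plain
concatenation without the offset and header bytes a real packed encoding
stores. `MUTABLE_PACKED` is a hypothetical name for the proposed format after
its per-mutation cells have been merged.

```python
# Fixed bytes per HBase KeyValue without tags:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1)
KV_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # = 20


def kv_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Serialized size of one KeyValue; tags and block encoding are ignored."""
    return KV_FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def row_bytes(layout: str, row_len: int, family_len: int, qualifier_len: int,
              value_lens: list[int], empty_len: int = 1) -> int:
    """Approximate bytes for one row in one column family.

    value_lens holds the lengths of the non-null column values. Packed layouts
    are modeled as a plain concatenation (offsets and headers are ignored).
    """
    def kv(value_len: int) -> int:
        return kv_size(row_len, family_len, qualifier_len, value_len)

    if layout == "ONE_CELL_PER_COLUMN":
        # One cell per non-null column plus a separate empty-column cell.
        return sum(kv(v) for v in value_lens) + kv(empty_len)
    if layout == "SINGLE_CELL_ARRAY_WITH_OFFSETS":
        # One packed cell plus a separate empty-column cell.
        return kv(sum(value_lens)) + kv(empty_len)
    if layout == "MUTABLE_PACKED":  # hypothetical name for the proposed format
        # Empty column packed into the same cell as the other columns.
        return kv(sum(value_lens) + empty_len)
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for layout in ("ONE_CELL_PER_COLUMN", "SINGLE_CELL_ARRAY_WITH_OFFSETS", "MUTABLE_PACKED"):
        print(layout, row_bytes(layout, row_len=20, family_len=1,
                                qualifier_len=2, value_lens=[8] * 10))
```

For a 20-byte row key, a 1-byte family, 2-byte qualifiers, and ten 8-byte
values, this prints 554, 167, and 124 bytes respectively. The numbers are only
indicative: the ignored offset bytes and HBase block encoding shift them, and
the proposed format reaches its single-cell size only after flush or
compaction has merged its per-mutation cells.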



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
