[
https://issues.apache.org/jira/browse/PHOENIX-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kadir Ozdemir updated PHOENIX-7710:
-----------------------------------
Summary: Supporting single cell storage format for mutable tables (was:
Extending single cell storage format for mutable tables)
> Supporting single cell storage format for mutable tables
> --------------------------------------------------------
>
> Key: PHOENIX-7710
> URL: https://issues.apache.org/jira/browse/PHOENIX-7710
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Priority: Major
>
> Phoenix uses two storage formats for immutable tables: one cell per column
> (ONE_CELL_PER_COLUMN) and one cell per column family
> (SINGLE_CELL_ARRAY_WITH_OFFSETS). Packing all columns of a row (within a
> specific column family) into a single cell reduces storage, network, and
> memory usage, which improves performance for many use cases.
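
To make the difference between the two formats concrete, here is a minimal Python sketch. The cell tuple, the offset trailer layout, the packed qualifier b"p", and the byte accounting are all assumptions made for illustration; they are not Phoenix's actual on-disk encoding.

```python
import struct

# Simplified cell model (an assumption, not the HBase KeyValue layout):
# (row_key, family, qualifier, timestamp, value)

def one_cell_per_column(row_key, family, columns, ts):
    """ONE_CELL_PER_COLUMN style: every column becomes its own cell."""
    return [(row_key, family, qualifier, ts, value)
            for qualifier, value in columns.items()]

def single_cell_with_offsets(row_key, family, columns, ts, packed_qualifier=b"p"):
    """SINGLE_CELL_ARRAY_WITH_OFFSETS style: one cell per column family.

    Values are concatenated, followed by a hypothetical trailer holding one
    4-byte offset per column and a 4-byte column count.
    """
    body = b""
    offsets = []
    for value in columns.values():
        offsets.append(len(body))
        body += value
    trailer = b"".join(struct.pack(">I", o) for o in offsets)
    trailer += struct.pack(">I", len(offsets))
    return [(row_key, family, packed_qualifier, ts, body + trailer)]

def stored_bytes(cells):
    """Rough size: the row key, family, qualifier and an 8-byte timestamp
    are repeated for every cell, which is where packing saves space."""
    return sum(len(r) + len(f) + len(q) + 8 + len(v) for r, f, q, _ts, v in cells)

columns = {b"c1": b"alice", b"c2": b"42", b"c3": b"NY"}
per_column = one_cell_per_column(b"row-0001", b"0", columns, ts=1)
packed = single_cell_with_offsets(b"row-0001", b"0", columns, ts=1)
print(len(per_column), stored_bytes(per_column))
print(len(packed), stored_bytes(packed))
```

In this toy model the saving comes from not repeating the row key, family, and timestamp for each column; the offset trailer costs a few bytes but is paid once per family.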
> The single cell storage format is currently supported only for immutable
> tables. Extending it to mutable tables in the same way would require Phoenix
> to read existing rows and merge them with new, potentially partial, mutations
> to generate full-row mutations. While this might be acceptable for tables
> with covered indexes (as rows are already read to generate index mutations),
> it would be costly for other tables.
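
The cost described above can be sketched as follows. This is a toy model under stated assumptions: PackedRowStore and its read counter are hypothetical names, and a dict stands in for the packed cell.

```python
class PackedRowStore:
    """Toy store where each row is kept as one packed value (a dict here)."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts reads forced by the write path

    def upsert(self, row_key, partial_columns):
        # A partial mutation cannot overwrite the packed cell directly,
        # because the columns it does not mention would be lost.
        # The existing row has to be read and merged first.
        self.reads += 1
        existing = self.rows.get(row_key, {})
        self.rows[row_key] = {**existing, **partial_columns}

store = PackedRowStore()
store.upsert("r1", {"name": "alice", "city": "NY"})
store.upsert("r1", {"city": "SF"})  # partial update still needs a read
print(store.rows["r1"], store.reads)
```

Every upsert pays for one read in this model, which is the overhead the description calls costly for tables that do not already read rows on the write path.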
> Phoenix has added more server-side functionality by leveraging the HBase
> coprocessor architecture to tailor HBase to Phoenix use cases. One recent
> customization targets HBase compaction. It was required to eliminate data
> integrity issues when TTL is configured. HBase TTL operates at the cell
> level, so the cells of a row can expire at different times. Such partial row
> expiration may result in data loss in Phoenix. To fix this, Phoenix
> introduced a compaction scanner that preserves row integrity during TTL
> processing (see PHOENIX-6888).
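
A small sketch of why cell-level TTL can break a row. The function names are hypothetical, and the row-level rule used here (a row expires as a unit, judged by its most recently written cell) is an assumption for illustration rather than a description of the PHOENIX-6888 implementation.

```python
def cell_level_ttl(cells, now, ttl):
    """Cell-level expiry: each cell is judged by its own timestamp."""
    return {q: (ts, v) for q, (ts, v) in cells.items() if now - ts < ttl}

def row_level_ttl(cells, now, ttl):
    """Row-preserving expiry (assumed rule): keep or drop the whole row
    based on the newest cell timestamp in the row."""
    newest = max(ts for ts, _ in cells.values())
    return dict(cells) if now - newest < ttl else {}

# Column "name" was written at t=100, column "city" was updated at t=900.
row = {"name": (100, "alice"), "city": (900, "NY")}

print(cell_level_ttl(row, now=1200, ttl=1000))  # only "city" survives
print(row_level_ttl(row, now=1200, ttl=1000))   # the whole row survives
```

With cell-level expiry the reader sees a row that has a city but no name, although the application never wrote such a row.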
> This Phoenix-level compaction can be leveraged to support the single cell
> format for mutable tables without requiring row reads during mutations. To
> achieve this, each mutation can be represented as a separate cell (per column
> family), with each cell (within a column family) having a different
> dynamically generated column qualifier. During flushes and compaction, these
> cells can be merged into a single cell. During scans, HBase region scanners
> will return all these cells (each with its own column qualifier), and Phoenix
> custom filters can merge them into one cell before applying filtering. These
> changes allow Phoenix to pack all columns of a column family for a given row
> into a single cell.
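
The proposal above can be sketched roughly as below. This is a simplified model under stated assumptions, not the planned implementation: FamilyStore is a hypothetical name, the generated qualifier is modeled as a (timestamp, sequence) pair, each cell value is a dict of the columns the mutation touched, and deletes are ignored.

```python
import itertools

def merge_cells(cells):
    """Merge mutation cells into one row image; newer cells override older."""
    merged = {}
    for _qualifier, columns in sorted(cells, key=lambda cell: cell[0]):
        merged.update(columns)
    return merged

class FamilyStore:
    """Cells of one row in one column family."""

    def __init__(self):
        self.cells = []
        self._seq = itertools.count()

    def upsert(self, ts, partial_columns):
        # Write path: append a new cell under a freshly generated qualifier.
        # No existing data is read.
        qualifier = (ts, next(self._seq))
        self.cells.append((qualifier, dict(partial_columns)))

    def compact(self):
        # Flush/compaction: collapse all mutation cells into a single cell.
        if self.cells:
            newest = max(q for q, _ in self.cells)
            self.cells = [(newest, merge_cells(self.cells))]

    def scan(self, predicate):
        # Scan: merge whatever cells exist first, then apply the filter.
        row = merge_cells(self.cells)
        return row if predicate(row) else None

store = FamilyStore()
store.upsert(1, {"a": 1, "b": 2})
store.upsert(2, {"b": 3})                  # partial mutation, no read needed
print(store.scan(lambda r: r["b"] == 3))   # merged view before compaction
store.compact()
print(len(store.cells))                    # one cell after compaction
```

The point of the design is that the merge work moves off the write path: scans pay for it only until the next flush or compaction folds the cells together.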
> The single cell storage format for mutable tables does not have to follow
> exactly the same implementation as the one for immutable tables. For example,
> the empty column can also be packed together with other columns in the new
> format for mutable tables, which is not the case for the immutable format.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)