[
https://issues.apache.org/jira/browse/PHOENIX-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045020#comment-18045020
]
Kadir Ozdemir commented on PHOENIX-7710:
----------------------------------------
With this Jira, we can also improve dynamic column handling and Phoenix
compaction. Again, the single cell storage format for mutable tables can
differ from the single cell storage format for immutable tables.
Currently, dynamic columns are not supported for the single cell storage format
(see PHOENIX-5107). In the single cell storage format for mutable tables,
dynamic columns can be packed together with static columns without using
shadow cells, and these dynamic columns can still support wildcard queries.
Currently, empty column cells are not packed into the single cell storage
format, and Phoenix continues to keep separate cells for the empty column. A
separate empty column cell is not needed for the single cell storage format:
the empty column value for a row can be encoded in the single cell of each
column family. This simplifies Phoenix compaction, since region level
compaction is no longer needed when the single cell storage format is used for
mutable tables and the empty column value is written to all column families
for every mutation. It also simplifies the TTL masking logic during scans: no
extra scans are needed to identify expired mutations, because the filtering
process itself can identify and filter out expired mutations.
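
To illustrate the TTL masking point, here is a minimal sketch in Python (not
Phoenix code). It assumes a hypothetical PackedCell model in which the cells of
a row have already been merged, so each (row, column family) has one packed
cell that carries the empty column value and the timestamp of the latest
mutation. The names PackedCell, is_expired, and mask_expired are invented for
this example. Under those assumptions, the filter can decide expiry from the
cell it is looking at, with no extra scan:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class PackedCell:
    """Hypothetical model of one packed cell: one row, one column family."""
    row_key: bytes
    family: str
    timestamp_ms: int                     # time of the latest mutation of the row
    columns: Dict[str, Optional[bytes]]   # packed data column values
    empty_column_value: bytes = b"x"      # packed alongside the data columns


def is_expired(cell: PackedCell, now_ms: int, ttl_ms: int) -> bool:
    # The packed cell carries the empty column value, so its timestamp
    # stands in for the row's last update time (an assumption of this sketch).
    return now_ms - cell.timestamp_ms > ttl_ms


def mask_expired(cells: Iterable[PackedCell], now_ms: int,
                 ttl_ms: int) -> Iterator[PackedCell]:
    # Single pass: expired rows are dropped while filtering.
    for cell in cells:
        if not is_expired(cell, now_ms, ttl_ms):
            yield cell
```

For example, with a TTL of 5000 ms and a current time of 10000 ms, a cell
written at 1000 ms is masked and a cell written at 9000 ms is returned.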
> Supporting single cell storage format for mutable tables
> --------------------------------------------------------
>
> Key: PHOENIX-7710
> URL: https://issues.apache.org/jira/browse/PHOENIX-7710
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Priority: Major
>
> Phoenix uses two storage formats for immutable tables: single cell per column
> (ONE_CELL_PER_COLUMN) and single cell per family
> (SINGLE_CELL_ARRAY_WITH_OFFSETS). Packing all columns of a row (within a
> specific column family) into a single cell reduces storage, network, and
> memory usage, generally improving performance for many use cases.
> The single cell storage format is only supported for immutable tables.
> Extending it to mutable tables would have required Phoenix to read existing
> rows and merge them with new, potentially partial, mutations to generate full
> row mutations. While this might be acceptable for tables with covered indexes
> (as rows are read for generating index mutations), it would be costly for
> other tables.
> Phoenix has added more server side functionality by leveraging the HBase
> coprocessor architecture to optimize HBase better for Phoenix use cases. A
> recent such customization was done for HBase compaction. This was required
> for eliminating data integrity issues when TTL is configured. HBase TTL
> operates at the cell level and leads to partial row expiration. Partial row
> expiration may result in data loss in Phoenix. To fix this, Phoenix
> introduced a compaction scanner that preserves row integrity during TTL
> processing (see PHOENIX-6888).
> This Phoenix-level compaction can be leveraged to support the single cell
> format for mutable tables without requiring row reads during mutations. To
> achieve this, each mutation can be represented as a separate cell (per column
> family), with each cell (within a column family) having a different
> dynamically generated column qualifier. During flushes and compaction, these
> cells can be merged into a single cell. During scans, HBase region scanners
> will return all these cells (each with its own column qualifier), and Phoenix
> custom filters can merge them into one cell before applying filtering. These
> changes allow Phoenix to pack all columns of a column family for a given row
> into a single cell.
> The single cell storage format for mutable tables does not have to follow
> exactly the same implementation as the one for immutable tables. For
> example, the empty column can also be packed together with other columns in
> the new format for mutable tables, which is not the case for the immutable
> format.
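
The merge step described in the issue (per-mutation cells combined into one
cell during flush, compaction, or scan) could look roughly like the Python
sketch below. It is not Phoenix code. The MutationCell type, the qualifier
strings, and the convention that a None value removes a column are assumptions
made for this example; the issue does not specify the encoding or the merge
rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MutationCell:
    """Hypothetical cell holding one (possibly partial) mutation of a row."""
    qualifier: str                        # dynamically generated, unique per mutation
    timestamp_ms: int
    columns: Dict[str, Optional[bytes]]   # None means "column removed" (assumption)


def merge_cells(cells: List[MutationCell]) -> Dict[str, bytes]:
    """Merge the cells of one row and column family into one set of columns."""
    merged: Dict[str, bytes] = {}
    # Apply mutations from oldest to newest so later writes win.
    for cell in sorted(cells, key=lambda c: c.timestamp_ms):
        for name, value in cell.columns.items():
            if value is None:
                merged.pop(name, None)
            else:
                merged[name] = value
    return merged
```

With a cell at timestamp 10 that sets A, B, and C, and a later cell at
timestamp 20 that overwrites B and removes C, the merged result contains A
from the first mutation and B from the second. No read of the existing row is
needed at write time, since each mutation is stored as its own cell.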