Kadir Ozdemir created PHOENIX-7544:
--------------------------------------
Summary: Add empty column to all column families
Key: PHOENIX-7544
URL: https://issues.apache.org/jira/browse/PHOENIX-7544
Project: Phoenix
Issue Type: Improvement
Reporter: Kadir Ozdemir
In Phoenix, a mutation is a set of cells with the same row key and timestamp.
It corresponds either to an update of an existing row or to the insertion of
the first image of a row by an UPSERT statement. Every mutation includes an
empty column cell, an internal cell that is not visible to the user. This cell
serves several purposes. First, it allows a table schema to consist of PK
columns only. PK columns are stored in the row key, and a row in HBase is a set
of cells that share a row key, so at least one cell is needed to represent the
row, and forming that cell requires a non-PK column.
The empty column does not hold any value for data tables but it includes the
row status for covered indexes. This row status indicates if the index row is
verified or unverified (or in other words, committed or uncommitted). This is
the second use of the empty column. The third use is finding the last
modification time (the row timestamp) of a row. Instead of reading the cells of
all non-PK columns of a row, we only need to read the empty column, because it
is included in every row mutation.
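
To make the third use concrete, here is a minimal sketch of a mutation model in which the row timestamp is read from the empty column cells alone. The `Cell` class, the `upsert` helper, and the `_empty` qualifier are hypothetical names chosen for illustration; they are not Phoenix's actual classes or its real qualifier.

```python
from dataclasses import dataclass

# Hypothetical qualifier for illustration; not Phoenix's real constant.
EMPTY_COLUMN = "_empty"


@dataclass(frozen=True)
class Cell:
    row_key: str
    family: str
    qualifier: str
    timestamp: int
    value: bytes = b""


def upsert(row_key, family, columns, timestamp):
    """Build one mutation: the user's cells plus the empty column cell."""
    cells = [Cell(row_key, family, q, timestamp, v) for q, v in columns.items()]
    cells.append(Cell(row_key, family, EMPTY_COLUMN, timestamp))
    return cells


def row_timestamp(cells):
    """Last modification time, read from the empty column cells only."""
    return max(c.timestamp for c in cells if c.qualifier == EMPTY_COLUMN)


# Two mutations on the same row, then a row from a PK-only table.
row = upsert("k1", "cf", {"a": b"1", "b": b"2"}, timestamp=100)
row += upsert("k1", "cf", {"b": b"3"}, timestamp=250)
pk_only = upsert("k2", "cf", {}, timestamp=40)
```

In this toy model the PK-only row still has exactly one cell (the empty column cell), and `row_timestamp` never needs to look at the user-visible columns.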
Phoenix compaction, TTLRegionScanner (used for masking expired rows) and
PHOENIX_ROW_TIMESTAMP() rely on the empty column to determine the row
timestamp. In Phoenix, there is only one empty column for a given row. This
empty column is stored in one of the column families if the row has multiple
column families. This creates challenges for Phoenix compaction.
HBase TTL, and thus HBase major compaction, expires cells individually. Phoenix
compaction is a customized HBase compaction that makes the row, rather than the
cell, the unit of expiration in order to eliminate partial row expiry (a data
integrity issue in Phoenix). In HBase, compaction is done at the column family
level. Phoenix compaction, however, needs to fetch all cells from all column
families of a given row to determine whether the row should expire. This
requires region-level scans during major compaction in addition to the
compaction scan done at the column family (that is, store) level. As a result,
the same row is scanned multiple times, once for each column family, which
causes performance issues.
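
The difference between cell-level and row-level expiry can be illustrated with a small sketch. It is a simplified model, not Phoenix's compaction code: cells are plain `(qualifier, timestamp)` pairs, the function names are hypothetical, and the row-level rule here only looks at the newest cell of the row.

```python
def expire_cells(cells, now, ttl):
    """Cell-level expiry: each (qualifier, timestamp) pair is judged alone."""
    return [(q, ts) for q, ts in cells if now - ts <= ttl]


def expire_row(cells, now, ttl):
    """Row-level expiry: keep or drop the whole row based on its newest cell."""
    newest = max(ts for _, ts in cells)
    return list(cells) if now - newest <= ttl else []


# Column "a" was written at t=60 and column "b" was updated at t=90.
cells = [("a", 60), ("b", 90)]
partial = expire_cells(cells, now=120, ttl=50)  # "a" is dropped, "b" survives
whole = expire_row(cells, now=120, ttl=50)      # both cells survive together
```

With cell-level expiry the reader sees a row that has lost column "a" but kept column "b", which is the partial row expiry described above. With row-level expiry the row is either fully visible or fully gone.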
The reason that Phoenix major compaction needs to read cells from all column
families today is to check whether the time gap between mutations on a given
row is more than the TTL value for the table. If so, the mutations older than
such a gap should expire. This gap analysis will not be necessary if every
column family includes an empty column, which requires Phoenix to insert an
empty column cell for each column family into every mutation. Please note that
Phoenix does this for only one of the column families today.
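
The gap check described above can be sketched as follows. This is one interpretation of the rule under stated assumptions, not the actual Phoenix implementation: the function name is hypothetical, the input is the list of mutation timestamps collected for one row, and a gap is measured between consecutive mutations (and between the newest mutation and the current time).

```python
def live_mutations(timestamps, now, ttl):
    """Return the mutation timestamps that survive row-level TTL.

    Walk from newest to oldest and stop at the first gap larger than ttl,
    because the row would have expired inside that gap.
    """
    survivors = []
    newer = now
    for ts in sorted(timestamps, reverse=True):
        if newer - ts > ttl:
            break
        survivors.append(ts)
        newer = ts
    return survivors[::-1]


# The row was idle between t=40 and t=200, which is longer than the TTL.
kept = live_mutations([10, 40, 200, 230], now=250, ttl=50)
```

Here the mutations at t=10 and t=40 expire, while those at t=200 and t=230 are kept. Collecting the full timestamp list is what currently forces the read across all column families.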
This is why we propose including an empty column in each column family in
Phoenix. Clearly, this proposal requires inserting these empty column cells not
only into new mutations but also into the existing mutations of a table with
multiple column families. This means that existing tables of this kind must be
upgraded. The suggested approach is an upgrade process that uses Phoenix tasks
(the SYSTEM.TASK table). For each table to be upgraded, we can add a specific
task that inserts empty cells into the existing mutations. This task first
disables major compaction on the table, then adds the empty cells, and finally
enables major compaction again.
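
The three steps of the upgrade task can be sketched as below. Everything in this sketch is an assumption made for illustration: the in-memory `mutations` dictionary stands in for the table, the log lines stand in for the real calls that disable and enable major compaction, and the `_empty` qualifier is hypothetical. Re-enabling compaction in a `finally` block is a design choice of this sketch, not something the proposal specifies.

```python
from contextlib import contextmanager


@contextmanager
def major_compaction_disabled(table, log):
    """Stand-in for disabling and re-enabling major compaction on a table."""
    log.append(f"disable major compaction on {table}")
    try:
        yield
    finally:
        # Runs even if the backfill fails, so compaction is not left off.
        log.append(f"enable major compaction on {table}")


def backfill_empty_cells(mutations, families, empty="_empty"):
    """Add a missing empty cell per family to every mutation; return the count.

    mutations maps (row_key, timestamp) to a set of (family, qualifier) pairs.
    """
    added = 0
    for cells in mutations.values():
        for family in families:
            if (family, empty) not in cells:
                cells.add((family, empty))
                added += 1
    return added


def run_upgrade_task(table, mutations, families):
    log = []
    with major_compaction_disabled(table, log):
        added = backfill_empty_cells(mutations, families)
        log.append(f"added {added} empty cells")
    return log


# Today only cf1 carries the empty column cell.
mutations = {
    ("k1", 100): {("cf1", "a"), ("cf1", "_empty"), ("cf2", "b")},
    ("k1", 200): {("cf2", "b"), ("cf1", "_empty")},
}
first_run = run_upgrade_task("T", mutations, ["cf1", "cf2"])
second_run = run_upgrade_task("T", mutations, ["cf1", "cf2"])
```

The second run adds nothing, which shows that a backfill written this way can be retried safely if the task is interrupted.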
Please note that Phoenix compaction is an optional feature and is disabled by
default. Although this proposal is required only for tables with multiple
column families on which Phoenix compaction is enabled, we suggest including
empty columns in all column families from now on, since this simplifies Phoenix
by eliminating the logic that determines which column family should include the
empty column. Even though Phoenix compaction is optional, it is effectively
required for tables with TTL, since it prevents the data integrity issues
caused by the cell-level TTL feature in HBase.