Thank you Kadir! There is an obvious trade-off between the extra write load and the advantages you described, but the extra load only applies to multi-CF native Phoenix tables (which are not that common in my experience). I think that the benefits outweigh the costs.
Istvan

On Thu, Mar 13, 2025 at 3:36 AM Kadir Ozdemir <ka...@apache.org> wrote:

> I created an improvement Jira (PHOENIX-7544) to add an empty column to all
> column families. Let me know if you have any questions or concerns for this
> Jira. I would like to include it in the next release. Here is the
> description of it:
>
> In Phoenix, a mutation is a set of cells with the same row key and
> timestamp corresponding to an update on the existing row or insertion of
> the first image of the row made by an UPSERT statement. This mutation
> always includes an empty column cell which is an internal column/cell not
> visible to the user. This cell is used for multiple purposes. It allows a
> table schema to include only PK columns. As PK columns are stored in the
> row key, we need a cell to represent the row in HBase as a row in HBase is
> a set of cells with the same row key and to form a cell we need a non-PK
> column.
>
> The empty column does not hold any value for data tables but it includes
> the row status for covered indexes. This row status indicates if the index
> row is verified or unverified (or in other words, committed or
> uncommitted). This is the second use of the empty column. The third use is
> for finding out the last modification time (the row timestamp) for a row.
> Instead of reading the cells for all non-PK columns of a row to find the
> last modification time, we just need to read the empty column as the empty
> column is always included in row mutations.
>
> Phoenix compaction, TTLRegionScanner (used for masking expired rows) and
> PHOENIX_ROW_TIMESTAMP() rely on the empty column to determine the row
> timestamp. In Phoenix, there is only one empty column for a given row.
> This empty column is stored in one of the column families if the row has
> multiple column families. This creates challenges for Phoenix compaction.
>
> HBase TTL and thus HBase major compaction expire cells individually.
> Phoenix compaction is a customized HBase compaction to make sure that the
> unit of expiration is row instead of cell in order to eliminate partial row
> expiry (a data integrity issue in Phoenix). Compaction is done at the
> column family level in HBase however Phoenix compaction needs to fetch all
> cells from all column families for a given row to determine if the row
> should expire or not. This requires region level scans during major
> compaction in addition to the compaction scan done at the column family
> (that is, store) level. This obviously requires scanning the same row
> multiple times, one for each column family and causes performance issues.
>
> The reason that Phoenix major compaction needs to read cells from all
> column families today is to see if the time gap between mutations on a
> given row is more than the TTL value for the table. If so, the mutations
> beyond such a gap should expire. This gap analysis will not be necessary if
> every column family includes an empty column, requiring Phoenix to insert
> an empty column cell for each of the column families to every mutation.
> Please note that Phoenix does this for only one of the column families
> today.
>
> This is the reason we propose including an empty column for each column
> family in Phoenix. Clearly, this proposal requires inserting these empty
> column cells not only for new mutations but also the existing mutations on
> a table with multiple column families. This means this proposal requires
> upgrading existing such tables. The suggested approach for this is to have
> an upgrade process using Phoenix task (the SYSTEM.TASK table). For each
> table to be upgraded, we can add a specific task to insert empty cells to
> existing mutations. This task first disables major compaction on the table,
> then adds the empty cells, and finally enables major compaction again.
>
> Please note that Phoenix compaction is an optional feature as it is
> disabled by default.
> Although this proposal is required only for the tables
> with multiple column families on which Phoenix compaction is enabled, we
> suggest including empty columns for all column families from now on
> since it simplifies Phoenix by eliminating the logic to determine which
> column family should include the empty column. Even though Phoenix
> compaction is an optional feature, it is actually required for tables with
> TTL since it prevents data integrity issues due to the cell level TTL
> feature in HBase.
>

--
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>
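
For anyone following the thread, here is a minimal sketch of the gap analysis described in the quoted proposal. It is a toy model in plain Python, not Phoenix or HBase code: the function name, the timestamps, and the two-family layout are all made up for illustration. It assumes that a row's mutations are walked from newest to oldest and that everything older than the first gap larger than the TTL expires (treating the distance from "now" to the newest mutation as a gap is also an assumption). It reflects one reading of the proposal, namely that a store holding an empty column cell for every mutation can reach the row-level answer from its own cells alone.

```python
from typing import List


def live_mutation_timestamps(cell_ts: List[int], ttl: int, now: int) -> List[int]:
    """Return the mutation timestamps that survive row-level TTL.

    Walks timestamps from newest to oldest and stops at the first gap
    that is larger than the TTL; everything beyond that gap expires.
    (Hypothetical helper, not a Phoenix API.)
    """
    live = []
    prev = now
    for ts in sorted(cell_ts, reverse=True):
        if prev - ts > ttl:
            break  # gap exceeds TTL: this and all older mutations expire
        live.append(ts)
        prev = ts
    return live


TTL, NOW = 100, 200

# Hypothetical row with three mutations. Only the first one (t=0) wrote a
# user column in family "b"; the later two touched family "a" only.
row_mutations = [0, 80, 160]

# Today: the empty column lives only in family "a", so the store for "b"
# sees just its single user cell at t=0.
cf_b_today = [0]

# Proposed: every mutation also writes an empty column cell into "b".
cf_b_proposed = [0, 80, 160]

row_view = live_mutation_timestamps(row_mutations, TTL, NOW)
store_view_today = live_mutation_timestamps(cf_b_today, TTL, NOW)
store_view_proposed = live_mutation_timestamps(cf_b_proposed, TTL, NOW)

print(row_view, store_view_today, store_view_proposed)
```

In this toy row, the store for "b" on its own would treat the t=0 cell as expired even though the row as a whole is still live, which is the partial row expiry that forces the region level scan today. With an empty cell per family, the store-local result matches the row-level one.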