[ https://issues.apache.org/jira/browse/PHOENIX-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved PHOENIX-7357.
-----------------------------------
    Resolution: Fixed

> New variable length binary data type: VARBINARY_ENCODED
> -------------------------------------------------------
>
>                 Key: PHOENIX-7357
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7357
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 5.3.0
>
>
> Phoenix currently provides several variable length and fixed length data
> types. One of the variable length data types is VARBINARY, a variable length
> binary blob. Using VARBINARY as the only primary key column is equivalent to
> using the HBase row key directly.
> HBase provides a single row key. A client application that needs more than
> one primary key column must, when using HBase directly, handle storing all of
> those column values as a single binary row key. Phoenix supports multiple
> primary key columns through composite primary keys, which can contain any
> number of primary key columns. Phoenix also allows adding new nullable
> primary key columns to an existing composite primary key.
> Phoenix uses HBase as its backing store. To let users define multiple
> primary key columns, Phoenix internally concatenates the binary encoded value
> of each primary key column and uses the concatenated binary value as the
> HBase row key. To efficiently concatenate and later retrieve the individual
> primary key values, Phoenix uses two approaches:
> # For fixed length columns: The length of the column is determined by its
> maximum length. While reading, a fixed number of bytes is retrieved for the
> column as the row key is iterated. While writing, if the encoded value of the
> column has fewer bytes than that length, null bytes (\x00) are padded until
> the fixed length is filled.
> Hence, for smaller values, we end up wasting some space.
> # For variable length columns: Since the length of a variable length value
> cannot be known in advance, a separator (terminator) byte is used. Phoenix
> uses the null byte (\x00) as the separator. VARCHAR is currently the most
> commonly used variable length data type, and since VARCHAR represents a
> String, the null byte is not treated as a valid String character. Hence, it
> can be used effectively to determine where a VARCHAR value terminates.
>
> The null byte (\x00) works fine as a separator for VARCHAR. However, it
> cannot be used as a separator for VARBINARY, because a VARBINARY value can be
> any binary blob, including one that itself contains \x00 bytes. Due to this,
> Phoenix has the following restrictions for the VARBINARY type:
>
> # It can only be used as the last column of a composite primary key.
> # It cannot be used as a DESC ordered primary key column.
>
> Using VARBINARY in an earlier position of a composite primary key is a valid
> use case, as is using multiple VARBINARY primary key columns. After all,
> Phoenix lets users define multiple primary key columns.
> In addition, when a secondary index is created on a data table, the composite
> primary key of the secondary index table consists of:
> <secondary-index-col1> <secondary-index-col2> … <secondary-index-colN>
> <primary-key-col1> <primary-key-col2> … <primary-key-colN>
>
> Because the data table's primary key columns are appended after the secondary
> index columns, an indexed VARBINARY column can never be the last part of the
> index table's composite primary key. As a result, one cannot create a
> secondary index on any VARBINARY column.
> The proposal of this Jira is to introduce a new data type,
> {*}VARBINARY_ENCODED{*}, which has no restriction on being used as a prefix of
> a composite primary key or as a DESC ordered column.
> This means we need to reliably determine where a variable length binary value
> terminates in the absence of fixed length information.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
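The row key layout described in this issue can be illustrated with a minimal Python sketch. It shows three things: fixed length null padding, how a plain \x00 separator lets two different VARBINARY key tuples produce the same row key, and one possible escape-based encoding that makes termination unambiguous again. Several parts are assumptions rather than Phoenix's code. All function names are hypothetical. Writing a separator after every value is a simplification of Phoenix's real row key layout. The escape scheme (\x00 stored as \x00\xFF, value terminated by \x00\x01) is an assumed illustration and not necessarily the format VARBINARY_ENCODED actually uses.

```python
from typing import List, Tuple

# Separator byte Phoenix uses after variable length (VARCHAR) values.
SEP = b"\x00"

def pad_fixed(value: bytes, length: int) -> bytes:
    """Fixed length column (hypothetical helper): right-pad with null bytes."""
    if len(value) > length:
        raise ValueError("value longer than the fixed column length")
    return value + b"\x00" * (length - len(value))

def concat_with_separator(values: List[bytes]) -> bytes:
    """Naive variable length concatenation (simplified): each value + \\x00."""
    return b"".join(v + SEP for v in values)

# ASSUMPTION: hypothetical escape scheme, not necessarily Phoenix's format.
ESCAPE = b"\x00\xff"      # stands for a literal \x00 inside a value
TERMINATOR = b"\x00\x01"  # marks the end of a value

def encode_escaped(value: bytes) -> bytes:
    """Escape every \\x00 in the value, then append the terminator."""
    return value.replace(b"\x00", ESCAPE) + TERMINATOR

def decode_escaped(key: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Decode one value at offset; return (value, offset past terminator)."""
    out = bytearray()
    i = offset
    while i < len(key):
        if key[i] != 0:
            out.append(key[i])
            i += 1
            continue
        if i + 1 >= len(key):
            raise ValueError("truncated escape sequence")
        marker = key[i + 1]
        if marker == 0xFF:
            out.append(0)            # escaped literal null byte
        elif marker == 0x01:
            return bytes(out), i + 2  # end of this value
        else:
            raise ValueError("invalid byte after \\x00")
        i += 2
    raise ValueError("missing terminator")

def encode_row_key(values: List[bytes]) -> bytes:
    """Concatenate escaped values into a single row key."""
    return b"".join(encode_escaped(v) for v in values)

def decode_row_key(key: bytes) -> List[bytes]:
    """Split a row key back into its individual values."""
    values, offset = [], 0
    while offset < len(key):
        value, offset = decode_escaped(key, offset)
        values.append(value)
    return values

if __name__ == "__main__":
    a = [b"a\x00b", b"c"]
    b = [b"a", b"b\x00c"]
    print(concat_with_separator(a) == concat_with_separator(b))  # True (ambiguous)
    print(encode_row_key(a) == encode_row_key(b))                # False
    print(decode_row_key(encode_row_key(a)))                      # [b'a\x00b', b'c']
```

This sketch uses escaping rather than a length prefix because HBase compares row keys byte by byte. Since \x00\x01 sorts below \x00\xFF, a value that ends sorts before any longer value sharing its prefix. As a result, the ascending byte order of the encoded values matches the order of the raw values. A length prefix would sort by length first and lose that property. DESC ordered columns would need an inverted variant, which this sketch does not cover.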