[
https://issues.apache.org/jira/browse/PHOENIX-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani resolved PHOENIX-7357.
-----------------------------------
Resolution: Fixed
> New variable length binary data type: VARBINARY_ENCODED
> -------------------------------------------------------
>
> Key: PHOENIX-7357
> URL: https://issues.apache.org/jira/browse/PHOENIX-7357
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Fix For: 5.3.0
>
>
> Phoenix currently provides several variable length as well as fixed length
> data types. One of the variable length data types is VARBINARY, a variable
> length binary blob. Using VARBINARY as the only primary key column is
> comparable to using the HBase row key directly.
> HBase provides a single row key. A client application that needs more than
> one column in its primary key must, with plain HBase, handle storing all of
> those column values in a single binary row key itself. Phoenix supports
> multiple primary key columns through composite primary keys, which can
> contain any number of primary key columns. Phoenix also allows new nullable
> primary key columns to be added to an existing composite primary key.
> Phoenix uses HBase as its backing store. To let users define multiple
> primary key columns, Phoenix internally concatenates the binary encoded
> value of each primary key column and uses the concatenated value as the
> HBase row key. To concatenate and retrieve the individual primary key values
> efficiently, Phoenix uses two approaches:
> # For fixed length columns: The length of the column is determined by its
> maximum length. On the read path, while iterating through the row key, a
> fixed number of bytes is read for the column. On the write path, if the
> encoded value has fewer bytes than the fixed length, it is padded with null
> bytes (\x00) until the fixed length is reached. Hence, smaller values waste
> some space.
> # For variable length columns: Since the length of a variable length value
> cannot be known in advance, a separator (terminator) byte is used. Phoenix
> uses the null byte (\x00) as the separator. As of today, VARCHAR is the most
> commonly used variable length data type, and since VARCHAR represents a
> String, the null byte is not a valid String character. Hence, it can be
> used reliably to determine where a VARCHAR value ends.
>
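The two approaches above can be sketched in a few lines of Python. This is
only a simplified model for illustration, not Phoenix code: the function
names, the schema representation, trailing (right-hand) padding, and omitting
the separator after the last column are all assumptions of the sketch.

```python
SEPARATOR = b"\x00"  # null byte used as the variable length separator


def encode_row_key(columns):
    """Concatenate (value, fixed_len) pairs into one row key.

    fixed_len is an int for fixed length columns, None for variable length.
    """
    parts = []
    last = len(columns) - 1
    for i, (value, fixed_len) in enumerate(columns):
        if fixed_len is not None:
            if len(value) > fixed_len:
                raise ValueError("value longer than fixed length")
            # Pad short values with null bytes up to the fixed length
            # (trailing padding is an assumption of this sketch).
            parts.append(value.ljust(fixed_len, b"\x00"))
        else:
            parts.append(value)
            # Assumption: the last column needs no separator because it
            # simply runs to the end of the row key.
            if i != last:
                parts.append(SEPARATOR)
    return b"".join(parts)


def decode_row_key(row_key, schema):
    """Split a row key back into column values. schema lists fixed_len or None."""
    values = []
    pos = 0
    last = len(schema) - 1
    for i, fixed_len in enumerate(schema):
        if fixed_len is not None:
            # Fixed length: read exactly fixed_len bytes (padding included).
            values.append(row_key[pos:pos + fixed_len])
            pos += fixed_len
        elif i == last:
            # Last variable length column: take the rest of the key.
            values.append(row_key[pos:])
            pos = len(row_key)
        else:
            # Variable length: read up to the next separator byte.
            end = row_key.index(SEPARATOR, pos)
            values.append(row_key[pos:end])
            pos = end + 1
    return values


key = encode_row_key([(b"ab", 4), (b"hello", None), (b"xyz", None)])
print(key)
print(decode_row_key(key, [4, None, None]))
```

Note that the fixed length value comes back with its padding, which is the
wasted space mentioned in the first item.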
> The null byte (\x00) works well as a separator for VARCHAR. However, it
> cannot be used as a separator for VARBINARY, because a VARBINARY value can
> contain arbitrary bytes, including \x00. Because of this, Phoenix places two
> restrictions on the VARBINARY type:
>
> # It can only be used as the last part of the composite primary key.
> # It cannot be used as a DESC order primary key column.
>
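To see why position matters, here is a small sketch of the ambiguity. It
models the row key as values joined by a single \x00 separator, with the last
column running to the end of the key; the helper names are hypothetical and
this is not how Phoenix is implemented internally.

```python
SEPARATOR = b"\x00"


def naive_concat(values):
    """Join column values with the null byte separator."""
    return SEPARATOR.join(values)


def naive_split(row_key, num_columns):
    """Split on the separator; the last column takes the rest of the key."""
    return row_key.split(SEPARATOR, num_columns - 1)


# Binary blob in the LAST position: it runs to the end of the key, so the
# null byte inside it is never mistaken for a separator.
blob_last = [b"abc", b"\x01\x00\x02"]
print(naive_split(naive_concat(blob_last), 2) == blob_last)

# Binary blob in an EARLIER position: the null byte inside the blob is
# treated as the separator and both values come back wrong.
blob_first = [b"\x01\x00\x02", b"abc"]
print(naive_split(naive_concat(blob_first), 2))
```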
> Using a VARBINARY column in an earlier position of the composite primary key
> is a valid use case, as is using multiple VARBINARY primary key columns.
> After all, Phoenix lets users define multiple primary key columns.
> Besides, creating a secondary index on a data table means that the composite
> primary key of the secondary index table consists of:
> <secondary-index-col1> <secondary-index-col2> … <secondary-index-colN>
> <primary-key-col1> <primary-key-col2> … <primary-key-colN>
>
> Because the primary key columns are appended after the secondary index
> columns, an indexed VARBINARY column would not be the last part of the index
> row key. As a result, one cannot create a secondary index on any VARBINARY
> column.
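The layout above can be checked mechanically. The following is a minimal
sketch with hypothetical helper names and a made-up two-column table (ID,
PAYLOAD); it only models the column ordering described in the quoted text and
is not a Phoenix API.

```python
def index_row_key_columns(indexed_columns, data_pk_columns):
    """Index table row key: indexed columns first, then data table PK columns."""
    return list(indexed_columns) + list(data_pk_columns)


def violates_varbinary_rule(key_columns, column_types):
    """True if a VARBINARY column appears anywhere except the last position."""
    return any(column_types[c] == "VARBINARY" for c in key_columns[:-1])


# Hypothetical table: ID VARCHAR primary key, PAYLOAD VARBINARY.
types = {"ID": "VARCHAR", "PAYLOAD": "VARBINARY"}

# Secondary index on PAYLOAD: PAYLOAD is followed by ID in the index row key.
index_key = index_row_key_columns(["PAYLOAD"], ["ID"])
print(index_key, violates_varbinary_rule(index_key, types))

# VARBINARY as the last primary key column of a data table is still allowed.
print(violates_varbinary_rule(["ID", "PAYLOAD"], types))
```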
> The proposal of this Jira is to introduce a new data type,
> {*}VARBINARY_ENCODED{*}, which can be used as a prefix of the composite
> primary key and as a DESC ordered column without restriction.
> This means we need a reliable way to determine where variable length binary
> data terminates when no fixed length information is available.
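For illustration only, one generic way to terminate arbitrary binary data is
byte stuffing: escape every null byte inside the value and end the value with
a two-byte terminator that can no longer occur in the escaped data. The
quoted text above does not specify the encoding, so the byte values chosen
here (\x00\xFF for an escaped null, \x00\x01 as terminator) and all function
names are assumptions of this sketch, not a description of what Phoenix does.

```python
ESCAPED_NULL = b"\x00\xff"  # assumed escape sequence for a literal \x00
TERMINATOR = b"\x00\x01"    # assumed end-of-value marker


def encode_value(value: bytes) -> bytes:
    """Escape null bytes, then append the terminator."""
    return value.replace(b"\x00", ESCAPED_NULL) + TERMINATOR


def decode_value(buf: bytes, pos: int = 0):
    """Decode one value starting at pos; return (value, next position)."""
    out = bytearray()
    while pos < len(buf):
        byte = buf[pos]
        if byte != 0:
            out.append(byte)
            pos += 1
            continue
        if pos + 1 >= len(buf):
            raise ValueError("truncated escape sequence")
        marker = buf[pos + 1]
        if marker == 0xFF:    # escaped literal null byte
            out.append(0)
            pos += 2
        elif marker == 0x01:  # terminator: the value ends here
            return bytes(out), pos + 2
        else:
            raise ValueError("invalid byte after null")
    raise ValueError("missing terminator")


def encode_key(values):
    """Concatenate encoded values into one row key."""
    return b"".join(encode_value(v) for v in values)


def decode_key(row_key, num_columns):
    """Decode num_columns values from the row key, in order."""
    values, pos = [], 0
    for _ in range(num_columns):
        value, pos = decode_value(row_key, pos)
        values.append(value)
    return values


values = [b"\x01\x00\x02", b"", b"\x00\x01", b"abc"]
row_key = encode_key(values)
print(decode_key(row_key, len(values)) == values)
```

With a scheme of this kind, a binary value can sit in any position of the
key, at the cost of one extra byte per embedded null plus the terminator.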
--
This message was sent by Atlassian Jira
(v8.20.10#820010)