[jira] [Resolved] (PHOENIX-7357) New variable length binary data type: VARBINARY_ENCODED

Viraj Jasani (Jira) Wed, 18 Sep 2024 13:05:08 -0700


     [ https://issues.apache.org/jira/browse/PHOENIX-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Viraj Jasani resolved PHOENIX-7357.
-----------------------------------
    Resolution: Fixed

> New variable length binary data type: VARBINARY_ENCODED
> -------------------------------------------------------
>
>                 Key: PHOENIX-7357
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7357
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 5.3.0
>
>
> As of today, Phoenix provides several variable length as well as fixed length 
> data types. One of the variable length data types is VARBINARY, which 
> represents a variable length binary blob. Using VARBINARY as the only primary 
> key column is effectively the same as using the HBase row key directly.
> HBase provides a single row key. Any client application that needs more than 
> one column as its primary key must, when using HBase directly, handle 
> encoding all of those column values into a single binary row key itself. 
> Phoenix supports multiple primary key columns through composite primary 
> keys. A composite primary key can contain any number of primary key columns. 
> Phoenix also provides the ability to add new nullable primary key columns to 
> an existing composite primary key. Phoenix uses HBase as its backing store. 
> To let users define multiple primary key columns, Phoenix internally 
> concatenates the binary encoded value of each primary key column and uses 
> the concatenated binary value as the HBase row key. To efficiently 
> concatenate as well as retrieve individual primary key values, Phoenix uses 
> two approaches:
>  # For fixed length columns: The length of the given column is determined by 
> the maximum length of the column. While reading, Phoenix iterates through the 
> row key and retrieves a fixed number of bytes for the column. While writing, 
> if the original encoded value of the given column has fewer bytes, 
> additional null bytes (\x00) are padded until the fixed length is filled up. 
> Hence, smaller values waste some space.
>  # For variable length columns: Since the length of a variable length value 
> cannot be known in advance, a separator or terminator byte is used. Phoenix 
> uses the null byte (\x00) as the separator. As of today, VARCHAR is the most 
> commonly used variable length data type, and since VARCHAR represents a 
> String, the null byte is not part of valid String characters. Hence, it can 
> be used effectively to determine where the given VARCHAR value terminates.
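The two layouts can be illustrated with a minimal Python sketch. It is not Phoenix code: `encode_fixed`, `encode_varchar`, `build_row_key` and `split_row_key` are hypothetical helpers, values are assumed to be already serialized, and the separator is assumed to be omitted after the last column of the key.

```python
# Minimal sketch of the two row key layouts (hypothetical helpers, not Phoenix APIs).
# ASSUMPTION: no separator is written after the last primary key column.

SEPARATOR = b"\x00"


def encode_fixed(value: bytes, max_length: int) -> bytes:
    """Right-pad a fixed length column value with null bytes."""
    if len(value) > max_length:
        raise ValueError("value exceeds the column's maximum length")
    return value + b"\x00" * (max_length - len(value))


def encode_varchar(value: str) -> bytes:
    """Encode a VARCHAR value; the null byte must not occur inside it."""
    raw = value.encode("utf-8")
    if SEPARATOR in raw:
        raise ValueError("VARCHAR values may not contain the null byte")
    return raw


def build_row_key(fixed: bytes, fixed_len: int, first: str, second: str) -> bytes:
    """Concatenate a (fixed length, VARCHAR, VARCHAR) primary key into one row key."""
    return (encode_fixed(fixed, fixed_len)
            + encode_varchar(first) + SEPARATOR   # separator marks the end of `first`
            + encode_varchar(second))             # last column: no separator


def split_row_key(row_key: bytes, fixed_len: int):
    """Recover the individual primary key values from the row key."""
    fixed = row_key[:fixed_len]        # fixed number of bytes; padding is kept
    rest = row_key[fixed_len:]
    end = rest.index(SEPARATOR)        # first null byte terminates the VARCHAR
    return fixed, rest[:end].decode("utf-8"), rest[end + 1:].decode("utf-8")


key = build_row_key(b"ab", 4, "us", "west")
print(key)                     # b'ab\x00\x00us\x00west'
print(split_row_key(key, 4))   # (b'ab\x00\x00', 'us', 'west')
```

The two padding bytes in the output are the space overhead mentioned for fixed length columns, and the single `\x00` after `us` is what allows the VARCHAR value to be located again.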
>  
> The null byte (\x00) works fine as a separator for VARCHAR. However, it 
> cannot be used as a separator byte for VARBINARY because a VARBINARY value 
> can contain any byte, including \x00. Due to this, Phoenix has the following 
> restrictions on the VARBINARY type: 
>  
>  # It can only be used as the last part of the composite primary key.
>  # It cannot be used as a DESC ordered primary key column.
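A small hypothetical Python example shows why a null byte separator breaks for VARBINARY: two different pairs of values produce exactly the same row key, so a decoder cannot tell where the first value ends. The helper names are illustrative only.

```python
# Hypothetical illustration: joining two VARBINARY values with a \x00 separator.

SEPARATOR = b"\x00"


def join_with_separator(first: bytes, second: bytes) -> bytes:
    """Concatenate two values with a null byte in between."""
    return first + SEPARATOR + second


def split_at_first_separator(row_key: bytes):
    """Naive decoder: treat the first null byte as the end of the first value."""
    end = row_key.index(SEPARATOR)
    return row_key[:end], row_key[end + 1:]


key_a = join_with_separator(b"\x01", b"\x00\x02")
key_b = join_with_separator(b"\x01\x00", b"\x02")
print(key_a == key_b)                    # True: two different rows, one row key
print(split_at_first_separator(key_b))   # (b'\x01', b'\x00\x02'), not the original pair
```

Placing the VARBINARY column last avoids the problem because the value then simply runs to the end of the row key and needs no separator.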
>  
> Using the VARBINARY data type as an earlier portion of the composite primary 
> key is a valid use case. One can also use multiple VARBINARY primary key 
> columns. After all, Phoenix lets users define multiple primary key columns.
> Besides, when a secondary index is created on a data table, the composite 
> primary key of the secondary index table includes: 
> <secondary-index-col1> <secondary-index-col2> … <secondary-index-colN> 
> <primary-key-col1> <primary-key-col2> … <primary-key-colN>
>  
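> As the data table's primary key columns are appended after the secondary 
> index columns, an indexed VARBINARY column can never be the last part of the 
> index table's primary key. Hence, one cannot create a secondary index on any 
> VARBINARY column.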
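The position rule can be modeled as a check over the ordered column list of the index row key. This is a hypothetical Python sketch, not Phoenix's validation code; columns are assumed to be `(name, type)` pairs, and the names `PAYLOAD` and `ID` are made up.

```python
# Hypothetical model of the VARBINARY position restriction for index row keys.
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, Phoenix type name)


def index_row_key_columns(index_cols: List[Column], data_pk_cols: List[Column]) -> List[Column]:
    """Index row key = indexed columns followed by the data table's PK columns."""
    return index_cols + data_pk_cols


def varbinary_only_last(columns: List[Column]) -> bool:
    """True if every VARBINARY column sits in the last position of the row key."""
    return all(col_type != "VARBINARY" or pos == len(columns) - 1
               for pos, (_, col_type) in enumerate(columns))


data_pk = [("ID", "VARCHAR")]
index_key = index_row_key_columns([("PAYLOAD", "VARBINARY")], data_pk)
print(index_key)                        # [('PAYLOAD', 'VARBINARY'), ('ID', 'VARCHAR')]
print(varbinary_only_last(index_key))   # False: ID always follows PAYLOAD
```

Since a data table always has at least one primary key column, an indexed VARBINARY column is always followed by another column, which is why the check can never pass for it.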
> This Jira proposes introducing a new data type, {*}VARBINARY_ENCODED{*}, 
> which has no restriction on being used as a prefix of a composite primary key 
> or as a DESC ordered column. This means we need an effective way to 
> determine where the variable length binary data terminates in the absence of 
> fixed length information.
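One way to make a binary value self-delimiting is byte stuffing. The Python sketch below assumes that \x00 inside a value is escaped as \x00\xFF and that every value ends with the terminator \x00\x01; this is a generic scheme chosen for illustration, not necessarily the format VARBINARY_ENCODED adopts. Because the terminator sorts below both an escaped null byte and any non-null byte, the byte-wise order of the encodings follows the order of the raw values, and inverting every byte yields DESC order.

```python
# Self-delimiting, order-preserving binary encoding (illustrative sketch).
# ASSUMPTION: \x00 inside a value is escaped as \x00\xFF and each value ends
# with the terminator \x00\x01. Not necessarily the VARBINARY_ENCODED format.

ESCAPE = b"\x00\xff"
TERMINATOR = b"\x00\x01"


def encode(value: bytes, desc: bool = False) -> bytes:
    """Escape null bytes, append the terminator, invert bytes for DESC."""
    out = value.replace(b"\x00", ESCAPE) + TERMINATOR
    return bytes(b ^ 0xFF for b in out) if desc else out


def decode(buf: bytes, offset: int = 0, desc: bool = False):
    """Decode one value starting at offset; return (value, next_offset).

    Raises IndexError if the buffer ends before a terminator is found.
    """
    out = bytearray()
    i = offset
    while True:
        b = buf[i] ^ 0xFF if desc else buf[i]
        if b != 0x00:            # ordinary byte of the value
            out.append(b)
            i += 1
            continue
        nxt = buf[i + 1] ^ 0xFF if desc else buf[i + 1]
        if nxt == 0xFF:          # escaped null byte belonging to the value
            out.append(0x00)
            i += 2
        elif nxt == 0x01:        # terminator: the value is complete
            return bytes(out), i + 2
        else:
            raise ValueError("malformed encoding at offset %d" % i)


# Two values in one composite key, the second one DESC ordered.
first, second = b"\x01", b"\x00\x02"
key = encode(first) + encode(second, desc=True)
v1, pos = decode(key)
v2, pos = decode(key, pos, desc=True)
print(v1 == first and v2 == second and pos == len(key))  # True

# Byte-wise order of the encodings follows the order of the raw values.
values = [b"a\x00", b"a", b"ab", b"\x00", b""]
print(sorted(values, key=encode) == sorted(values))  # True
print(sorted(values, key=lambda v: encode(v, desc=True))
      == sorted(values, reverse=True))  # True
```

Because `decode` returns the offset where the next column starts, encoded values can appear in any position of a composite key. The cost is one extra byte per embedded null byte plus a two-byte terminator per value.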



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
