[jira] [Updated] (HUDI-4210) Create custom hbase index to solve data skew issue on hbase regions

Jian Feng (Jira) Mon, 01 Aug 2022 02:10:05 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jian Feng updated HUDI-4210:
----------------------------
    Status: Open  (was: In Progress)

> Create custom hbase index to solve data skew issue on hbase regions
> -------------------------------------------------------------------
>
>                 Key: HUDI-4210
>                 URL: https://issues.apache.org/jira/browse/HUDI-4210
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: index
>            Reporter: Jian Feng
>            Assignee: Jian Feng
>            Priority: Major
>              Labels: pull-request-available
>
> In our production environment, many tables use auto-increment ids, so 
> using the HBase index causes data skew across HBase regions. It would be 
> better to add random prefixes to the keys while keeping their ordering 
> within Hudi itself.
> A small modification to the HBase index can achieve this: add the prefix 
> when querying and updating HBase. The primary key stored in HBase will 
> then differ from the one in Hudi, but this is transparent to business 
> logic. I have adopted this method in our production environment. The 
> custom index can be specified with the withIndexClass config in 
> IndexConfig.
>  
> Another effort, driven by Uber engineers 
> [https://github.com/apache/hudi/pull/3508], could technically solve the 
> issue by reading HFiles directly, but it is still in progress. The 
> approach described above should resolve this issue immediately.
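
The prefixing idea in the description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the pull request: the class name SaltedRowKey, its two methods, the '_' separator, and the bucket count are all hypothetical. It also assumes the prefix is derived from a hash of the record key rather than being truly random, so that the same prefix can be recomputed at lookup time.

```java
// Hypothetical helper, not part of Hudi or HBase: all names are illustrative.
final class SaltedRowKey {
    private static final char SEPARATOR = '_';

    private SaltedRowKey() {}

    // Build the HBase row key by prepending a two-digit bucket prefix.
    // The bucket comes from the key's hash, so the mapping is stable
    // across writes and lookups while spreading sequential ids over regions.
    static String toHBaseKey(String recordKey, int buckets) {
        if (buckets <= 0 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in 1..100");
        }
        int bucket = Math.floorMod(recordKey.hashCode(), buckets);
        return String.format("%02d%c%s", bucket, SEPARATOR, recordKey);
    }

    // Strip the prefix to recover the record key as Hudi sees it.
    static String toRecordKey(String hbaseKey) {
        int idx = hbaseKey.indexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("missing prefix: " + hbaseKey);
        }
        return hbaseKey.substring(idx + 1);
    }
}
```

In a sketch like this, a custom index would call toHBaseKey before every get or put against HBase and toRecordKey on anything read back, so the prefixed key never leaves the index layer. Keys that land in the same bucket keep their original lexicographic order relative to each other.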



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
