[ https://issues.apache.org/jira/browse/HUDI-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Feng updated HUDI-4210:
----------------------------
    Status: Open  (was: In Progress)

> Create custom hbase index to solve data skew issue on hbase regions
> -------------------------------------------------------------------
>
>                 Key: HUDI-4210
>                 URL: https://issues.apache.org/jira/browse/HUDI-4210
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: index
>            Reporter: Jian Feng
>            Assignee: Jian Feng
>            Priority: Major
>              Labels: pull-request-available
>
> In our production environment, many tables use auto-increment ids, so
> using the HBase index causes data skew across HBase regions. It would be
> better to find a way to add random prefixes while still keeping the
> ordering in Hudi itself.
> We could make a small modification to the HBase index: add the prefix
> whenever HBase is queried or updated. This way the pk in HBase differs
> from the one in Hudi, but the logic stays transparent to business logic.
> I have adopted this method in our prod environment. The custom index can
> be specified with the withIndexClass config in IndexConfig.
>
> Other work, driven by Uber engineers
> [https://github.com/apache/hudi/pull/3508], could technically solve the
> issue by reading HFiles directly, but it is still in progress. The custom
> index approach above should resolve the issue immediately.
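
To illustrate the prefixing idea from the description, here is a minimal sketch in Python. It is not the code from the pull request. It assumes the prefix is derived from a hash of the record key rather than being truly random, since the same prefix has to be recomputed on every lookup. The names `salted_row_key`, `original_record_key` and `NUM_BUCKETS` are hypothetical.

```python
import hashlib

# Hypothetical bucket count; in practice it would likely match the number
# of pre-split HBase regions. Must stay below 100 for the 2-digit prefix.
NUM_BUCKETS = 16


def salted_row_key(record_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Build the HBase row key by prefixing the Hudi record key with a bucket id.

    The bucket id comes from a hash of the key itself, so the same record
    key always maps to the same row key on both update and query.
    """
    digest = hashlib.md5(record_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}_{record_key}"


def original_record_key(row_key: str) -> str:
    """Strip the bucket prefix to recover the key that Hudi sees."""
    return row_key.split("_", 1)[1]


if __name__ == "__main__":
    # Sequential (auto-increment style) ids are spread over several buckets.
    for record_id in range(1000, 1005):
        print(record_id, "->", salted_row_key(str(record_id)))
```

The index would apply `salted_row_key` just before each HBase get or put and `original_record_key` on the way back, so the record key stored in Hudi stays unchanged and the prefix never reaches business logic.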