Hi Jianfeng,
      As far as I know, there is no solution for this on the Hudi side yet.
However, I have run into this problem before, so I hope my experience helps.
As with other uses of HBase, adding a random prefix to the row key is
probably the most general solution to this problem.
One option is to change the Hudi primary key by adding such a prefix before
the data is ingested into Hudi. A new column can be added to hold the
original primary key for queries, which hides the Hudi pk from users.
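
A minimal sketch of this pre-ingestion step, in plain Python for brevity. It
assumes the prefix is derived from a stable hash of the key rather than drawn
at random, so that the same key always maps to the same row. The bucket count,
the "_" separator, and the field names (item_key, orig_item_key) are
illustrative assumptions, not anything Hudi defines.

```python
import hashlib

NUM_BUCKETS = 100  # assumption: roughly one bucket per region server


def salt_prefix(key: str, buckets: int = NUM_BUCKETS) -> str:
    """Return a zero-padded bucket id derived from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    width = len(str(buckets - 1))  # e.g. 2 digits for 100 buckets
    return f"{bucket:0{width}d}"


def salt_record(record: dict, key_field: str = "item_key") -> dict:
    """Copy the record, keep the original key in a new column, salt the key."""
    original = record[key_field]
    salted = dict(record)
    salted["orig_" + key_field] = original  # column used by queries
    salted[key_field] = f"{salt_prefix(original)}_{original}"  # new Hudi pk
    return salted
```

Because the prefix is computed from the key itself, a later upsert for the
same original key produces the same salted pk and lands on the same record.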
Another option is a small modification to the HBase index. Copy the code of
the HBase index and add the prefix on the paths that query and update HBase.
This way, the pk in HBase differs from the one in Hudi, but the difference
is transparent to business logic. I have adopted this method in a production
environment. The withIndexClass config in IndexConfig lets you specify a
custom index, so the index can be changed without recompiling the whole
Hudi project.
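
The key translation inside such a custom index can be sketched as follows.
This is only an illustration in Python, not the Hudi HBase index code: the
SaltedIndex class, its method names, and the dict standing in for the HBase
table are all hypothetical, and the prefix is assumed to come from a stable
hash so that lookups can recompute it.

```python
import hashlib


def salt_prefix(key: str, buckets: int = 100) -> str:
    """Return a zero-padded bucket id derived from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:0{len(str(buckets - 1))}d}"


class SaltedIndex:
    """Hypothetical index wrapper: callers use Hudi keys, storage sees salted keys."""

    def __init__(self, store=None, buckets: int = 100):
        # A dict stands in for the HBase table in this sketch.
        self._store = {} if store is None else store
        self._buckets = buckets

    def _row_key(self, record_key: str) -> str:
        # The salt is added only at the storage boundary.
        return f"{salt_prefix(record_key, self._buckets)}_{record_key}"

    def update_location(self, record_key: str, location: str) -> None:
        """Write path: store the location under the salted row key."""
        self._store[self._row_key(record_key)] = location

    def tag_location(self, record_key: str):
        """Read path: recompute the salted row key and look it up."""
        return self._store.get(self._row_key(record_key))
```

Callers keep passing the unmodified Hudi key on both paths; only the row keys
written to the backing store carry the prefix.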

On Mon, Oct 4, 2021, 11:29 PM <[email protected]> wrote:
When I bootstrapped a huge HBase index table, I found that all keys have the
prefix 'itemid:', which caused data skew: there are 100 region servers in
HBase, but only one was handling data. Is there any way to avoid this issue
on the Hudi side?

--
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure
