[ https://issues.apache.org/jira/browse/HBASE-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258740#comment-14258740 ]
Weichen Ye commented on HBASE-12716:
------------------------------------

Hi, [~te...@apache.org]

The bug is in the function iterateOnSplits() in org.apache.hadoop.hbase.util.Bytes in hbase-common. The original code is:

    if (diffBI.compareTo(splitsBI) < 0) {
      return null;
    }

Here "diffBI" is the difference between the start key and the end key, and "splitsBI" is the number of pieces in the result. For example, if we want to split the region ["aaa","aab"] into 2 pieces, "diffBI" for "aaa" and "aab" is 1 and "splitsBI" is 2. Because diffBI < splitsBI, the function returns null. This is the cause of the NullPointerException.

I have just uploaded a new patch to fix this bug. It uses an additional byte to find the split point between the start key and the end key.

Would you please take a look at this patch?

> A bug in RegionSplitter.UniformSplit algorithm
> ----------------------------------------------
>
>                 Key: HBASE-12716
>                 URL: https://issues.apache.org/jira/browse/HBASE-12716
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.6
>            Reporter: Weichen Ye
>            Assignee: Weichen Ye
>         Attachments: HBASE-12716.patch
>
>
> I'm working on another issue, HBASE-12590, and trying to use the UniformSplit
> algorithm in RegionSplitter. When the last bytes of the start key and end key
> are adjacent in alphabetical or ASCII order, the UniformSplit algorithm
> throws an NPE.
> For example: startkey: aaa, endkey: aab
>              startkey: 1111, endkey: 1112
> We wrote this simple test code:
> {code}
> import org.apache.hadoop.hbase.util.RegionSplitter.UniformSplit;
> ......
> byte[] a1 = { 'a', 'a', 'a' };
> byte[] a2 = { 'a', 'a', 'b' };
> UniformSplit us = new UniformSplit();
> byte[] mid = us.split(a1, a2);
> ......
> {code}
> It fails with this error:
> {code}
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.hadoop.hbase.util.RegionSplitter$UniformSplit.split(RegionSplitter.java:986)
> {code}
> We would expect this algorithm to be able to calculate the split point with an
> additional byte, for example:
> "aaa" and "aab", split point = "aaaP"
> "1111" and "1112", split point = "1111P"

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
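
To make the arithmetic in the comment above concrete, here is a minimal standalone sketch using only java.math.BigInteger. It is neither the HBase source nor the attached patch: the class SplitSketch and the methods diff, checkReturnsNull, and midpointWithExtraByte are hypothetical names. It assumes both keys have the same length, and it treats "splitsBI" as the number of pieces, as described in the comment. The extra byte it computes is the plain arithmetic midpoint, so the byte the actual patch produces may differ.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class SplitSketch {

    // Difference between two equal-length keys, read as unsigned big-endian integers.
    static BigInteger diff(byte[] start, byte[] end) {
        return new BigInteger(1, end).subtract(new BigInteger(1, start));
    }

    // Simplified model of the check quoted in the comment: with fewer distinct
    // values between the keys than requested pieces, the caller gets null.
    static boolean checkReturnsNull(byte[] start, byte[] end, int pieces) {
        return diff(start, end).compareTo(BigInteger.valueOf(pieces)) < 0;
    }

    // Illustrative idea only: pad both keys with one trailing zero byte so that
    // adjacent keys are 256 apart, then take the midpoint of the padded keys.
    static byte[] midpointWithExtraByte(byte[] start, byte[] end) {
        byte[] s = Arrays.copyOf(start, start.length + 1); // appends 0x00
        byte[] e = Arrays.copyOf(end, end.length + 1);     // appends 0x00
        BigInteger sBI = new BigInteger(1, s);
        BigInteger eBI = new BigInteger(1, e);
        BigInteger mid = sBI.add(eBI.subtract(sBI).shiftRight(1));
        return toFixedLength(mid, s.length);
    }

    // Convert a non-negative BigInteger to exactly len bytes (big-endian),
    // dropping a leading sign byte or left-padding with zeros as needed.
    static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[len];
        int n = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - n, out, len - n, n);
        return out;
    }

    public static void main(String[] args) {
        byte[] a1 = { 'a', 'a', 'a' };
        byte[] a2 = { 'a', 'a', 'b' };
        System.out.println("diff = " + diff(a1, a2));
        System.out.println("check returns null for 2 pieces: " + checkReturnsNull(a1, a2, 2));
        System.out.println("split point bytes = " + Arrays.toString(midpointWithExtraByte(a1, a2)));
    }
}
```

With the keys "aaa" and "aab" the difference is 1, which is smaller than 2 pieces, so the modelled check takes the null branch. After padding, the two keys are 256 apart and a split point that sorts strictly between them exists.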