[ https://issues.apache.org/jira/browse/HBASE-21439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677255#comment-16677255 ]
Ben Lau commented on HBASE-21439:
---------------------------------

Hi [~stack] which mistake -- using different String conversions to get/put a region in a map, or using Bytes.toString() on a byte array that may not be valid UTF-8?

For mistake #1, I’m not aware of any other similar bugs in the codebase, though it’s possible.

I think we make mistake #2 in other parts of the codebase, particularly when printing debug messages for the start/end keys of regions. Depending on how exotic your rowkey space is (how far it strays from valid UTF-8), you could run into an issue. By 'issue,' I mean that parts of the start/end key will be silently dropped during decoding and replaced with replacement characters that indicate malformed input. The output would be a bit misleading or strange, but nothing would crash.

I can create a Jira ticket to audit the Bytes.toString() calls (there are many), but unfortunately I don’t have the bandwidth to look at it.

> StochasticLoadBalancer RegionLoads aren’t being used in RegionLoad cost functions
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-21439
>                 URL: https://issues.apache.org/jira/browse/HBASE-21439
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>   Affects Versions: 1.3.2.1, 2.0.2
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>            Priority: Major
>
> In StochasticLoadBalancer.updateRegionLoad() the region loads are being put
> into the map with Bytes.toString(regionName).
> First, this is a problem because Bytes.toString() assumes that the byte array
> is a UTF8 encoded String but there is no guarantee that regionName bytes are
> legal UTF8.
> Secondly, in BaseLoadBalancer.registerRegion, we are reading the region loads
> out of the load map not using Bytes.toString() but using
> region.getRegionNameAsString() and region.getEncodedName(). So the load
> balancer will not see or use any of the cluster's RegionLoad history.
> There are 2 primary ways to solve this issue, assuming we want to stay with
> String keys for the load map (seems reasonable to aid debugging). We can
> either fix updateRegionLoad to store the regionName as a string properly or
> we can update both the reader & writer to use a new common valid String
> representation.
> Will post a patch assuming we want to pursue the original intention, i.e.
> store regionNameAsString for the loadmap key, but I'm open to fixing this a
> different way.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
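
For readers following the thread, the two mistakes discussed above can be reproduced with plain JDK calls. This is only a sketch under stated assumptions, not HBase code: `utf8Decode` stands in for what Bytes.toString() is described as doing (decode as UTF-8), `escaped` is a hypothetical second String conversion used on the reader side, and the class name, the sample bytes in `RAW`, and the load value 42 are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class RegionKeyDemo {
    // Hypothetical region-name bytes: an ASCII prefix plus two bytes that are not legal UTF-8.
    static final byte[] RAW = {'t', '1', ',', (byte) 0xFF, (byte) 0xFE, ','};

    // Stand-in for a UTF-8 based conversion: malformed input is replaced, not rejected.
    static String utf8Decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Hypothetical alternative conversion: keep printable ASCII, escape every other byte as \xNN.
    static String escaped(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            int v = x & 0xFF;
            if (v >= 0x20 && v < 0x7F && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    // Mistake #2: decoding and re-encoding does not give back the original bytes.
    static boolean survivesRoundTrip(byte[] b) {
        return Arrays.equals(b, utf8Decode(b).getBytes(StandardCharsets.UTF_8));
    }

    // Mistake #1: the writer and the reader derive the map key with different conversions.
    static boolean lookupHits(byte[] b) {
        Map<String, Integer> loads = new HashMap<>();
        loads.put(utf8Decode(b), 42);          // writer side
        return loads.containsKey(escaped(b));  // reader side
    }

    public static void main(String[] args) {
        System.out.println("round trip preserved bytes: " + survivesRoundTrip(RAW));
        System.out.println("reader finds writer's entry: " + lookupHits(RAW));
    }
}
```

Either remedy proposed in the issue description amounts to making the `put` and the lookup call the same conversion; choosing a conversion that does not lose bytes additionally avoids two distinct byte arrays collapsing onto one key.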