[ 
https://issues.apache.org/jira/browse/HBASE-21439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677255#comment-16677255
 ] 

Ben Lau commented on HBASE-21439:
---------------------------------

Hi [~stack], which mistake do you mean: using different String conversions to 
put and get a region in a map, or using Bytes.toString() on a byte array that 
may not be valid UTF-8?

For mistake #1, I’m not aware of any similar bugs elsewhere in the codebase, 
though it’s possible there are some.
I think we make mistake #2 in other parts of the codebase, particularly when 
printing debug messages for the start/end keys of regions.
Depending on how exotic your rowkey space is (how often keys contain byte 
sequences that are not valid UTF-8), you could run into an issue.
By 'issue,' I mean that parts of the start/end key will be silently dropped 
during decoding and replaced with replacement characters that indicate 
malformed input. The output would be a bit misleading or strange, but nothing 
would crash.
I can create a Jira ticket to audit the Bytes.toString() calls (there are many), 
but unfortunately I don’t have the bandwidth to look at it myself.
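
To make the silent replacement concrete, here is a minimal sketch using only 
the JDK. It assumes Bytes.toString() behaves like a plain UTF-8 decode of the 
array; the class name, the decodeUtf8 helper, and the sample key are 
hypothetical and not taken from HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecodeSketch {
  // Stand-in for a UTF-8 decode of a raw key (assumption: this mirrors
  // what Bytes.toString() does; it is not the HBase implementation).
  static String decodeUtf8(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Hypothetical key: "row" followed by two bytes that are never valid UTF-8.
    byte[] key = {'r', 'o', 'w', (byte) 0xFF, (byte) 0xFE};

    String printed = decodeUtf8(key);
    byte[] roundTrip = printed.getBytes(StandardCharsets.UTF_8);

    // Each malformed byte becomes U+FFFD; no exception is thrown.
    System.out.println(printed.indexOf('\uFFFD') >= 0);   // prints "true"
    // The original bytes cannot be recovered from the decoded String.
    System.out.println(Arrays.equals(key, roundTrip));    // prints "false"
  }
}
```

The decode never fails, which is why this only shows up as odd-looking keys in 
debug output rather than as an error.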


> StochasticLoadBalancer RegionLoads aren’t being used in RegionLoad cost 
> functions
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-21439
>                 URL: https://issues.apache.org/jira/browse/HBASE-21439
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>    Affects Versions: 1.3.2.1, 2.0.2
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>            Priority: Major
>
> In StochasticLoadBalancer.updateRegionLoad() the region loads are being put 
> into the map with Bytes.toString(regionName).
> First, this is a problem because Bytes.toString() assumes that the byte array 
> is a UTF8 encoded String but there is no guarantee that regionName bytes are 
> legal UTF8.
> Secondly, in BaseLoadBalancer.registerRegion, we are reading the region loads 
> out of the load map not using Bytes.toString() but using 
> region.getRegionNameAsString() and region.getEncodedName().  So the load 
> balancer will not see or use any of the cluster's RegionLoad history.
> There are 2 primary ways to solve this issue, assuming we want to stay with 
> String keys for the load map (seems reasonable to aid debugging).  We can 
> either fix updateRegionLoad to store the regionName as a string properly or 
> we can update both the reader & writer to use a new common valid String 
> representation.
> Will post a patch assuming we want to pursue the original intention, i.e. 
> store regionNameAsString for the loadmap key, but I'm open to fixing this a 
> different way.
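
As an illustration of the second option in the description above (one common 
String representation shared by the reader and the writer), here is a rough 
sketch using only the JDK. The loadKey helper, its \xNN escaping scheme, and 
the sample names are assumptions made for this example; this is not the 
patch and not HBase code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class LoadKeySketch {
  // Hypothetical shared key function: printable ASCII is kept as-is, and
  // every other byte (plus the backslash) is escaped as \xNN, so distinct
  // byte arrays always map to distinct Strings.
  static String loadKey(byte[] regionName) {
    StringBuilder sb = new StringBuilder();
    for (byte b : regionName) {
      int v = b & 0xFF;
      if (v >= 0x20 && v < 0x7F && v != '\\') {
        sb.append((char) v);
      } else {
        sb.append(String.format("\\x%02X", v));
      }
    }
    return sb.toString();
  }

  // Lossy alternative for comparison: a plain UTF-8 decode.
  static String lossyKey(byte[] regionName) {
    return new String(regionName, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] a = {'r', (byte) 0xFF};
    byte[] b = {'r', (byte) 0xFE};

    // Writer and reader both go through loadKey, so the lookup succeeds.
    Map<String, Integer> loads = new HashMap<>();
    loads.put(loadKey(a), 42);
    System.out.println(loads.get(loadKey(a)));              // prints "42"

    // The lossless keys differ, while the lossy keys collide.
    System.out.println(loadKey(a).equals(loadKey(b)));      // prints "false"
    System.out.println(lossyKey(a).equals(lossyKey(b)));    // prints "true"
  }
}
```

Whatever representation is chosen, the point is that the put and the get must 
both use the same function; the escaped form also stays readable in debug 
output.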



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
