[ 
https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8374:
--------------------------

    Attachment: 8374-trunk-v4.txt

Regarding Nicolas' question: I think the serversToIndex map wasn't fully 
populated at lookup time, because a single loop over clusterState.entrySet() 
was used both to fill the map and to read from it.
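
To illustrate the suspected failure mode, here is a minimal standalone sketch 
(not the HBase code). The class and method names are hypothetical, server names 
are plain Strings instead of ServerName, and it assumes serversToIndex holds 
boxed Integer values, as the assignment into an int[] suggests:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the single-loop logic; not the actual Cluster constructor.
final class SingleLoopSketch {

  /**
   * Indexes servers and resolves each server's region locations in the same loop.
   * Returns false if a lookup hit a server that had not been indexed yet.
   */
  static boolean resolveInOneLoop(Map<String, List<String>> locationsByServer) {
    Map<String, Integer> serversToIndex = new HashMap<>();
    try {
      for (Map.Entry<String, List<String>> entry : locationsByServer.entrySet()) {
        serversToIndex.put(entry.getKey(), serversToIndex.size());
        List<String> loc = entry.getValue();
        int[] indexes = new int[loc.size()];
        for (int i = 0; i < loc.size(); i++) {
          // get() returns null for a server that is indexed later in the loop;
          // unboxing that null into an int throws NullPointerException.
          indexes[i] = serversToIndex.get(loc.get(i));
        }
      }
      return true;
    } catch (NullPointerException e) {
      return false;
    }
  }
}
```

Whether the lookup fails depends on iteration order and on which servers the 
locations name, which would be consistent with the NPE not showing up on every 
cluster.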

In patch v4, I added a separate loop to populate the serversToIndex map.

I kept the check from patch v3 in case regionFinder.getTopBlockLocations() 
returns a ServerName that is not in the serversToIndex map.
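
The shape of that approach can be sketched roughly as below. This is an 
illustration under assumptions, not the attached patch: the class name, the 
String keys, the topLocations map standing in for 
regionFinder.getTopBlockLocations(), and the -1 marker for an unknown server 
are all made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "populate first, then look up, and guard against misses".
final class TwoPassIndexSketch {

  /**
   * clusterState maps each server to the regions it hosts; topLocations maps each
   * region to the servers holding most of its blocks.
   */
  static Map<String, int[]> buildRegionLocations(
      Map<String, List<String>> clusterState,
      Map<String, List<String>> topLocations) {

    // Pass 1: give every server an index before any lookup happens.
    Map<String, Integer> serversToIndex = new LinkedHashMap<>();
    for (String server : clusterState.keySet()) {
      serversToIndex.put(server, serversToIndex.size());
    }

    // Pass 2: resolve region locations against the fully populated map.
    Map<String, int[]> regionLocations = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> entry : clusterState.entrySet()) {
      for (String region : entry.getValue()) {
        List<String> loc = topLocations.get(region);
        if (loc == null) {
          loc = new ArrayList<>();
        }
        int[] indexes = new int[loc.size()];
        for (int i = 0; i < loc.size(); i++) {
          Integer index = serversToIndex.get(loc.get(i));
          // Guard: a location may name a server that is not in the map at all.
          // -1 is an arbitrary "unknown server" marker chosen for this sketch.
          indexes[i] = (index == null) ? -1 : index;
        }
        regionLocations.put(region, indexes);
      }
    }
    return regionLocations;
  }
}
```

The first pass removes the ordering dependency; the null guard covers the 
remaining case where a block location refers to a server that is absent from 
clusterState altogether.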
                
> NPE when launching the balance
> ------------------------------
>
>                 Key: HBASE-8374
>                 URL: https://issues.apache.org/jira/browse/HBASE-8374
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>    Affects Versions: 0.95.0
>         Environment: AWS / real cluster with 3 nodes + master
>            Reporter: Nicolas Liochon
>            Assignee: Ted Yu
>             Fix For: 0.98.0, 0.95.1
>
>         Attachments: 8374-trunk.txt, 8374-trunk-v2.txt, 8374-trunk-v3.txt, 
> 8374-trunk-v4.txt
>
>
> I can't reproduce this every time, but I hit it on a fairly clean environment.
> It occurs every 5 minutes (i.e. the balancer period). The impact is severe: the 
> balancer does not run. Once it starts to occur, it occurs all the time. I 
> haven't tried restarting the master, but I think that should be enough.
> Now, looking at the code, the NPE is strange.
> {noformat}
> 2013-04-18 08:09:52,079 ERROR [box,60000,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception
> java.lang.NullPointerException
>       at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:145)
>       at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194)
>       at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295)
>       at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48)
>       at org.apache.hadoop.hbase.Chore.run(Chore.java:81)
>       at java.lang.Thread.run(Thread.java:662)
> 2013-04-18 08:09:52,103 DEBUG [box,60000,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key ''
> {noformat}
> {code}
>           if (regionFinder != null) {
>             //region location
>             List<ServerName> loc = regionFinder.getTopBlockLocations(region);
>             regionLocations[regionIndex] = new int[loc.size()];
>             for (int i=0; i < loc.size(); i++) {
>               regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i));  // <========= NPE here
>             }
>           }
> {code}
> pinging [~enis], just in case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
