[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292062#comment-16292062 ]
stack commented on HBASE-18946:
-------------------------------

The failure was TestMasterFailover. It failed to write its XML because it timed out twice. Looking at the test, it tries to set zk node states and move meta regions off the master, neither of which makes sense in AMv2. I refactored the TestMasterFailover test that does nonsense.

I see other timeouts, though it looks like most other tests just pass. Here is what I see in the console:

TestDLSFSHLog
TestStochasticLoadBalancer
TestReplicationZKNodeCleaner
TestLogsCleaner

These all pass locally w/o issue EXCEPT TestDLSFSHLog. It looks sick, stuck. Digging, indeed, it's the fault of this patch. We try to keep sending state change messages to the master for as long as we can, only the thread is not a daemon, so it keeps the RS up. Ugh! Fixed.

> Stochastic load balancer assigns replica regions to the same RS
> ---------------------------------------------------------------
>
>                 Key: HBASE-18946
>                 URL: https://issues.apache.org/jira/browse/HBASE-18946
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha-3
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: stack
>             Fix For: 2.0.0-beta-1
>
>         Attachments: HBASE-18946.master.001.patch,
> HBASE-18946.master.002.patch, HBASE-18946.master.003.patch,
> HBASE-18946.master.004.patch, HBASE-18946.master.005.patch,
> HBASE-18946.master.006.patch, HBASE-18946.master.007.patch,
> HBASE-18946.master.008.patch, HBASE-18946.master.009.patch,
> HBASE-18946.master.010.patch, HBASE-18946.master.011.patch,
> HBASE-18946.patch, HBASE-18946.patch, HBASE-18946_2.patch,
> HBASE-18946_2.patch, HBASE-18946_simple_7.patch, HBASE-18946_simple_8.patch,
> TestRegionReplicasWithRestartScenarios.java
>
>
> Trying out region replicas and their assignment, I can see that sometimes the
> default LB, the Stochastic load balancer, assigns replica regions to the same RS.
> This happens when we have 3 RSs checked in and we have a table with 3
> replicas.
> When an RS goes down, assigning the replicas to the same RS is acceptable,
> but when we have enough RSs to assign to, this behaviour is undesirable and
> defeats the purpose of replicas.
> [~huaxiang] and [~enis].

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
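
For anyone reading the TestDLSFSHLog note in the comment above: the hang is the usual JVM rule that a process cannot exit while any non-daemon thread is still running. Below is a minimal sketch of that behaviour, not the code from the patch. The class name `DaemonReporterSketch`, the helper `newReporterThread`, and the thread name are all hypothetical, and the retry loop only stands in for "keep sending state change messages to the master".

```java
// Minimal sketch, assuming a retry-forever reporter thread.
// All names here are hypothetical; this is not the HBase implementation.
class DaemonReporterSketch {

    // Builds the reporter thread. With daemon=false the JVM would stay up
    // for as long as the task keeps running.
    static Thread newReporterThread(Runnable task, boolean daemon) {
        Thread t = new Thread(task, "state-change-reporter");
        t.setDaemon(daemon); // must be called before start()
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for "keep retrying the report for as long as we can".
        Runnable retryForever = () -> {
            while (true) {
                try {
                    Thread.sleep(100); // pretend to retry the RPC
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        };

        // Pass false here and the process never exits after main returns,
        // which is the "keeps the RS up" symptom described in the comment.
        Thread reporter = newReporterThread(retryForever, true);
        reporter.start();

        Thread.sleep(300);
        System.out.println("main exiting; reporter daemon=" + reporter.isDaemon());
    }
}
```

Marking the thread as a daemon is only one way to avoid the hang. Interrupting the thread from the shutdown path would also work, and the comment does not say which approach the fix actually took.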