[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-27 Thread chenxu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064378#comment-16064378
 ] 

chenxu commented on HBASE-18215:


Uploaded a patch for #4, [~toffer].

> some advises about refactoring of rsgroup
> -
>
> Key: HBASE-18215
> URL: https://issues.apache.org/jira/browse/HBASE-18215
> Project: HBase
>  Issue Type: Umbrella
>  Components: Balancer
>Reporter: chenxu
> Attachments: HBASE-18215-1.2.4-v1.patch
>
>
> We recently integrated rsgroup into our cluster and, after doing so, found 
> some points worth refactoring. They may not all be right, but I think they 
> are worth sharing.
> # When hbase.balancer.tablesOnMaster is configured, RSGroupBasedLoadBalancer 
> should consider master-server assignment first in balanceCluster, 
> roundRobinAssignment, retainAssignment and randomAssignment, 
> the same way BaseLoadBalancer does.
> # Why not use a local file as the persistence layer instead of the rsgroup 
> table? In our implementation, we first modify the local rsgroup file, then 
> load the group info into memory, and then run the balancer command; 
> everything works.
> Loading performs some sanity checks:
> (1) a server cannot be owned by more than one group
> (2) a table cannot be owned by more than one group
> (3) if a group has tables, it must also have servers
> (4) the default group must have servers in it
> If a sanity check fails, the rest of the process is abandoned. Working this 
> way greatly reduces the complexity of the rsgroup implementation: there is 
> no need to wait for the rsgroup table to come online, and methods like 
> moveServers, moveTables, addRSGroup, removeRSGroup and moveServersAndTables 
> can be removed from RSGroupAdminService. Only a refresh method is needed 
> (modify the persistence layer first, then refresh the in-memory state).
> # We should add some group information to the master web UI.
> To do this, RSGroupBasedLoadBalancer should move to the hbase-server module, 
> because MasterStatusTmpl.jamon depends on it.
> # There may be an issue in RSGroupBasedLoadBalancer.roundRobinAssignment: 
> if two groups both include BOGUS_SERVER_NAME, assignments.putAll will 
> overwrite the previous data.
> # There may be an issue in RSGroupBasedLoadBalancer.randomAssignment: 
> when the return value is BOGUS_SERVER_NAME, AM cannot handle this case. We 
> should return null instead of BOGUS_SERVER_NAME.
> # When RSGroupBasedLoadBalancer.balanceCluster executes, groups are balanced 
> one by one; if there are too many groups, we could do this in parallel.
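
The four sanity checks listed under item 2 of the description could be sketched roughly as below. This is only an illustration, not code from the attached patch: the `GroupSpec` and `RSGroupFileSanityCheck` classes, the use of plain strings for server and table names, and the `"default"` group name are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory view of one group as loaded from the local file. */
final class GroupSpec {
  final String name;
  final Set<String> servers;
  final Set<String> tables;

  GroupSpec(String name, Set<String> servers, Set<String> tables) {
    this.name = name;
    this.servers = servers;
    this.tables = tables;
  }
}

/** Hypothetical validator; not part of HBase or of the patch. */
final class RSGroupFileSanityCheck {
  // Assumed name of the default group.
  static final String DEFAULT_GROUP = "default";

  /** Returns the list of violations; an empty list means the file may be loaded. */
  static List<String> validate(List<GroupSpec> groups) {
    List<String> errors = new ArrayList<>();
    Map<String, String> serverOwner = new HashMap<>();
    Map<String, String> tableOwner = new HashMap<>();
    boolean defaultHasServers = false;

    for (GroupSpec g : groups) {
      // (1) a server cannot be owned by more than one group
      for (String server : g.servers) {
        String prev = serverOwner.putIfAbsent(server, g.name);
        if (prev != null && !prev.equals(g.name)) {
          errors.add("server " + server + " is in both " + prev + " and " + g.name);
        }
      }
      // (2) a table cannot be owned by more than one group
      for (String table : g.tables) {
        String prev = tableOwner.putIfAbsent(table, g.name);
        if (prev != null && !prev.equals(g.name)) {
          errors.add("table " + table + " is in both " + prev + " and " + g.name);
        }
      }
      // (3) a group that has tables must also have servers
      if (!g.tables.isEmpty() && g.servers.isEmpty()) {
        errors.add("group " + g.name + " has tables but no servers");
      }
      // (4) remember whether the default group has at least one server
      if (DEFAULT_GROUP.equals(g.name) && !g.servers.isEmpty()) {
        defaultHasServers = true;
      }
    }
    if (!defaultHasServers) {
      errors.add("default group has no servers");
    }
    return errors;
  }
}
```

A loader following the description would call `validate` on the parsed file and skip the in-memory refresh whenever the returned list is non-empty.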



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-21 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058100#comment-16058100
 ] 

Francis Liu commented on HBASE-18215:
-

{quote}
but if balancer.randomAssignment return BOGUS_SERVER_NAME, AM can’t handle this.
an OPEN RPC will be sent to the BOGUS_SERVER_NAME, this should not happen
{quote}
The balancer is doing the right thing here: it returns BOGUS_SERVER_NAME when a 
plan is unavailable. To keep the RSGroup patch from being too invasive in core 
code, there will be cases where the Master may end up trying to assign a region 
to BOGUS_SERVER_NAME. It looks like you are running into HBASE-18235.



[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-20 Thread chenxu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055203#comment-16055203
 ] 

chenxu commented on HBASE-18215:


bq. #4 Good catch. Would you like to submit a separate patch for this?

I can do that soon!



[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-20 Thread chenxu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055201#comment-16055201
 ] 

chenxu commented on HBASE-18215:


bq. Can you provide a specific scenario?

Here is a scenario:

{code:title=RSGroupBasedLoadBalancer.java|borderStyle=solid}
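public List<RegionPlan> balanceCluster(Map<ServerName, List<HRegionInfo>> clusterState)
    throws HBaseIOException {
  ...
  // Misplaced regions are collected under BOGUS_SERVER_NAME;
  // each one gets a plan with a null source and a null destination.
  List<HRegionInfo> misplacedRegions =
      correctedState.get(LoadBalancer.BOGUS_SERVER_NAME);
  for (HRegionInfo regionInfo : misplacedRegions) {
    regionPlans.add(new RegionPlan(regionInfo, null, null));
  }
  ...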
{code}
If a region is misplaced, the RegionPlan's destination is null, and the following happens:
{code:title=AssignmentManager.java|borderStyle=solid}
private RegionPlan getRegionPlan(final HRegionInfo region,
    final boolean forceNewPlan) throws HBaseIOException {
  ...
    if (forceNewPlan
        || existingPlan == null
        || existingPlan.getDestination() == null
        || !destServers.contains(existingPlan.getDestination())) {
      newPlan = true;
      try {
        randomPlan = new RegionPlan(region, null,
            balancer.randomAssignment(region, destServers));
      } catch (IOException ex) {
        LOG.warn("Failed to create new plan.", ex);
        return null;
      }
      this.regionPlans.put(encodedName, randomPlan);
    }
  }
  if (newPlan) {
    if (randomPlan.getDestination() == null) {
      LOG.warn("Can't find a destination for " + encodedName);
      return null;
    }
    ...
    return randomPlan;
  }
  ...
}
{code}
If balancer.randomAssignment returns null, getRegionPlan returns null, and AM 
handles it like this:
{code:title=AssignmentManager.java|borderStyle=solid}
private void assign(RegionState state, boolean forceNewPlan) {
  ...
  if (plan == null) {
    LOG.warn("Unable to determine a plan to assign " + region);
    // For meta region, we have to keep retrying until succeeding
    if (region.isMetaRegion()) {

    }
    regionStates.updateRegionState(region, State.FAILED_OPEN);
    return;
  }

}
{code}
The target region then transitions to the FAILED_OPEN state.
But if balancer.randomAssignment returns BOGUS_SERVER_NAME, AM can't handle it: 
an OPEN RPC is sent to BOGUS_SERVER_NAME, which should not happen.
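
To make the distinction concrete, here is a minimal sketch of a caller-side check that treats the placeholder the same way as a null destination. It is an assumption-laden illustration, not the AssignmentManager code and not necessarily how HBASE-18235 addresses the problem: `DestinationGuard`, `Outcome`, and the string value used for `BOGUS_SERVER_NAME` are made up for the example, and server names are plain strings rather than `ServerName` objects.

```java
import java.util.Optional;

/** Hypothetical helper; not part of HBase. */
final class DestinationGuard {
  // Stand-in for LoadBalancer.BOGUS_SERVER_NAME; the value is illustrative.
  static final String BOGUS_SERVER_NAME = "localhost,1,1";

  enum Outcome { SEND_OPEN_RPC, FAILED_OPEN }

  /** Maps both "no server" results (null and the placeholder) to empty. */
  static Optional<String> usableDestination(String chosen) {
    if (chosen == null || BOGUS_SERVER_NAME.equals(chosen)) {
      return Optional.empty();
    }
    return Optional.of(chosen);
  }

  /** Only a real destination leads to an OPEN RPC; anything else fails the open. */
  static Outcome decide(String chosen) {
    return usableDestination(chosen).isPresent()
        ? Outcome.SEND_OPEN_RPC
        : Outcome.FAILED_OPEN;
  }
}
```

With a check like this in the caller, it would not matter whether the balancer signals "no plan" with null or with the placeholder; either way no RPC would go to a server that does not exist.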



[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-19 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054517#comment-16054517
 ] 

Francis Liu commented on HBASE-18215:
-

#5 might be because of HBASE-18235



[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-19 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054515#comment-16054515
 ] 

Francis Liu commented on HBASE-18215:
-

{quote}
Was thinking that having webserver last, it would have all dependencies in 
CLASSPATH; could do stuff too like pick up pages from well-known locations and 
serve them up (Main point is +1 on rsgroup info in the web ui... how we do it, 
needs a bit of thought/work).
{quote}
Sounds good. Your approach sounds a lot cleaner as well. 



[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054434#comment-16054434
 ] 

stack commented on HBASE-18215:
---

bq. Putting it in HDFS was something we thought about before, but one reviewer 
did not like it and suggested we use a table. That was some time ago; I will 
have to dig it up.

The justification you give above [~toffer] -- "...but you are pushing 
complexity to the user in the way of managing the file: persistence in case of 
failure, dealing with concurrent updates, etc. .." -- is pretty good I'd say.

bq. Won't this still leak rsgroup balancer into core code? We could create the 
same jamon file but with rsgroup info in hbase-rsgroup and overwrite the 
hbase-server's somehow.

Was thinking that having webserver last, it would have all dependencies in 
CLASSPATH; could do stuff too like pick up pages from well-known locations and 
serve them up (Main point is +1 on rsgroup info in the web ui... how we do it, 
needs a bit of thought/work).



[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-19 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054374#comment-16054374
 ] 

Francis Liu commented on HBASE-18215:
-

{quote}
Regards #2, you mean a file in HDFS rather than local? Sounds good.
{quote}
Putting it in HDFS was something we thought about before, but one reviewer did 
not like it and suggested we use a table. That was some time ago; I will have 
to dig it up.

{quote}
On #3, let's figure out how we can avoid moving the rsgroup balancer into 
hbase-server. Maybe we make a hbase-webserver module and put it after 
hbase-rsgroup?
{quote}
Won't this still leak rsgroup balancer into core code? We could create the same 
jamon file but with rsgroup info in hbase-rsgroup and overwrite the 
hbase-server's somehow. 



[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-19 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054334#comment-16054334
 ] 

Francis Liu commented on HBASE-18215:
-

#1 You don't really need to put tables on the master anymore; just create 
another regionserver group to put the tables on. This makes meta much more 
available and allows you to restart the master when needed without impact. 
Adding the master to an rsgroup may lead to operational surprises. I'd 
recommend letting the master handle only master responsibilities.

#2 From a quick look at the patch, it is indeed a local file. It will make the 
rsgroup code simpler, but you are pushing complexity to the user in the way of 
managing the file: persistence in case of failure, dealing with concurrent 
updates, etc. Having APIs isn't that complex, and they are much more 
user-friendly and possibly more flexible. 

#3 Yes we should. Getting the rsgroup patch in took a herculean effort, hence I 
focused only on the essentials. As Stack mentioned, we need a way to do this 
that doesn't cause unacceptable leaking of rsgroup into core code.

#4 Good catch. Would you like to submit a separate patch for this?

#5 Can you provide a specific scenario? When the rsgroup patch was written, if 
I remember correctly, the reverse was true: AM could not handle null results 
when calling randomAssignment().

#6 Sounds reasonable.   
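
For readers following #4, the overwrite can be avoided by merging the per-group results key by key instead of calling putAll. The sketch below only illustrates the idea and is not the patch attached to this issue: `AssignmentMerge` and `mergeInto` are hypothetical names, and server and region names are plain strings where the real code uses `ServerName` and `HRegionInfo`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of HBase. */
final class AssignmentMerge {
  /**
   * Appends each group's regions to whatever is already stored under the
   * same server key, so two groups that both produce an entry for the
   * placeholder server do not overwrite each other.
   */
  static void mergeInto(Map<String, List<String>> all,
                        Map<String, List<String>> perGroup) {
    for (Map.Entry<String, List<String>> e : perGroup.entrySet()) {
      // Copy into a list owned by 'all' so the per-group list is not mutated.
      all.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
         .addAll(e.getValue());
    }
  }
}
```

With `putAll`, the second group's list under the shared key would replace the first; with `mergeInto`, both groups' regions end up in the same list.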





[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052213#comment-16052213
 ] 

stack commented on HBASE-18215:
---

Thanks [~javaman_chen]

[~toffer] FYI

Regards #2, you mean a file in HDFS rather than local? Sounds good.

On #3, let's figure out how we can avoid moving the rsgroup balancer into 
hbase-server. Maybe we make a hbase-webserver module and put it after 
hbase-rsgroup?

This is great stuff.






[jira] [Commented] (HBASE-18215) some advises about refactoring of rsgroup

2017-06-16 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052129#comment-16052129
 ] 

Guanghao Zhang commented on HBASE-18215:


You can change this issue's type to an umbrella issue and break it into 
sub-tasks. That will help with review. Also, the rsgroup feature was merged 
into hbase 2.0 and later versions, so please upload patches for the hbase 2.0 
and master branches. Thanks.
