[jira] [Commented] (HBASE-19501) [AMv2] Retain assignment across restarts

2017-12-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290337#comment-16290337
 ] 

stack commented on HBASE-19501:
---

Please do not review!

I'm breaking up this patch, adding the pieces to other issues, and will close 
this one out. The summary is wrong. We had a mechanism for retaining assignment, 
but its operation was cryptic, and older versions of the HBASE-18946 patch 
frustrated our retaining the old config. Messing with this issue and fixing 
HBASE-18946 gave me a better understanding of how this all should work. Doc and 
some fixes from here went to HBASE-18946. Test fixes and a new facility in HTU 
for testing retention will be added to the parent issue.

On the items raised in the description:

 # It is hard to test if we retain assignments because our little minicluster 
gives RegionServers new ports on restart foiling our means of recognizing new 
instance of a server by checking hostname+port (and ensuring the startcode is 
larger).

This is so. There is a crazy test, 
TestRestartCluster#testRetainAssignmentOnRestart, that records the old RS port 
numbers and then starts each of the daemons one by one, setting the port 
individually. In the parent, I add a means of doing this to HTU.
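
For anyone following along, the hostname+port plus larger-startcode check mentioned in the item above can be sketched roughly as below. This is a minimal, self-contained illustration and not the HBase code: the ServerId class and both method names are made up for the example (in HBase itself the hostname, port and startcode live in ServerName).

```java
import java.util.List;
import java.util.Optional;

public class ServerInstanceMatch {
    /** Hypothetical stand-in for a server identity: hostname, port, startcode. */
    public static final class ServerId {
        public final String hostname;
        public final int port;
        public final long startcode;

        public ServerId(String hostname, int port, long startcode) {
            this.hostname = hostname;
            this.port = port;
            this.startcode = startcode;
        }
    }

    /** True if candidate has the same hostname+port as old and a larger startcode. */
    public static boolean isNewInstanceOf(ServerId candidate, ServerId old) {
        return candidate.hostname.equals(old.hostname)
            && candidate.port == old.port
            && candidate.startcode > old.startcode;
    }

    /** Finds the first online server that looks like a restart of the old one. */
    public static Optional<ServerId> findNewInstance(ServerId old, List<ServerId> online) {
        return online.stream().filter(s -> isNewInstanceOf(s, old)).findFirst();
    }
}
```

A server that comes back on a different port never matches under this rule, which is why the minicluster handing out new ports on restart defeats the check.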

 # Some of our tests like the parent test depended on retaining assignment 
across restarts.

They do. Retention doesn't work unless you do crazy stuff like the trick above 
in TestRestartCluster#testRetainAssignmentOnRestart (now a hack in HTU makes it 
a little easier to do).

 # As said in parent issue, master used to be last to go down when we did a 
controlled cluster shutdown. We lost that when we moved to AMv2.
 # When we do a cluster shutdown, the RegionServers close down the Regions, not 
the Master as is usual in AMv2 (Master wants to do all assign ops in AMv2). 
This means that the Master is surprised when it gets notification of CLOSE ops 
that it did not initiate. Usually on CLOSE, Master updates meta with the CLOSE 
state. On cluster shutdown we are not doing this.

Fixed this over in HBASE-18946 by keeping the Master up till last so at least 
the noisy failed deliveries don't show in the logs anymore. Also documented the 
shutdown process there. It can be improved.

 # So, on restart, we read meta and we see all regions still in OPEN state so 
we think the cluster crashed down so we go and do ServerCrashProcedure. Which 
hoses our ability to retain assign.

This is mostly true. Over in HBASE-18946 we explain what's going on and why we 
ALWAYS run ServerCrashProcedure just in case, and we add a distinction between 
creating assigns that retain the old location and assigns that want to be 
distributed (e.g. new table creation).
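
To make the retain-versus-distribute distinction concrete, here is a rough sketch under stated assumptions. It is not the code in either patch: the Server class, the plan method and the plain round-robin fallback are all hypothetical, chosen only to show the idea of preferring a new instance of the old location and spreading everything else.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RetainOrRoundRobin {
    /** Hypothetical server identity: host, port, startcode. */
    public static final class Server {
        public final String host;
        public final int port;
        public final long startcode;

        public Server(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }
    }

    /**
     * Builds region -> server assignments. A region goes back to a new instance
     * of its last location (same host+port, larger startcode) when one is online;
     * regions with no last location (null) or no returning server are handed out
     * round robin.
     */
    public static Map<String, Server> plan(Map<String, Server> lastLocation, List<Server> online) {
        if (online.isEmpty()) {
            throw new IllegalArgumentException("no online servers to assign to");
        }
        Map<String, Server> result = new LinkedHashMap<>();
        int next = 0; // round-robin cursor for regions that cannot be retained
        for (Map.Entry<String, Server> e : lastLocation.entrySet()) {
            Server old = e.getValue();
            Server target = null;
            if (old != null) {
                for (Server s : online) {
                    if (s.host.equals(old.host) && s.port == old.port
                            && s.startcode > old.startcode) {
                        target = s; // retain: new instance of the old location
                        break;
                    }
                }
            }
            if (target == null) {
                target = online.get(next % online.size()); // distribute
                next++;
            }
            result.put(e.getKey(), target);
        }
        return result;
    }
}
```

Retaining keeps a region next to the data it was serving before the restart, so locality survives; the fallback only matters for new regions or for servers that did not come back.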

Anyway, closing this out as Won't Fix; or rather, the issues raised here are 
addressed elsewhere.

> [AMv2] Retain assignment across restarts
> 
>
> Key: HBASE-19501
> URL: https://issues.apache.org/jira/browse/HBASE-19501
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19501.master.001.patch, 
> HBASE-19501.master.002.patch, HBASE-19501.master.003.patch, HBASE-19501.patch
>
>
> Working with replicas and the parent test in particular, I learned a few 
> interesting things:
>  # It is hard to test if we retain assignments because our little minicluster 
> gives RegionServers new ports on restart foiling our means of recognizing new 
> instance of a server by checking hostname+port (and ensuring the startcode is 
> larger).
>  # Some of our tests like the parent test depended on retaining assignment 
> across restarts.
>  # As said in parent issue, master used to be last to go down when we did a 
> controlled cluster shutdown. We lost that when we moved to AMv2.
>  # When we do a cluster shutdown, the RegionServers close down the Regions, 
> not the Master as is usual in AMv2 (Master wants to do all assign ops in 
> AMv2). This means that the Master is surprised when it gets notification of 
> CLOSE ops that it did not initiate. Usually on CLOSE, Master updates meta 
> with the CLOSE state. On cluster shutdown we are not doing this.
>  # So, on restart, we read meta and we see all regions still in OPEN state so 
> we think the cluster crashed down so we go and do ServerCrashProcedure. Which 
> hoses our ability to retain assign.
> Some experiments:
>  # I can make the Master stay up so it is last to go down
>  # This makes it so we no longer spew the logs with failed transition 
> messages because Master is not up to receive the CLOSE transitions.
>  # I hacked in means of telling minicluster ports it should use on start; 
> helps fake case of new RS instances
>  # It is hard to tell the difference between a clean shutdown and a crash 
> down. It is dangerous if we get the call wrong. 

[jira] [Commented] (HBASE-19501) [AMv2] Retain assignment across restarts

2017-12-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290307#comment-16290307
 ] 

Hadoop QA commented on HBASE-19501:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 8s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 23s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 32s | master passed |
| +1 | compile | 0m 55s | master passed |
| +1 | checkstyle | 1m 28s | master passed |
| +1 | shadedjars | 5m 54s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 39s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 42s | the patch passed |
| +1 | compile | 0m 55s | the patch passed |
| +1 | javac | 0m 55s | the patch passed |
| -1 | checkstyle | 1m 15s | hbase-server: The patch generated 16 new + 864 unchanged - 9 fixed = 880 total (was 873) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 23s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 18m 49s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0-beta1. |
| +1 | javadoc | 0m 39s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 59s | hbase-procedure in the patch passed. |
| +1 | unit | 94m 3s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 137m 15s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19501 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901979/HBASE-19501.master.003.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 5915917273e9 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 104afd74a6 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 

[jira] [Commented] (HBASE-19501) [AMv2] Retain assignment across restarts

2017-12-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290216#comment-16290216
 ] 

stack commented on HBASE-19501:
---

.002 fixes tests. Again, it is on top of HBASE-18946. Let me do a write-up on 
what is in here. The subject is actually misleading now; this patch is only 
partially about it. Will fix soon.

> [AMv2] Retain assignment across restarts
> 
>
> Key: HBASE-19501
> URL: https://issues.apache.org/jira/browse/HBASE-19501
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19501.master.001.patch, 
> HBASE-19501.master.002.patch, HBASE-19501.master.003.patch, HBASE-19501.patch
>
>
> Working with replicas and the parent test in particular, I learned a few 
> interesting things:
>  # It is hard to test if we retain assignments because our little minicluster 
> gives RegionServers new ports on restart foiling our means of recognizing new 
> instance of a server by checking hostname+port (and ensuring the startcode is 
> larger).
>  # Some of our tests like the parent test depended on retaining assignment 
> across restarts.
>  # As said in parent issue, master used to be last to go down when we did a 
> controlled cluster shutdown. We lost that when we moved to AMv2.
>  # When we do a cluster shutdown, the RegionServers close down the Regions, 
> not the Master as is usual in AMv2 (Master wants to do all assign ops in 
> AMv2). This means that the Master is surprised when it gets notification of 
> CLOSE ops that it did not initiate. Usually on CLOSE, Master updates meta 
> with the CLOSE state. On cluster shutdown we are not doing this.
>  # So, on restart, we read meta and we see all regions still in OPEN state so 
> we think the cluster crashed down so we go and do ServerCrashProcedure. Which 
> hoses our ability to retain assign.
> Some experiments:
>  # I can make the Master stay up so it is last to go down
>  # This makes it so we no longer spew the logs with failed transition 
> messages because Master is not up to receive the CLOSE transitions.
>  # I hacked in means of telling minicluster ports it should use on start; 
> helps fake case of new RS instances
>  # It is hard to tell the difference between a clean shutdown and a crash 
> down. It is dangerous if we get the call wrong. Currently, given that we just 
> let ServerCrashProcedure deal with it -- the safest option -- one experiment 
> is that when it goes to assign the regions that were on the crashed server, 
> rather than round robin, we should instead look and see if there is a new 
> instance of the old location and, if so, just give it all the regions. That'd 
> retain locality. 
> This seems to work. Problem is that SCP is doing assignment. Ideally balancer 
> would do it.
> Let me put up a patch that retains assignment across restart (somehow).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19501) [AMv2] Retain assignment across restarts

2017-12-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290068#comment-16290068
 ] 

Hadoop QA commented on HBASE-19501:
---

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 7s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 36s | master passed |
| +1 | compile | 0m 54s | master passed |
| +1 | checkstyle | 1m 28s | master passed |
| +1 | shadedjars | 5m 53s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 42s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 47s | the patch passed |
| +1 | compile | 1m 5s | the patch passed |
| +1 | javac | 1m 5s | the patch passed |
| -1 | checkstyle | 1m 16s | hbase-server: The patch generated 15 new + 783 unchanged - 9 fixed = 798 total (was 792) |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedjars | 4m 33s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 18m 57s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0-beta1. |
| +1 | javadoc | 0m 38s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 59s | hbase-procedure in the patch passed. |
| -1 | unit | 99m 7s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 142m 39s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.TestMasterShutdown |
|   | hadoop.hbase.regionserver.TestRegionReplicasAreDistributed |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19501 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901942/HBASE-19501.master.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 9a050c6b8fbe 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HBASE-19501) [AMv2] Retain assignment across restarts

2017-12-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289866#comment-16289866
 ] 

stack commented on HBASE-19501:
---

.001 is HBASE-19501 and HBASE-18946 squashed together so I can get a hadoopqa 
run in. Will describe the content in the next post.

> [AMv2] Retain assignment across restarts
> 
>
> Key: HBASE-19501
> URL: https://issues.apache.org/jira/browse/HBASE-19501
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19501.master.001.patch, HBASE-19501.patch
>
>