[jira] [Commented] (HBASE-20708) Make sure there is no race between the RMP scheduled when start up and the SCP

2018-06-11 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507877#comment-16507877
 ] 

Duo Zhang commented on HBASE-20708:
---

I think the difficulty here is that we have some cyclic dependencies. For 
example, we need hbase:meta to construct the region states and the server 
state, and only then can we construct a ServerCrashProcedure, because we use 
the server state to determine whether the crashed server was carrying meta. 
Yet hbase:meta may itself be offline until that SCP runs.

For this case, I think we can change the code to check the meta location 
znode on zk to determine whether the server is carrying meta, so that we can 
create a ServerCrashProcedure at any time. Also, the serverQueue in 
MasterProcedureScheduler could be renamed to serverCrashQueue, and we can make 
sure that there is only one procedure in each queue. This means that when we 
want to add a new SCP, we can make sure that a queue for that server does not 
already exist. And if the SCP is for an RS that carries meta, we can give it a 
higher priority so that it is processed first.
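
To make the idea concrete, here is a rough sketch of such a queue. It is only 
an illustration under stated assumptions, not the real 
MasterProcedureScheduler: the class ServerCrashQueueSketch, the Scp stand-in 
and the metaLocation field (a placeholder for whatever we would read from the 
meta location znode) are all hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch only; none of these names exist in HBase. */
final class ServerCrashQueueSketch {

  /** Stand-in for a ServerCrashProcedure: just the fields used for ordering. */
  static final class Scp {
    final String serverName;
    final boolean carryingMeta;
    final long seq; // insertion order, used as a FIFO tie-breaker

    Scp(String serverName, boolean carryingMeta, long seq) {
      this.serverName = serverName;
      this.carryingMeta = carryingMeta;
      this.seq = seq;
    }
  }

  // Assumption: in real code this would come from the meta location znode on zk.
  private final String metaLocation;

  // At most one pending SCP per crashed server.
  private final Map<String, Scp> pendingByServer = new HashMap<>();

  // SCPs for the server carrying meta sort first; otherwise FIFO by seq.
  private final PriorityQueue<Scp> queue = new PriorityQueue<>(
      (Scp a, Scp b) -> a.carryingMeta != b.carryingMeta
          ? (a.carryingMeta ? -1 : 1)
          : Long.compare(a.seq, b.seq));

  private long nextSeq = 0;

  ServerCrashQueueSketch(String metaLocation) {
    this.metaLocation = metaLocation;
  }

  /** Returns false if an SCP for this server is already pending. */
  synchronized boolean addCrashedServer(String serverName) {
    if (pendingByServer.containsKey(serverName)) {
      return false;
    }
    // No region states or server state needed: only the meta location.
    boolean carryingMeta = serverName.equals(metaLocation);
    Scp scp = new Scp(serverName, carryingMeta, nextSeq++);
    pendingByServer.put(serverName, scp);
    queue.add(scp);
    return true;
  }

  /** Returns the next SCP to run, or null if nothing is pending. */
  synchronized Scp poll() {
    Scp scp = queue.poll();
    if (scp != null) {
      pendingByServer.remove(scp.serverName);
    }
    return scp;
  }
}
```

With metaLocation set to "rs2", adding "rs1" and then "rs2" hands back the SCP 
for "rs2" first, and a second add for "rs1" while the first is still pending 
is rejected.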

Let me change the title of this issue and try to prepare a patch which removes 
the RMP from master startup.

Thanks.

> Make sure there is no race between the RMP scheduled when start up and the SCP
> --
>
>        Key: HBASE-20708
>        URL: https://issues.apache.org/jira/browse/HBASE-20708
>    Project: HBase
> Issue Type: Sub-task
> Components: proc-v2, Region Assignment
>   Reporter: Duo Zhang
>   Priority: Critical
>    Fix For: 3.0.0, 2.1.0, 2.0.1
>
>
> In HBASE-20700, we made RecoverMetaProcedure use a special lock which is only 
> used by RMP, to avoid deadlock with MoveRegionProcedure. But we always 
> schedule an RMP when the master starts up, so we still need to make sure that 
> there is no race between this RMP and other RMPs and SCPs scheduled before 
> the master restarts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20708) Make sure there is no race between the RMP scheduled when start up and the SCP

2018-06-11 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507828#comment-16507828
 ] 

Duo Zhang commented on HBASE-20708:
---

After reading HBASE-19287, I am changing my mind: RMP is not a good idea for 
meta recovery at startup. There is no way to guarantee that the meta region is 
online, since a server crash can happen at any time. So we should not assume 
that the meta region is online when designing or writing code.

Let me think more about how to fix it. I think this will not go into 2.0, since 
after HBASE-20700 the code is fairly 'stable', meaning it is not easy to hit 
the corner cases that are fatal.

Thanks.



[jira] [Commented] (HBASE-20708) Make sure there is no race between the RMP scheduled when start up and the SCP

2018-06-10 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507656#comment-16507656
 ] 

stack commented on HBASE-20708:
---

[~Apache9], should this be linked to another issue that fills in the context?
