[ 
https://issues.apache.org/jira/browse/HBASE-21864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765323#comment-16765323
 ] 

Sergey Shelukhin edited comment on HBASE-21864 at 2/11/19 7:27 PM:
-------------------------------------------------------------------

[~stack] it's just the regular heartbeat. 
When RS reported incorrect state, master used to kill it (YouAreDeadException), 
but that was removed because of these races.

I was thinking of storing a version per region (not sure yet whether it can be
in memory only, or whether we'd have to store it in meta too). The master would
increment it on every state change. The master would just store the last version
the RS acked for this region, and discard all messages older than that.
One additional possible benefit concerns the current crop of double-assignment 
races. If an RS reports something like "I opened this region you never 
expected me to open", it would be easier to see that it is acting on a stale 
message and doesn't know the current state, and to kill it conditionally to 
avoid data loss.
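
To make the idea concrete, here is a minimal sketch of the version check. It is 
only an illustration under assumptions: RegionVersionTracker, recordChange, 
check and Verdict are hypothetical names, not existing HBase APIs, and it 
assumes each region entry in the heartbeat report carries the last version the 
RS had acked when it built the report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are HBase APIs.
class RegionVersionTracker {
  enum Verdict { OK, STALE, INCONSISTENT }

  // Master-side view of one region: its current version and whether the
  // master expects the reporting server to have the region open.
  private static final class RegionState {
    long version;
    boolean expectedOpen;
  }

  private final Map<String, RegionState> regions = new HashMap<>();

  // Called by the master on every state change; bumps the region version.
  long recordChange(String region, boolean expectedOpen) {
    RegionState s = regions.computeIfAbsent(region, r -> new RegionState());
    s.version++;
    s.expectedOpen = expectedOpen;
    return s.version;
  }

  // Evaluates one region entry from a heartbeat report. ackedVersion is
  // assumed to be the last version the RS had acked when it built the report.
  Verdict check(String region, long ackedVersion, boolean reportedOpen) {
    RegionState s = regions.get(region);
    long current = (s == null) ? 0 : s.version;
    boolean expectedOpen = s != null && s.expectedOpen;
    if (ackedVersion < current) {
      return Verdict.STALE; // report predates the latest change: discard it
    }
    // The RS knows the current version, so a mismatch is a real inconsistency.
    return reportedOpen == expectedOpen ? Verdict.OK : Verdict.INCONSISTENT;
  }

  public static void main(String[] args) {
    RegionVersionTracker master = new RegionVersionTracker();
    long v1 = master.recordChange("R1", true);   // M: open R1 on the RS
    // The RS builds report {R1 open, acked v1}; it is delayed on the network.
    long v2 = master.recordChange("R1", false);  // M: close R1
    System.out.println(master.check("R1", v1, true));  // delayed report arrives
    System.out.println(master.check("R1", v2, true));  // RS acked v2, still has R1
    System.out.println(master.check("R1", v2, false)); // RS closed R1 as asked
  }
}
```

In this sketch only an INCONSISTENT verdict would be a candidate for 
YouAreDeadException; a STALE report is dropped. Whether the versions can live 
in memory only or need to go to meta is left open here, as above.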


> add region state version and reinstate YouAreDead exception in region report
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-21864
>                 URL: https://issues.apache.org/jira/browse/HBASE-21864
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> The state version will ensure we don't have network-related races (e.g. the 
> one I reported in some other bug -
> {code}
> RS: send report {R1} ...
> M: close R1
> RS: I closed R1
> M ... receive report {R1}
> M: you shouldn't have R1, die
> {code}).
> Then we can revert the change that removed the YouAreDead exception. An RS in 
> an incorrect state should either be brought into the correct state or killed, 
> because it means there's a bug somewhere; right now, if double assignment 
> happens (I found 2 different cases just this week ;)), the master lets the RS 
> with the incorrect assignment keep it forever.



