[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191954#comment-13191954
 ] 

stack commented on HBASE-5179:
------------------------------

Reviewing 0.92v17


isDeadServerInProgress is a new public method in ServerManager but it does not 
seem to be used anywhere.

Does isDeadRootServerInProgress need to be public?  Ditto for meta version.

The method param names are not right, e.g. 'definitiveRootServer'; what is 
meant by definitive?  Do they need this qualifier?

Is there anything in place to stop us expiring a server twice if it's carrying 
root and meta?
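
A minimal sketch of the kind of guard this question is about.  ExpireOnceGuard 
and its method names are made up for illustration; they are not in the patch 
or in ServerManager, and the real fix may look quite different.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of the patch or of ServerManager.
final class ExpireOnceGuard {
  // Names of servers whose expiry has already been started.
  private final Set<String> expiring =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

  /** Returns true only for the first caller for a given server name. */
  boolean tryExpire(String serverName) {
    return expiring.add(serverName);
  }

  /** Clears the marker once shutdown handling for the server has finished. */
  void finished(String serverName) {
    expiring.remove(serverName);
  }
}
```

With something like this, a server carrying both root and meta would be 
expired by whichever path calls tryExpire first; the second call sees false 
and skips the expiry.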

What is the difference between asking assignment manager isCarryingRoot and 
this variable that is passed in?  Should be doc'd at least.  Ditto for meta.

I think I've asked for this a few times -- onlineServers needs to be 
explained... either in javadoc or in comment. This is the param passed into 
joinCluster.  How does it arise?  I think I know but am unsure.  God love the 
poor noob that comes awandering this code trying to make sense of it all.

It looks like we get the list by trawling zk for regionserver znodes that have 
not checked in.  Don't we do this operation earlier in master setup?  Are we 
doing it again here?

Though distributed log splitting is configured, with this patch we will do 
single-process splitting in the master under some conditions.  It's not 
explained in code why we would do this.  Why do we think master log splitting 
is 'high priority' when it could very well be slower?  Should we only go this 
route if distributed splitting is not going on?  Do we know if concurrent 
distributed log splitting and master splitting works?

Why would we have dead servers in progress here in master startup?  Because a 
ServerShutdownHandler fired?

This patch is different to the patch for 0.90.  Should go into trunk first with 
tests, then 0.92.  Should it be in this issue?  This issue is really hard to 
follow now.  Maybe this issue is for 0.90.x and a new issue is opened for more 
work on this trunk patch?



This patch needs to have the v18 differences applied.




                
> Concurrent processing of processFailover and ServerShutdownHandler may cause 
> region to be assigned before log splitting is completed, causing data loss
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5179
>                 URL: https://issues.apache.org/jira/browse/HBASE-5179
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 
> 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 
> 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 
> 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 
> 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 
> 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, 
> Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, 
> hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, 
> hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. master completes splitLogAfterStartup()
> 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
> 3. master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. master starts to assign the regions of RegionserverA because it was 
> marked a dead server in step 3.
> However, while step 4 (assigning regions) is under way, ServerShutdownHandler 
> may still be splitting logs, so data loss may result.
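
The ordering problem in steps 1-4 can be modelled with a small tracker that 
failover processing consults before assigning.  This is only a sketch under 
assumptions: DeadServerTracker and its methods are hypothetical names, not 
HBase classes, and it does not claim to be what the attached patches do.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the ordering in steps 1-4; none of these names come from HBase.
final class DeadServerTracker {
  // Servers whose logs are currently being split by shutdown handling.
  private final Set<String> splitting =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

  // Called when shutdown handling begins splitting the server's logs (step 2).
  void splitStarted(String server) {
    splitting.add(server);
  }

  // Called after log splitting for the server has completed.
  void splitFinished(String server) {
    splitting.remove(server);
  }

  // Failover processing (steps 3-4) would check this before assigning
  // regions that were last hosted on the given server.
  boolean safeToAssign(String server) {
    return !splitting.contains(server);
  }
}
```

In this model the master would hold back step 4 for RegionserverA until 
splitFinished has been called for it, so regions are never opened while edits 
are still sitting in unsplit logs.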


        
