[
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282106#comment-13282106
]
chunhui shen commented on HBASE-5916:
-------------------------------------
@ram
bq. the RS is slow in checking in.
First, we will do the following: check ZK for regionservers that are up but
didn't register.
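For illustration, here is a rough sketch of that ZK check (assuming the master's ZooKeeperWatcher and ServerManager are at hand and org.apache.hadoop.hbase.zookeeper.ZKUtil is imported; this is only a sketch, not part of the patch):
{code}
// List the regionserver znodes currently registered in ZK.
List<String> rsZNodes =
    ZKUtil.listChildrenNoWatch(this.zooKeeper, this.zooKeeper.rsZNode);
if (rsZNodes != null) {
  for (String znode : rsZNodes) {
    ServerName sn = ServerName.parseServerName(znode);
    if (!this.serverManager.isServerOnline(sn)) {
      // Up in ZK but has not checked in with the master yet:
      // log it and wait for it to report in before proceeding.
      LOG.info("RegionServer " + sn + " is up in ZK but has not reported in yet");
    }
  }
}
{code}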
Anyway, if the RS checks in while the master is splitting logs, I think we could
specify which log dirs to split for splitLogAfterStartup.
We could do it as follows:
{code}
HMaster#finishInitialization {
  ...
  // Collect the log directories under .logs and figure out which of them
  // belong to servers that are not currently online.
  Path logsDirPath = new Path(this.rootdir, HConstants.HREGION_LOGDIR_NAME);
  FileStatus[] logFolders = FSUtils.listStatus(this.fs, logsDirPath, null);
  Set<ServerName> onlineServers =
      new HashSet<ServerName>(serverManager.getOnlineServers().keySet());
  List<ServerName> needSplitServers = new ArrayList<ServerName>();
  for (FileStatus status : logFolders) {
    String sn = status.getPath().getName();
    // Truncate the -splitting suffix if present (for ServerName parsing).
    if (sn.endsWith(HLog.SPLITTING_EXT)) {
      sn = sn.substring(0, sn.length() - HLog.SPLITTING_EXT.length());
    }
    ServerName serverName = ServerName.parseServerName(sn);
    if (!onlineServers.contains(serverName)) {
      needSplitServers.add(serverName);
    }
  }
  ...
  splitLogAfterStartup(mfs, needSplitServers);
}
{code}
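The two-argument splitLogAfterStartup above does not exist today; a minimal sketch of how such a helper could be built on MasterFileSystem#splitLog(ServerName), assuming that method is safe to call here the same way ServerShutdownHandler does (the overload and its placement are hypothetical):
{code}
// Hypothetical helper: split only the logs of the servers found dead during
// finishInitialization, instead of scanning all of .logs for dead servers.
private void splitLogAfterStartup(MasterFileSystem mfs,
    List<ServerName> needSplitServers) throws IOException {
  for (ServerName serverName : needSplitServers) {
    // splitLog(ServerName) renames the log dir to *-splitting and
    // performs (or distributes) the actual split work.
    mfs.splitLog(serverName);
  }
}
{code}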
Could the above fix this case?
Correct me if I'm wrong, thanks.
> RS restart just before master initialization makes the cluster non-operative
> -----------------------------------------------------------------------------
>
> Key: HBASE-5916
> URL: https://issues.apache.org/jira/browse/HBASE-5916
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Fix For: 0.94.1
>
> Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch,
> HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch,
> HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch
>
>
> Consider a case where my master is getting restarted. An RS that was alive when
> the master restart started gets restarted before the master initializes the
> ServerShutdownHandler.
> {code}
> serverShutdownHandlerEnabled = true;
> {code}
> In this case, when the RS tries to register with the master, the master will
> try to expire the server, but the server cannot be expired because the
> ServerShutdownHandler is not yet enabled.
> This case may happen when only one RS gets restarted, or when all the RSs get
> restarted at the same time (before assignRootAndMeta).
> {code}
> LOG.info(message);
> if (existingServer.getStartcode() < serverName.getStartcode()) {
>   LOG.info("Triggering server recovery; existingServer " +
>       existingServer + " looks stale, new server:" + serverName);
>   expireServer(existingServer);
> }
> {code}
> If another RS is brought up, then the cluster comes back to normalcy.
> This may be a very corner case.
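For context, the expireServer() call above cannot make progress because server expiry is deferred until the ServerShutdownHandler is enabled; a simplified sketch of that guard in ServerManager (field names and log text are illustrative, not copied verbatim from the code base):
{code}
// Simplified: while the master is still initializing, expiry requests are
// parked instead of being processed, so the stale RS never gets cleaned up.
public synchronized void expireServer(final ServerName serverName) {
  if (!services.isServerShutdownHandlerEnabled()) {
    // Master has not enabled ServerShutdownHandler yet; defer the expiry.
    LOG.info("Delaying expiry of " + serverName
        + " until ServerShutdownHandler is enabled");
    this.deadNotExpiredServers.add(serverName);
    return;
  }
  // ... normal path: mark the server dead and submit a ServerShutdownHandler ...
}
{code}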