[ https://issues.apache.org/jira/browse/HBASE-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3890:
-------------------------
    Priority: Critical  (was: Major)

Marking critical and patch available so we don't forget it.

> Scheduled tasks in distributed log splitting not in sync with ZK
> ----------------------------------------------------------------
>
>                 Key: HBASE-3890
>                 URL: https://issues.apache.org/jira/browse/HBASE-3890
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.92.0
>            Reporter: Lars George
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: hbase-3890.patch
>
>
> This is a continuation of HBASE-3889:
> Note that something else must be slightly off here. Although the splitlogs
> znode is now empty, the master is still stuck here:
> {noformat}
> Doing distributed log split in
> hdfs://localhost:8020/hbase/.logs/10.0.0.65,60020,1305406356765
> - Waiting for distributed tasks to finish. scheduled=2 done=1 error=0 4380s
> Master startup
> - Splitting logs after master startup 4388s
> {noformat}
> There seems to be a mismatch between what is in ZK and what the TaskBatch
> holds. In my case it could be related to the fact that the task was already
> in ZK after many faulty restarts caused by the NPE. Maybe it was added once
> (since tasks are keyed by path, and the path is unique on my machine), but
> the reference count was incremented twice? Now that the real task is done,
> the done counter has been increased, but it will never match the scheduled
> count.
> The code could also check whether ZK is actually depleted, and if so treat
> the scheduled task as bogus. This of course only treats the symptom, not the
> root cause of this condition.
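
To make the reporter's hypothesis concrete, here is a minimal Java sketch of how a batch could end up with scheduled=2 and done=1 for a single task, together with the symptom-level check suggested above. It is not HBase code: BatchSketch, enqueue, markDone, finishedByCounters and finishedOrZkDepleted are hypothetical names, the task path used below is made up, and the set of znodes is passed in directly rather than read from ZooKeeper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the master's batch bookkeeping; NOT HBase's TaskBatch.
class BatchSketch {
    int scheduled = 0;
    int done = 0;
    int error = 0;

    // Tasks are keyed by log path, so enqueuing the same path twice keeps one entry.
    // The value records whether the task has completed.
    final Map<String, Boolean> tasksByPath = new HashMap<>();

    // Models the suspected bug: the counter is bumped even when the path
    // is already tracked (for example, left over from an earlier restart).
    void enqueue(String path) {
        tasksByPath.putIfAbsent(path, Boolean.FALSE);
        scheduled++;
    }

    // A task can only complete once, so done increases by at most one per path.
    void markDone(String path) {
        if (Boolean.FALSE.equals(tasksByPath.get(path))) {
            tasksByPath.put(path, Boolean.TRUE);
            done++;
        }
    }

    // The wait condition that never becomes true once the counters diverge.
    boolean finishedByCounters() {
        return done + error == scheduled;
    }

    // Symptom-level check from the report: if no task znodes remain,
    // nothing can still complete, so stop waiting on the counters.
    boolean finishedOrZkDepleted(Set<String> remainingTaskZnodes) {
        return finishedByCounters() || remainingTaskZnodes.isEmpty();
    }
}
```

Enqueuing the same hypothetical path twice and completing it once leaves scheduled=2 and done=1, so finishedByCounters() stays false, while finishedOrZkDepleted() returns true for an empty znode set. As the report notes, such a check would only mask the double count; the real fix would be to avoid incrementing scheduled for a path that is already tracked.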