Apache9 commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-673201873
> Thanks @Apache9, I want to agree with you on having an HBCK option, but one concern I keep struggling with is making this automated instead of an HBCK option. If an HBase cluster has hundreds of tables with thousands of regions, how would the operator recover the cluster? Does he/she repair the meta table (offline/online) by scanning the storage of each region? (Instead, can we just load the meta without rebuilding it?) I think for the scenario here, we just need to write the cluster id and other things to ZooKeeper? Just make sure that the current code in HBase will not consider us a fresh new cluster. We do not need to rebuild meta?
>
> Tbh, I feel bad bringing up this meta table issue, because a normal HBase cluster does not assume ZooKeeper (and the WAL) could be gone after the cluster starts and restarts.
>
> [updated] For this PR/JIRA, mainly, I'm questioning what a `partial meta` should be (e.g. it now relies on the state of `InitMetaProcedure` instead of the data of the meta table). Any thoughts?

After introducing proc-v2, we rely on it to record the state of a multi-step operation. Here, I believe the problem is that we schedule an InitMetaProcedure when we already have a meta table in place. For InitMetaProcedure, the assumption is that if we find the meta table directory is there, then the procedure itself crashed before finishing the creation of the meta table, i.e., the meta table is 'partial'. So it is safe to just remove it and create it again. I think this is a very common trick in distributed systems for handling failures?
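The "remove and recreate" trick described above can be sketched as follows. This is a hypothetical illustration using local `java.nio.file` paths rather than HBase's actual FileSystem API or the real InitMetaProcedure code; the class and method names are invented for the example. The idea is that finding the target directory already present on (re)start is treated as evidence of a crash mid-creation, so the step deletes it and starts over, making initialization idempotent under crash-restart:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

// Sketch of the crash-recovery pattern: a step that may crash halfway
// detects its own partial output on restart and redoes the work from scratch.
public class InitMetaSketch {

    static void initMetaDir(Path metaDir) throws IOException {
        if (Files.exists(metaDir)) {
            // The directory exists but this step never recorded completion:
            // a previous attempt must have crashed, so its contents are
            // partial. Delete everything (children first) and start over.
            try (var walk = Files.walk(metaDir)) {
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
        Files.createDirectories(metaDir);
        // ... populate the directory, then durably record "done" in the
        // procedure store so a later restart skips this step entirely.
    }
}
```

The key design point is that the delete-then-recreate step is only safe because the procedure state, not the directory's existence, is the source of truth for whether initialization finished; that is exactly why scheduling an InitMetaProcedure against an already-complete meta table is the bug.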