taklwu commented on pull request #2237:
URL: https://github.com/apache/hbase/pull/2237#issuecomment-736772662


   >  Is clumsy operator deleting the meta location znode by mistake a valid 
failure mode ?
   No, this is a special case that we have been supporting, where the HBase 
cluster restarts fresh on top of only flushed HFiles and comes with no WAL or 
ZK data. We admit this differs a bit from the community's standpoint that both 
the WAL and ZK must already exist when the master and/or RSs start on existing 
HFiles, so that they can resume the state left by any procedures. 
   
   > What about adding extra step before assign where we wait asking Master a 
question about the cluster state such as if any of the RSs that are checking in 
have Regions on them; i.e. if Regions already assigned, if an already 'up' 
cluster? Would that help?
   
   Having an extra step to check whether the RSs have any regions assigned may 
help, but I don't know if we can do that before the server manager finds any 
region server online. 
   
   > You fellows don't want to have to run a script beforehand? ZK is up and 
just put an empty location up or ask Master or hbck2 to do it for you? 
   I think HBCK/HBCK2 performs online repair, and we have a few concerns: 
   1. if the master is not up and running, then we cannot proceed 
   2. even if the master is up, repairing hundreds or thousands of regions 
implies a long scan time, which IMO we can save by just reloading from the 
existing meta. 
   3. having additional steps/scripts to start an HBase cluster in the 
mentioned cloud use case seems like a manual/semi-automated step that we don't 
find a good fit to hold and maintain.
   
   Personally, I'm fine with throwing an exception as Duo suggested; on our 
side we need to find a way to continue if we see this exception. We can then 
improve it in the future, when we need to completely get rid of the extra hbck 
step. 
   
   So, for this PR, if we don't hear any other critical suggestions, maybe I 
will close it as unresolved. Do you guys agree? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

