[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-941241199

Merged. If we need new documentation, we can have a follow-up JIRA.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-916299365

@joshelser should we wait until we add the documentation in this commit?
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-760640589

I have fixed the conflicts; I will probably push it in two days, and we can see if you have any comments.
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-751833447

Thanks Duo for your understanding and comments ;) and happy holidays!

@z-york, your point about idempotency is good, but at this point would you agree that we create a follow-up, first add the exception, and discuss/handle the data-loss issues in a different JIRA? Would you reconsider your -1 vote? (BTW, I can rebase after we agree on this change.)
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-736772662

> Is clumsy operator deleting the meta location znode by mistake a valid failure mode?

No, this is a special case we have been supporting, where the HBase cluster restarts freshly on top of only flushed HFiles and comes with neither WAL nor ZK data. We admit that this differs a bit from the community's standing assumption that both WAL and ZK must pre-exist when the master and/or RSs start on existing HFiles to resume the state left by any procedures.

> What about adding extra step before assign where we wait asking Master a question about the cluster state such as if any of the RSs that are checking in have Regions on them; i.e. if Regions already assigned, if an already 'up' cluster? Would that help?

Having an extra step to check whether any RS has regions assigned may help, but I don't know whether we can do that before the ServerManager finds any region server online.

> You fellows don't want to have to run a script beforehand? ZK is up and just put an empty location up or ask Master or hbck2 to do it for you?

HBCK/HBCK2 performs online repair, and we have a few concerns:

1. if the master is not up and running, we cannot proceed;
2. even if the master is up, repairing hundreds or thousands of regions implies a long scanning time, which IMO we can save by just reloading the existing meta;
3. an additional step/script to start an HBase cluster in the mentioned cloud use case is a manual/semi-automated step we don't find a good fit to hold and maintain.

Personally, throwing an exception as Duo suggested is fine with me; on our side we would need to find a way to continue when we see this exception, and then improve things in the future when we need to completely get rid of the extra HBCK step.

So, for this PR, if we don't hear any other critical suggestion, maybe I will leave it closed as unresolved. Do you agree?
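To make the "check whether any checking-in RS already holds regions" idea concrete, here is a minimal sketch in plain Python (not real HBase code; `get_online_servers` and `get_regions_on_server` are hypothetical stand-ins for whatever ServerManager exposes):

```python
def should_bootstrap_meta(server_manager):
    """Decide whether meta bootstrap is safe by asking whether any
    checking-in region server already holds assigned regions.

    If some RS already carries regions, the cluster is effectively
    'up', and wiping/recreating meta would risk data loss.
    """
    for server in server_manager.get_online_servers():
        if server_manager.get_regions_on_server(server):
            return False  # regions already assigned: skip bootstrap
    return True  # no regions anywhere: looks like a fresh cluster
```

As noted above, the open question is ordering: this check only helps if it can run after the ServerManager has actually seen region servers check in, which may not be the case at the point where bootstrap runs today.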
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-679418281

So, how can we reach consensus on this PR, or on this series of idempotency issues (for InitMetaProcedure)? I don't mind breaking them into more tasks, as Zach has created the follow-up bugs (HBASE-24922 and HBASE-24923), but I don't see clear agreement among everyone on whether we should continue the bootstrap or fail hard when we find an existing meta table in InitMetaProcedure. How do we move forward?
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-674265891

I apologize for the dev@ email, but I thought about your suggestion differently overnight (sorry that I reread it many times before I found the gap this morning).

> You can see the code in finisheActiveMasterInitialization and also the code in AssignmentManager.start. In AssignmentManager.start, we will try to load the start of meta region from zookeeper, and if there is none, we will create a region node in offline state. And in finishActiveMasterInitialization, if we find that the state of meta region node is offline, we will schedule InitMetaProcedure. So what you need to do here, is to put the meta region znode to zookeeper, before you restart the hbase cluster. So we will not schedule InitMetaProcedure again.

Doesn't the upcoming master region that stores the meta location in [HBASE-24408](https://issues.apache.org/jira/browse/HBASE-24408) and [PR#1746](https://github.com/apache/hbase/pull/1746/commits/976d0c4e5b732a23773bd306f79e8017344b58f3) resolve our conflict of interest, in that we no longer need to rely on ZK to get the server name (old host) for the meta region? That way, even without ZK, we can move on and not submit InitMetaProcedure, because the state of the meta region is not `OFFLINE`.

If you confirm the above, I would say bringing up this PR and repeatedly highlighting the zookeeper discussion was my mistake; I should have learned about the master region before this PR. (Then we just need to move to the upcoming version, and we can still restart in the cloud use cases.)
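The flow quoted above can be modeled roughly as follows (a hedged Python sketch of the decision logic only, not HBase's actual Java code; the state values and function names are simplified stand-ins for AssignmentManager.start and finishActiveMasterInitialization):

```python
OFFLINE = "OFFLINE"
OPEN = "OPEN"

def load_meta_region_state(zk_meta_state):
    """Model of AssignmentManager.start: load the meta region state
    from zookeeper; if there is none, create the node as OFFLINE."""
    return zk_meta_state if zk_meta_state is not None else OFFLINE

def schedules_init_meta_procedure(meta_state):
    """Model of finishActiveMasterInitialization: InitMetaProcedure
    is scheduled only when the meta region node is OFFLINE."""
    return meta_state == OFFLINE
```

Under this model, pre-seeding the meta region znode before restarting the cluster (or reading the location from the master region, per HBASE-24408) means the loaded state is no longer `OFFLINE`, so InitMetaProcedure is not scheduled again.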
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-673880783

> how we could start a cluster with no data on zookeeper?

IMO the title of the google design doc may cover the cloud use cases that restart on just HFiles, without WAL and without Zookeeper (but with all user tables flushed and disabled before terminating the cluster). I know this may not be mentioned in the [book tutorial](https://hbase.apache.org/book.html), and it may be a good time to clarify how those cases actually work; some users have been relying on them in HBase 1.4.x and, maybe, in HBase before 2.1.7. Then we can see what the gaps are now in branch-2.2+ to support it again (basically, that's the intention of this PR and [PR#2113](https://github.com/apache/hbase/pull/2113)).

> As you want to start the HMaster and recover from the inconsistency

What does `inconsistency` mean here? I see your point about using `InitMetaProcedure#INIT_META_WRITE_FS_LAYOUT` to indicate inconsistency, but if we don't delete meta and just start the cluster, IMO HBCK `-detail` will show a clean result without any inconsistency; we may not hit any inconsistency when getting into `InitMetaProcedure`. For this topic, I may just start an email thread on whether `InitMetaProcedure` should delete meta without checking `partial` and consistency. Please bear with me; this may be the only thing I want a quick discussion on, instead of a long design doc on the cloud use cases.
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-673751605

> should not depend on the data on zookeeper.

I agree that we may not be ready to totally stop relying on the data stored in zookeeper; that's definitely a broader discussion about what HBase currently depends on Zookeeper for (branch-2 and master), especially since data on Zookeeper could be ephemeral or removed. (I thought we were in the process of moving data into the ROOT region, aren't we? e.g. [Proc-v2](https://issues.apache.org/jira/browse/HBASE-20610).)

Also, my initial goal is that the meta data/directory should not be deleted if possible; we're trying to provide a persisted condition so that we do not always delete meta when it is not `partial` (protected by the ZK data). Sorry, I may be a newbie on proc-v2 and the ZK data. Should we start a thread on the dev@ list to discuss the following? (My goal is to find consensus on how we move this PR forward, either to completion or to "not fixed".)

1. Should we delete the meta directory when the HMaster starts?
2. After 2.2+, should we stop depending on the data on zookeeper and move more of the info into proc-v2 in the master region?
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-673253206

First of all, thanks Duo again.

> I think for the scenario here, we just need to write the cluster id and other things to zookeeper? Just make sure that the current code in HBase will not consider us as a fresh new cluster. We do not need to rebuild meta?

Let me confirm your suggestion: does that mean if we add one more field in the ZNode, e.g. a boolean `completedMetaBootstrap`, and we find both `clusterId` and `completedMetaBootstrap` in ZK, we will not delete the meta directory?

A follow-up: if the ZK ZNode data is used to determine whether this is a fresh new cluster, can we also skip deleting the meta directory when `clusterId` and `completedMetaBootstrap` were never set but we found a meta directory? This is the cloud use case in which we don't have ZK to make the decision, so we don't know whether meta is partial, and IMO we should just leave the meta directory; if anything bad happens, the operator can still run HBCK. (If we do the opposite and always delete meta, we lose the possibility that the cluster can heal itself, and we still cannot confirm that the meta was partial, can we?)

> For the InitMetaProcedure, the assumption is that, if we found that the meta table directory is there, then it means the procedure itself has crashed before finishing the creation of meta table, i.e, the meta table is 'partial'. So it is safe to just remove it and create again. I think this is a very common trick in distributed system for handling failures?

Do you mean the `idempotent` trick? `InitMetaProcedure` may be idempotent and can bring `hbase:meta` online (as an empty table), but I don't think the cluster/HM itself is automatically idempotent. Yes, it can rebuild the data content of the original meta with the help of HBCK, but only if the HM continues the flow with some existing data, e.g. the namespace table (sorry, on branch-2 we still have the namespace table). When the HM restarts with an empty meta, based on the experiment I did, the cluster hangs and the HM cannot finish initialization.

If we step back and just think about the definition of a `partial` meta: it would be great if the meta table itself could tell whether it is partial, because it is still a table in HBase and its HFiles are immutable. For example, can we tell whether a user table is partial by looking only at its data? I may be wrong, but it seems we cannot tell from the HFiles alone; we need ZK and WAL to define it. So again, IMO the data content of a table is sensitive, especially for the meta table, and I'm proposing not to delete meta if possible (deleting and rebuilding is also like running an hbck repair).

Based on our discussion, we have two proposals for defining a `partial` meta:

1. add a boolean in the WAL, like proc-level data
2. write a boolean in the ZNode to tell whether the bootstrap completed

*. Whichever of 1) and 2) we choose, we have an additional condition: if we don't find any WAL or ZK data about this flag, we should not delete the meta table.

It seems 2) + *) would be the simplest solution; what do you think?
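Proposal 2) plus condition *) could be sketched like this (plain Python modeling only the decision, not HBase code; `completedMetaBootstrap` is the hypothetical flag name from this discussion, and the meta-directory check is a stand-in):

```python
def should_delete_meta_dir(cluster_id, completed_meta_bootstrap, meta_dir_exists):
    """Decide whether bootstrap may delete the meta table directory.

    - clusterId + completedMetaBootstrap both present in ZK: a previous
      bootstrap finished, so meta is not partial; never delete.
    - No ZK data at all (cloud restart on bare HFiles) but a meta
      directory exists: we cannot prove it is partial; keep it and let
      the operator run HBCK if anything is wrong.
    - Otherwise (e.g. clusterId present but the bootstrap flag missing):
      the previous InitMetaProcedure likely crashed mid-way, so the
      directory is considered partial and may be removed and recreated.
    """
    if cluster_id is not None and completed_meta_bootstrap:
        return False
    if cluster_id is None and not completed_meta_bootstrap and meta_dir_exists:
        return False
    return meta_dir_exists
```

The point of the second branch is exactly condition *): absence of ZK/WAL evidence is treated as "unknown", not as "partial", so the default becomes keep-and-repair rather than delete-and-rebuild.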
[GitHub] [hbase] taklwu commented on pull request #2237: HBASE-24833: Bootstrap should not delete the META table directory if …
taklwu commented on pull request #2237: URL: https://github.com/apache/hbase/pull/2237#issuecomment-672204387

Thanks @Apache9. I want to agree with you about having an HBCK option, but one concern keeps me struggling toward making this automated instead of an HBCK option: if an HBase cluster has hundreds of tables with thousands of regions, how would the operator recover the cluster? Does he/she repair the meta table (offline/online) by scanning the storage of each region, when instead we could just load the existing meta without rebuilding it?

To be honest, I feel bad bringing up this meta table issue, because a normal HBase cluster does not assume that Zookeeper (and the WAL) could be gone between a stop and a restart. For this PR/JIRA, I'm mainly questioning what a `partial meta` should be; any thoughts?