[jira] [Resolved] (HBASE-28464) Make replication ZKWatcher config customizable in extensions
[ https://issues.apache.org/jira/browse/HBASE-28464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szabolcs Bukros resolved HBASE-28464.
-------------------------------------
Resolution: Implemented

Implemented by HBASE-28529.

> Key: HBASE-28464
> URL: https://issues.apache.org/jira/browse/HBASE-28464
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Reporter: Szabolcs Bukros
> Assignee: Szabolcs Bukros
> Priority: Major
> Labels: pull-request-available
>
> The ZKWatcher in HBaseReplicationEndpoint always uses the source cluster's
> ZooKeeper client config when connecting to the target cluster's zk. Those
> might not match. I would like to make the used ZKClientConfig customizable
> for replication extensions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
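The kind of customization described in the issue can be pictured as a protected factory method that replication extensions override. This is a minimal sketch under that assumption; every class and method name below is illustrative, not the actual HBase or ZooKeeper API:

```java
import java.util.Properties;

/** Stand-in for ZooKeeper's client config; the real class lives in org.apache.zookeeper. */
class ZkClientConfigSketch {
    final Properties props = new Properties();
    void set(String key, String value) { props.setProperty(key, value); }
    String get(String key) { return props.getProperty(key); }
}

/** Hypothetical endpoint base class: by default the source cluster's settings are reused. */
class ReplicationEndpointSketch {
    protected ZkClientConfigSketch createZkClientConfig() {
        return new ZkClientConfigSketch(); // defaults mirror the source cluster
    }
}

/** An extension overrides the factory to match the target cluster's requirements. */
class SecureTargetEndpoint extends ReplicationEndpointSketch {
    @Override
    protected ZkClientConfigSketch createZkClientConfig() {
        ZkClientConfigSketch c = new ZkClientConfigSketch();
        c.set("zookeeper.client.secure", "true"); // target cluster differs from source
        return c;
    }
}
```

The point of the factory-method shape is that the base class keeps working unchanged while extensions swap in only the ZooKeeper client settings that differ on the target side.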
[jira] [Created] (HBASE-28464) Make replication ZKWatcher config customizable in extensions
Szabolcs Bukros created HBASE-28464:
---------------------------------------

Summary: Make replication ZKWatcher config customizable in extensions
Key: HBASE-28464
URL: https://issues.apache.org/jira/browse/HBASE-28464
Project: HBase
Issue Type: Improvement
Components: Replication
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros

The ZKWatcher in HBaseReplicationEndpoint always uses the source cluster's ZooKeeper client config when connecting to the target cluster's zk. Those might not match. I would like to make the used ZKClientConfig customizable for replication extensions.
[jira] [Commented] (HBASE-27493) Allow namespace admins to clone snapshots created by them
[ https://issues.apache.org/jira/browse/HBASE-27493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678222#comment-17678222 ]

Szabolcs Bukros commented on HBASE-27493:
-----------------------------------------

[~psomogyi] Added the release notes. Thanks a lot for the merge!

> Key: HBASE-27493
> URL: https://issues.apache.org/jira/browse/HBASE-27493
> Project: HBase
> Issue Type: Improvement
> Components: snapshots
> Affects Versions: 3.0.0-alpha-3, 2.5.1
> Reporter: Szabolcs Bukros
> Assignee: Szabolcs Bukros
> Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
> Creating a snapshot requires table admin permissions. But cloning it requires
> global admin permissions unless the user owns the snapshot and wants to
> recreate the original table the snapshot was based on using the same table
> name. This puts unnecessary load on the few people having global admin
> permissions. I would like to relax this rule a bit and allow the owner of the
> snapshot to clone it into any namespace where they have admin permissions
> regardless of the table name used.
[jira] [Updated] (HBASE-27493) Allow namespace admins to clone snapshots created by them
[ https://issues.apache.org/jira/browse/HBASE-27493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szabolcs Bukros updated HBASE-27493:
------------------------------------
Release Note: Allow namespace admins to clone snapshots created by them to any table inside their namespace, not just re-create the old table.

> Key: HBASE-27493
> URL: https://issues.apache.org/jira/browse/HBASE-27493
> Fix For: 2.6.0, 3.0.0-alpha-4
[jira] [Created] (HBASE-27493) Allow namespace admins to clone snapshots created by them
Szabolcs Bukros created HBASE-27493:
---------------------------------------

Summary: Allow namespace admins to clone snapshots created by them
Key: HBASE-27493
URL: https://issues.apache.org/jira/browse/HBASE-27493
Project: HBase
Issue Type: Improvement
Components: snapshots
Affects Versions: 2.5.1, 3.0.0-alpha-3
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros

Creating a snapshot requires table admin permissions. But cloning it requires global admin permissions unless the user owns the snapshot and wants to recreate the original table the snapshot was based on using the same table name. This puts unnecessary load on the few people having global admin permissions. I would like to relax this rule a bit and allow the owner of the snapshot to clone it into any namespace where they have admin permissions regardless of the table name used.
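The relaxed rule described above can be sketched as a single predicate. This is a hedged illustration, not the actual AccessController implementation; the method name and parameters are assumptions:

```java
import java.util.Set;

class SnapshotCloneAclSketch {
    /**
     * Global admins may always clone. Otherwise the caller must own the
     * snapshot AND hold admin on the target namespace; under the relaxed
     * rule the target table name no longer has to match the snapshot's
     * original table.
     */
    static boolean canClone(String caller, boolean isGlobalAdmin,
                            String snapshotOwner, String targetNamespace,
                            Set<String> namespacesAdministeredByCaller) {
        if (isGlobalAdmin) {
            return true;
        }
        return caller.equals(snapshotOwner)
            && namespacesAdministeredByCaller.contains(targetNamespace);
    }
}
```

Before this change, the non-global-admin branch effectively also required the clone to re-create the original table under its original name; the sketch shows only the ownership and namespace-admin conditions surviving.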
[jira] [Commented] (HBASE-27154) Backport missing MOB related changes to branch-2
[ https://issues.apache.org/jira/browse/HBASE-27154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580795#comment-17580795 ]

Szabolcs Bukros commented on HBASE-27154:
-----------------------------------------

[~ndimiduk] The PR is already up.

> Key: HBASE-27154
> URL: https://issues.apache.org/jira/browse/HBASE-27154
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.6.0
> Reporter: Szabolcs Bukros
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 2.5.0
>
> While trying to backport https://issues.apache.org/jira/browse/HBASE-26969 to
> branch-2 I have found that multiple major MOB-related changes are missing.
> This change is required for FileBased SFT correctness, so the changes it
> depends on should be backported first. Also, any improvement to MOB stability
> is usually welcome.
>
> The missing changes I have found so far:
> https://issues.apache.org/jira/browse/HBASE-22749
> https://issues.apache.org/jira/browse/HBASE-23723
> https://issues.apache.org/jira/browse/HBASE-24163
>
> There is also a docs change describing the new MOB functionality, but
> considering that the book is always generated from master, I think it is
> safe to skip backporting it:
> https://issues.apache.org/jira/browse/HBASE-23198
>
> I'm planning to backport these changes one by one until we reach a state
> where HBASE-26969 can be backported too.
[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580779#comment-17580779 ]

Szabolcs Bukros edited comment on HBASE-26969 at 8/17/22 1:10 PM:
------------------------------------------------------------------

[~apurtell] [~huaxiangsun] Please find [#4712|https://github.com/apache/hbase/pull/4712] with the branch-2 backport. Please note I made some additional changes in this PR:
* [#4617|https://github.com/apache/hbase/pull/4617] left two test classes in the code that the original commit deleted, and I had to delete them here.
* HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the review cycles, instead of backporting it separately I included the changes it contained in this PR. Please let me know if that is not acceptable and I will prepare a separate PR for that backport.

was (Author: bszabolcs):
[~apurtell] [~huaxiangsun] Please find [#4712|https://github.com/apache/hbase/pull/4712] with the branch-2 backport. Please note I made some additional changes:
* [#4617|https://github.com/apache/hbase/pull/4617] left two test classes in the code that the original commit deleted, and I had to delete them here.
* HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the review cycles, instead of backporting it separately I included the changes it contained in this PR. Please let me know if that is not acceptable and I will prepare a separate PR for that backport.

> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
> Issue Type: Sub-task
> Components: mob
> Affects Versions: 2.5.0, 3.0.0-alpha-3
> Reporter: Szabolcs Bukros
> Assignee: Szabolcs Bukros
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-4
>
> MOB file compaction and flush still rely on renames even when SFT is enabled.
>
> My proposed changes are:
> * when requireWritingToTmpDirFirst is false during MOB flush/compact, instead
>   of using the temp writer we should create a different writer, using a
>   StoreFileWriterCreationTracker, that writes directly to the mob store folder
> * these StoreFileWriterCreationTrackers should be stored in the MobStore.
>   This would require us to extend MobStore with a createWriter and a
>   finalizeWriter method to handle this
> * refactor MobFileCleanerChore to run on the RS instead of on the Master, to
>   allow access to the StoreFileWriterCreationTrackers and make sure the
>   currently written files are not cleaned up
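The createWriter/finalizeWriter flow proposed above can be modeled roughly as follows. Everything here, class names, method names, and paths alike, is an illustrative assumption rather than the actual MobStore implementation:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class MobStoreSketch {
    /** Paths currently being written; a cleaner chore must skip these. */
    private final Set<String> writerCreationTracker = ConcurrentHashMap.newKeySet();

    /** When a temp dir is not required, write straight into the mob store folder. */
    String createWriter(String fileName, boolean requireWritingToTmpDirFirst) {
        String path = (requireWritingToTmpDirFirst ? "/hbase/mobdir/.tmp/" : "/hbase/mobdir/")
                + fileName;
        writerCreationTracker.add(path); // in-flight: not eligible for cleaning
        return path;
    }

    /** Called once the file is committed; it may now be considered by the cleaner. */
    void finalizeWriter(String path) {
        writerCreationTracker.remove(path);
    }

    /** What an RS-local cleaner chore would consult before deleting a file. */
    boolean isBeingWritten(String path) {
        return writerCreationTracker.contains(path);
    }
}
```

The design point is why the chore must move to the RS: the tracker is in-memory, per-store state, so only the RegionServer hosting the store can know which mob files are still mid-write.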
[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580779#comment-17580779 ]

Szabolcs Bukros edited comment on HBASE-26969 at 8/17/22 1:09 PM:
------------------------------------------------------------------

[~apurtell] [~huaxiangsun] Please find [#4712|https://github.com/apache/hbase/pull/4712] with the branch-2 backport. Please note I made some additional changes:
* [#4617|https://github.com/apache/hbase/pull/4617] left two test classes in the code that the original commit deleted, and I had to delete them here.
* HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the review cycles, instead of backporting it separately I included the changes it contained in this PR. Please let me know if that is not acceptable and I will prepare a separate PR for that backport.

was (Author: bszabolcs):
[~apurtell] [~huaxiangsun] Please find #4712 with the branch-2 backport. Please note I made some additional changes:
* [#4617|https://github.com/apache/hbase/pull/4617] left two test classes in the code that the original commit deleted, and I had to delete them here.
* HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the review cycles, instead of backporting it separately I included the changes it contained in this PR. Please let me know if that is not acceptable and I will prepare a separate PR for that backport.

> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Fix For: 2.5.0, 3.0.0-alpha-4
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580779#comment-17580779 ]

Szabolcs Bukros commented on HBASE-26969:
-----------------------------------------

[~apurtell] [~huaxiangsun] Please find #4712 with the branch-2 backport. Please note I made some additional changes:
* [#4617|https://github.com/apache/hbase/pull/4617] left two test classes in the code that the original commit deleted, and I had to delete them here.
* HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the review cycles, instead of backporting it separately I included the changes it contained in this PR. Please let me know if that is not acceptable and I will prepare a separate PR for that backport.

> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Fix For: 2.5.0, 3.0.0-alpha-4
[jira] [Commented] (HBASE-27204) BlockingRpcClient will hang for 20 seconds when SASL is enabled after finishing negotiation
[ https://issues.apache.org/jira/browse/HBASE-27204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568597#comment-17568597 ]

Szabolcs Bukros commented on HBASE-27204:
-----------------------------------------

[~apurtell] Please revert HBASE-24579. I have done some of the investigation I should have done 2 years ago and found that not reading the potential error msg might not be limited to PLAIN sasl. Based on my understanding of the code, this could happen with GSS too. GssKrb5Client, after evaluating the final handshake challenge, can send a gssOutToken back to the server just after setting "completed" to true. Then GssKrb5Server tries to evaluate the response in doHandshake2, where it either fails with an exception or returns null, producing essentially the same issue we have with PLAIN sasl. Because the client is already completed, the potential response is never read.

I think a potential fix would have 3 parts:
* ServerRpcConnection.saslReadAndProcess could be changed to always return a response even if replyToken is null, maybe just an empty byte array. This would make the communication consistent by allowing us to always check the stream for a response.
* HBaseSaslRpcClient.saslConnect could be extended to track whether "readStatus" was called after a response was written. If the client is complete but we are still waiting for a response, we could call "readStatus".
* Netty: considering that ServerRpcConnection.saslReadAndProcess is shared between the implementations, I assume the issue is present in Netty too, but I do not understand that code well enough to propose a solution.

What do you think?

> Key: HBASE-27204
> URL: https://issues.apache.org/jira/browse/HBASE-27204
> Project: HBase
> Issue Type: Bug
> Components: rpc, sasl, security
> Reporter: Duo Zhang
> Assignee: Andrew Kyle Purtell
> Priority: Critical
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
> Found this when implementing HBASE-27185. When running TestSecureIPC, if
> BlockingRpcClient is used, the tests spend much more time compared to
> NettyRpcClient.
>
> The problem is that, for normal kerberos authentication, the last step is
> the client sending a reply to the server, so after the server receives the
> last token it will not write anything back but expects the client to send
> the connection header.
>
> In HBASE-24579, for reading the error message, we added a readReply after
> the SaslClient indicates that the negotiation is completed. But as said
> above, for normal cases we will not write anything back from the server
> side, so the client will hang there and only throw an exception when the
> timeout is reached, which is 20 seconds.
>
> This nearly makes the BlockingRpcClient unusable when sasl is enabled, as
> it will hang 20 seconds when connecting...
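The first part of the fix proposed above (always answering the client, even with an empty token, so it can deterministically read a status) can be reduced to a one-line sketch. The class and method names are illustrative, not the real ServerRpcConnection code:

```java
class SaslReplySketch {
    /**
     * Server-side: answer every client SASL token. When the mechanism
     * produced no reply token, send an empty array instead of staying
     * silent; previously a null token meant no response at all, which
     * left an already-"completed" client blocked on a read until the
     * 20-second timeout.
     */
    static byte[] replyFor(byte[] replyToken) {
        return replyToken != null ? replyToken : new byte[0];
    }
}
```

With this invariant in place, the client-side change becomes simple: after every token it writes, it can unconditionally read one status frame without risking a hang.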
[jira] [Commented] (HBASE-27204) BlockingRpcClient will hang for 20 seconds when SASL is enabled after finishing negotiation
[ https://issues.apache.org/jira/browse/HBASE-27204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567319#comment-17567319 ]

Szabolcs Bukros commented on HBASE-27204:
-----------------------------------------

[~zhangduo] I agree, my solution is bad. I am not trying to defend it, just wanted to add some context.

> Key: HBASE-27204
> URL: https://issues.apache.org/jira/browse/HBASE-27204
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
[jira] [Commented] (HBASE-27204) BlockingRpcClient will hang for 20 seconds when SASL is enabled after finishing negotiation
[ https://issues.apache.org/jira/browse/HBASE-27204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567308#comment-17567308 ]

Szabolcs Bukros commented on HBASE-27204:
-----------------------------------------

[~zhangduo] We were experimenting with a custom rpc client, based on the blocking rpc client, that would also support PLAIN auth when we encountered the issue. Basically I have seen that the PLAIN client sets "completed = true" in the getInitialResponse() call and, because of this, skips the rest of the logic in the method. This means that if the authentication or the connection fails, the potential error msg is never read and the application just assumes everything is all right. The SaslClientAuthenticationProvider is pluggable with BlockingRpcConnection too, meaning this could happen there as well, and I wanted to provide a fix that would prevent this. Unfortunately I had not fully grasped the issue and the consequences of my "fix".

> Key: HBASE-27204
> URL: https://issues.apache.org/jira/browse/HBASE-27204
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
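The failure mode described above (a one-step mechanism flips "completed" on its initial response, so a loop gated on the completion flag never reads a pending server error) can be modeled minimally. All names here are illustrative assumptions, not the real HBaseSaslRpcClient code:

```java
import java.util.Optional;

class PlainClientSketch {
    boolean completed = false;

    /** PLAIN finishes in a single step: completed is set on the very first call. */
    byte[] getInitialResponse(String user, String pass) {
        completed = true;
        return ("\0" + user + "\0" + pass).getBytes();
    }

    /** Returns the server's error message if it was read, empty otherwise. */
    Optional<String> connect(String pendingServerError, String user, String pass) {
        getInitialResponse(user, pass);
        while (!completed) {
            // Never entered for PLAIN: the error stays unread and the
            // caller assumes the authentication succeeded.
            return Optional.ofNullable(pendingServerError);
        }
        return Optional.empty();
    }
}
```

Even with a real error waiting on the wire, the loop body is skipped because the mechanism already reported completion, which is exactly why the silent-failure behavior went unnoticed.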
[jira] [Commented] (HBASE-27154) Backport missing MOB related changes to branch-2
[ https://issues.apache.org/jira/browse/HBASE-27154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566256#comment-17566256 ]

Szabolcs Bukros commented on HBASE-27154:
-----------------------------------------

Thanks a lot for your help, [~apurtell]!

> Key: HBASE-27154
> URL: https://issues.apache.org/jira/browse/HBASE-27154
> Priority: Major
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565318#comment-17565318 ]

Szabolcs Bukros commented on HBASE-26969:
-----------------------------------------

[~apurtell] branch-2 is missing some very important MOB-related commits that this change relies on, and we have to backport those first. This is tracked in HBASE-27154. The first and most complex backport is done; the rest should be easier, but I have deadlines coming up and will not be able to continue this for a few weeks. If you could backport the rest of the missing commits, or find someone to do so, I would be happy to prepare a backport for this commit.

> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Fix For: 2.6.0, 3.0.0-alpha-4
[jira] [Commented] (HBASE-27154) Backport missing MOB related changes to branch-2
[ https://issues.apache.org/jira/browse/HBASE-27154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559143#comment-17559143 ]

Szabolcs Bukros commented on HBASE-27154:
-----------------------------------------

PR for HBASE-22749: https://github.com/apache/hbase/pull/4581

> Key: HBASE-27154
> URL: https://issues.apache.org/jira/browse/HBASE-27154
> Priority: Major
[jira] [Created] (HBASE-27154) Backport missing MOB related changes to branch-2
Szabolcs Bukros created HBASE-27154:
---------------------------------------

Summary: Backport missing MOB related changes to branch-2
Key: HBASE-27154
URL: https://issues.apache.org/jira/browse/HBASE-27154
Project: HBase
Issue Type: Bug
Components: mob
Affects Versions: 2.6.0
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros

While trying to backport https://issues.apache.org/jira/browse/HBASE-26969 to branch-2 I have found that multiple major MOB-related changes are missing. This change is required for FileBased SFT correctness, so the changes it depends on should be backported first. Also, any improvement to MOB stability is usually welcome.

The missing changes I have found so far:
https://issues.apache.org/jira/browse/HBASE-22749
https://issues.apache.org/jira/browse/HBASE-23723
https://issues.apache.org/jira/browse/HBASE-24163

There is also a docs change describing the new MOB functionality, but considering that the book is always generated from master, I think it is safe to skip backporting it:
https://issues.apache.org/jira/browse/HBASE-23198

I'm planning to backport these changes one by one until we reach a state where HBASE-26969 can be backported too.
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544550#comment-17544550 ]

Szabolcs Bukros commented on HBASE-26969:
-----------------------------------------

{quote}So p1 and p2 can have references to p even when p is no longer in meta?{quote}
Not exactly. In this scenario p no longer exists, in or outside the meta, and it is not referenced either. The mobfile "_p" exists and is referenced from p1 and p2. It was created by p, but it lives outside the data folder, in an entirely different structure under /hbase/mobdir. It is not, and never was, part of p; it was only referenced from p.

> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Fix For: 2.6.0, 3.0.0-alpha-3
[jira] [Commented] (HBASE-27069) Hbase SecureBulkload permission regression
[ https://issues.apache.org/jira/browse/HBASE-27069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544177#comment-17544177 ]

Szabolcs Bukros commented on HBASE-27069:
-----------------------------------------

You are right, that is a regression. Thanks for the fix!

> Key: HBASE-27069
> URL: https://issues.apache.org/jira/browse/HBASE-27069
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.0, 3.0.0-alpha-3
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
>
> HBASE-26707 has introduced a bug, where setting the permission of the bulk
> loaded HFile to 777 is made conditional. However, as discussed in
> HBASE-15790, that permission is essential for HBase's correct operation.
[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542548#comment-17542548 ] Szabolcs Bukros edited comment on HBASE-26969 at 5/26/22 4:10 PM: -- {quote}the master cleaner could check only files from regions not online on any RS {quote} That would not be enough. Consider the following scenario. Region p creates a mobfile with a name of "_p". While region p is online the rs cleaner can identify that "_p" belongs to this region and can clean it up if it is no longer referenced from said region. Now let's split region p. We have 2 new regions p1, p2 and p is archived, maybe even deleted altogether. Both p1 and p2 are online and contain references to "_p" mobfile but we have no way of knowing we should search these regions for references. So the master cleaner have to read every single hfile to find the references in p1 and p2. The mobfiles keep their name until a major compaction runs. was (Author: bszabolcs): > the master cleaner could check only files from regions not online on any RS That would not be enough. Consider the following scenario. Region p creates a mobfile with a name of "_p". While region p is online the rs cleaner can identify that "_p" belongs to this region and can clean it up if it is no longer referenced from said region. Now let's split region p. We have 2 new regions p1, p2 and p is archived, maybe even deleted altogether. Both p1 and p2 are online and contain references to "_p" mobfile but we have no way of knowing we should search these regions for references. So the master cleaner have to read every single hfile to find the references in p1 and p2. The mobfiles keep their name until a major compaction runs. 
> Eliminate MOB renames when SFT is enabled > - > > Key: HBASE-26969 > URL: https://issues.apache.org/jira/browse/HBASE-26969 > Project: HBase > Issue Type: Sub-task > Components: mob >Affects Versions: 2.5.0, 3.0.0-alpha-3 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-3 > > > MOB file compaction and flush still rely on renames even when SFT is > enabled. > My proposed changes are: > * when requireWritingToTmpDirFirst is false during mob flush/compact, instead > of using the temp writer we should create a different writer using a > StoreFileWriterCreationTracker that writes directly to the mob > store folder > * these StoreFileWriterCreationTrackers should be stored in > the MobStore. This would require us to extend MobStore with a createWriter > and a finalizeWriter method to handle this > * refactor MobFileCleanerChore to run on the RS > instead of on Master to allow access to the > StoreFileWriterCreationTrackers to make sure the > currently written files are not cleaned up -- This message was sent by Atlassian Jira (v8.20.7#820007)
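The split scenario described in the comment above can be illustrated with a toy sketch (plain Java, not HBase source; all names are hypothetical): after p splits into p1 and p2, the mob file's name still points at the archived region p, so only a scan over every online region's references shows that the file is still live.

```java
import java.util.Map;
import java.util.Set;

// Toy simulation (not HBase code): region p wrote mob file "hash_p";
// after the split, only p1 and p2 (online) still reference it.
public class SplitRefScenario {
    // Encoded name of the creating region: the part after the last '_'.
    static String creatorRegion(String mobFileName) {
        return mobFileName.substring(mobFileName.lastIndexOf('_') + 1);
    }

    // Is the file referenced by any online region's hfile metadata?
    static boolean referenced(String mobFileName, Map<String, Set<String>> refsByRegion) {
        return refsByRegion.values().stream().anyMatch(r -> r.contains(mobFileName));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> online = Map.of(
            "p1", Set.of("hash_p"),   // daughter regions keep references to the
            "p2", Set.of("hash_p"));  // parent's mob file until major compaction
        String mobFile = "hash_p";
        // Name-based lookup points at "p", which is no longer online:
        System.out.println(online.containsKey(creatorRegion(mobFile))); // false
        // The file is still live; only scanning every region's refs reveals it:
        System.out.println(referenced(mobFile, online)); // true
    }
}
```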
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542548#comment-17542548 ] Szabolcs Bukros commented on HBASE-26969: - > the master cleaner could check only files from regions not online on any RS That would not be enough. Consider the following scenario: region p creates a mobfile named "_p". While region p is online, the RS cleaner can identify that "_p" belongs to this region and can clean it up once it is no longer referenced from said region. Now let's split region p. We get two new regions, p1 and p2, and p is archived, maybe even deleted altogether. Both p1 and p2 are online and contain references to the "_p" mobfile, but we have no way of knowing we should search these regions for references. So the master cleaner has to read every single hfile to find the references in p1 and p2. The mobfiles keep their name until a major compaction runs. > Eliminate MOB renames when SFT is enabled > - -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541025#comment-17541025 ] Szabolcs Bukros commented on HBASE-26969: - [~zhangduo] I'm sorry, I might have misspoken. We do not need an extra SFT. Having the references in the hfile metadata is sufficient. It's just slow and clunky, but it works. I hoped a better way of storing this data could be found, but as you have pointed out, that is not necessary. SFT is only linked to this issue because SFT and removing renames are thematically connected: I'm relying on some tools/solutions added to support SFT, and removing renames makes things more complicated, so instead of changing the default behavior the idea was to only remove the renames when SFT, which removed the other renames, is enabled. > Eliminate MOB renames when SFT is enabled > - -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used
[ https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros reassigned HBASE-27017: --- Assignee: Szabolcs Bukros > MOB snapshot is broken when FileBased SFT is used > - > > Key: HBASE-27017 > URL: https://issues.apache.org/jira/browse/HBASE-27017 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.5.0, 3.0.0-alpha-2 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > > During snapshot MOB regions are treated like any other region. When a > snapshot is taken and hfile references are collected a StoreFileTracker is > created to get the current active hfile list. But the MOB region stores are > not tracked so an empty list is returned, resulting in a broken snapshot. > When this snapshot is cloned the resulting table will have no MOB files or > references. > The problematic code can be found here: > [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used
[ https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541018#comment-17541018 ] Szabolcs Bukros commented on HBASE-27017: - [~zhangduo] If you look at it like this, then you are absolutely right :) > MOB snapshot is broken when FileBased SFT is used > - -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used
[ https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541008#comment-17541008 ] Szabolcs Bukros commented on HBASE-27017: - [~zhangduo] That would not work well after we remove the renames. It would mean the snapshot would also contain any incomplete or broken files currently in the dir. Copying trash around is a non-issue, because we only ever read the referenced mob files; but if a snapshot is made during a write operation that later fails and the file is removed, we would end up with a snapshot referencing a missing file. > MOB snapshot is broken when FileBased SFT is used > - -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541004#comment-17541004 ] Szabolcs Bukros commented on HBASE-26969: - [~wchevreuil] > Or is it that we lingering region dirs even after the parent SPLIT/MERGE got removed from META? Yes, these regions are cleaned up by a chore and linger after they are no longer used. But this does not really matter. The problem is that when we cannot easily identify which region should contain references to a given MOB file, we have no way to tell and have to read every single hfile's metadata to check for references. [~zhangduo] > we do not use SFT at all the MOB regions That is true, we do not use SFT. But based on similar changes, the renames would only be eliminated if SFT is enabled, not by default. Also, it relies on WriterCreationTracker, which is mostly an SFT tool. > Eliminate MOB renames when SFT is enabled > - -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537787#comment-17537787 ] Szabolcs Bukros edited comment on HBASE-26969 at 5/16/22 8:45 PM: -- {quote}I guess if we want to compact the mob files, we always need to compact the normal files which reference the mob files so we can update the references in the metadata? {quote} Yes, that is how it works. {quote}The mob files should have a different name prefix or be under a different directory? {quote} They have a different directory structure: "/mobdir/data/default/table_name/a0209c070c85d1e4d500af8ba33c3c02/cf". They are stored fully separated. Please note these regions only contain mob files and are fully independent from the referencing regions. A single mob region could theoretically contain every MOB file in HBase, regardless of where it is referenced from. Also, their naming convention is different. For us the only important thing is that the name ends with "_" followed by the creating region's encoded name, so something like this: "0cc175b9c0f1b6a831c399e2697726612022050314ecf20b51674cd6bd647bfb2d88b1ff_b593e96e821ba6211d8a4b101a88" {quote}So at least for loading, there will be no problem {quote} That's true. Reads are very straightforward. {quote}I think the only problem here is how do we clean up the half written mob files, I think the logic is mainly the same with what we have now, get all the mob refs from all the normal storefiles, to construct the base list, and then get all the mob files which are currently being written, all MOB files besides them are the ones should be deleted. {quote} That is part of the problem, yes. To have access to the half-written mob file list, the cleaner has to run on the RS. But each RS only has access to its own half-written mob file list, so each can only clean a subset of the existing mob files. To be precise, if a mob file name ends with a region's name that is hosted on the current RS, then the cleaner can decide whether it can be archived or not. Unfortunately, with merges and splits regions get archived, so after a point there will be mob files containing names of regions not hosted on any RS, and none of the cleaners running on RSes could clean these up. So we need one more cleaner specifically for these (I put it on master to replace the original cleaner), which has to read every available hfile to make sure we have every active mob reference and is able to decide whether a mob file created by a since-archived region can be archived or not. > Eliminate MOB renames when SFT is enabled >
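The naming-convention check described above (the mob file name ends with "_" plus the creating region's encoded name, and the RS-local cleaner may only decide about files whose creator region it hosts) can be sketched as follows. This is an illustrative toy, not HBase source; the method names are hypothetical.

```java
import java.util.Set;

// Toy sketch (not HBase code) of deciding whether an RS-local cleaner
// may handle a given MOB file, based on its "_<encodedRegionName>" suffix.
public class MobFileOwnership {
    // Extract the encoded region name after the last '_' in the file name.
    static String creatorRegion(String mobFileName) {
        int idx = mobFileName.lastIndexOf('_');
        return idx < 0 ? "" : mobFileName.substring(idx + 1);
    }

    // The RS-local cleaner may only decide on files whose creator region it hosts;
    // files of archived (split/merged) regions must be deferred to a global cleaner.
    static boolean locallyCleanable(String mobFileName, Set<String> hostedRegions) {
        return hostedRegions.contains(creatorRegion(mobFileName));
    }

    public static void main(String[] args) {
        Set<String> hosted = Set.of("p1", "p2");
        // File created by a since-archived parent region "p":
        String orphan = "0cc175b9c0f1b6a8_p";
        System.out.println(creatorRegion(orphan));            // p
        System.out.println(locallyCleanable(orphan, hosted)); // false -> defer to global cleaner
    }
}
```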
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537787#comment-17537787 ] Szabolcs Bukros commented on HBASE-26969: - {quote}I guess if we want to compact the mob files, we always need to compact the normal files which reference the mob files so we can update the references in the metadata? {quote} Yes, that is how it works. {quote}The mob files should have a different name prefix or be under a different directory? {quote} They have a different directory structure: "/mobdir/data/default/table_name/a0209c070c85d1e4d500af8ba33c3c02/cf". They are stored fully separated. Please note these regions only contain mob files and are fully independent from the referencing regions. A single mob region could theoretically contain every MOB file in HBase, regardless of where it is referenced from. Also, their naming convention is different. For us the only important thing is that the name ends with "_" followed by the creating region's encoded name, so something like this: "0cc175b9c0f1b6a831c399e2697726612022050314ecf20b51674cd6bd647bfb2d88b1ff_b593e96e821ba6211d8a4b101a88" {quote}So at least for loading, there will be no problem {quote} That's true. Reads are very straightforward. {quote}I think the only problem here is how do we clean up the half written mob files, I think the logic is mainly the same with what we have now, get all the mob refs from all the normal storefiles, to construct the base list, and then get all the mob files which are currently being written, all MOB files besides them are the ones should be deleted. {quote} That is part of the problem, yes. To have access to the half-written mob file list, the cleaner has to run on the RS. But each RS only has access to its own half-written mob file list, so each can only clean a subset of the existing mob files. To be precise, if a mob file name ends with a region's name that is hosted on the current RS, then the cleaner can decide whether it can be archived or not. Unfortunately, with merges and splits regions get archived, so after a point there will be mob files containing names of regions not hosted on any RS, and none of the cleaners running on RSes could clean these up. So we need one more cleaner specifically for these (I put it on master to replace the original cleaner), which has to read every available hfile to make sure we have every active mob reference and is able to decide whether a mob file created by a since-archived region can be archived or not. > Eliminate MOB renames when SFT is enabled > - -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536589#comment-17536589 ] Szabolcs Bukros edited comment on HBASE-26969 at 5/13/22 11:57 AM: --- [~zhangduo] thank you for taking the time and reading this! As far as I understand, when a storeFile has data that is stored in a mob file, that storeFile's metadata will contain references to those mob files. So when a scan request tries to read the data, it knows which mob file to check. This is the only tracking we have. {code:java} storeFile.getMetadataValue(HStoreFile.MOB_FILE_REFS); {code} For cleanup, the chore has to know which mob files are currently actively referenced. To get this list, the chore checks the metadata of every single storeFile HBase has in a MOB-enabled CF, and collects the references from them. It just iterates through the /data folder table by table.
> Eliminate MOB renames when SFT is enabled > - -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536589#comment-17536589 ] Szabolcs Bukros commented on HBASE-26969: - [~zhangduo] thank you for taking the time and reading this! As far as I understand, when a storeFile has data that is stored in a mob file, that storeFile's metadata will contain references to those mob files. So when a scan request tries to read the data, it knows which mob file to check. This is the only tracking we have. {code:java} storeFile.getMetadataValue(HStoreFile.MOB_FILE_REFS); {code} For cleanup, the chore has to know which mob files are currently actively referenced. To get this list, the chore checks the metadata of every single storeFile HBase has in a MOB-enabled CF, and collects the references from them. It just iterates through the /data folder table by table. > Eliminate MOB renames when SFT is enabled > - -- This message was sent by Atlassian Jira (v8.20.7#820007)
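The cleanup logic discussed above (collect the mob references from every storefile's metadata, keep files currently being written, and treat everything else as deletable) can be simulated with plain Java collections. This is a hedged illustration of the idea only, not the MobFileCleanerChore implementation; the data structures stand in for hfile metadata and the StoreFileWriterCreationTracker.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy simulation (not HBase code) of the MOB cleanup decision:
// deletable = all mob files - (referenced from storefile metadata + being written)
public class MobCleanerSketch {
    static Set<String> deletable(Set<String> allMobFiles,
                                 Map<String, Set<String>> storeFileMobRefs,
                                 Set<String> inFlightWrites) {
        // Base keep-list: files still being written (from the writer tracker)...
        Set<String> keep = new HashSet<>(inFlightWrites);
        // ...plus every reference found in any storefile's metadata.
        storeFileMobRefs.values().forEach(keep::addAll);
        Set<String> result = new HashSet<>(allMobFiles);
        result.removeAll(keep);
        return result;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("m1_p", "m2_p", "m3_p");
        Map<String, Set<String>> refs = Map.of("storefileA", Set.of("m1_p"));
        Set<String> inFlight = Set.of("m3_p");          // half-written, must survive
        System.out.println(deletable(all, refs, inFlight)); // prints [m2_p]
    }
}
```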
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534360#comment-17534360 ] Szabolcs Bukros commented on HBASE-26969: - I would like to start by stating that this issue grew bigger than just removing the renames and exposed multiple issues in the MOB-SFT interaction. I have uploaded a draft PR containing my changes. I intend to use it as a reference to show the issues that come up when using MOB on FileBased SFT. My main problem was that while MOB files were already tracked in the hfile metadata, the "single source of truth" is widely distributed and not easily available. Both the WriterCreationTracker and the StoreFileTracker are RS-based data, and the MOB cleaner needs them to work reliably when FileBased SFT is used. Exposing this data and allowing the Master to request it from the RSes, collect it, and run the cleaner based on it, while technically possible, looked less than optimal. It would result in a single cluster-wide spike that we should try to avoid, and considering the delay that certain RSes could have (uneven load, GC pauses, etc.), the data could already be outdated by the time the collection is done. So instead I tried to move the cleaner to the RSes. This solution also had its drawbacks. MOB file names contain the encoded name of the region that created them, so the RS hosting that specific region can check its hfiles for references and can clean the file up if it does not find anything. The problem comes with merge/split parent regions. When the parent region is archived, the new regions' hfiles will still hold references to the old MOB files, but now the only way to make sure whether an old MOB file is referenced or not is to check every single hfile in every store belonging to the same column family, because we cannot tell based on its name where it could be referenced from. Just like the old cleaner did.
So while I moved the MOB cleaner to the RS level and reduced its scope to only clean up MOB files belonging to regions hosted by that RS, I had to leave a "global" MOB cleaner running on Master to deal with MOB files created by archived regions but potentially still being referenced. And I think this is very ugly. This whole process could have been significantly simpler if we had tracker files in MOB stores, but then we would have TWO competing sources of truth: the tracker files and the hfile metadata. HBASE-27017 is a related issue where the snapshot code tries to get the active MOB files based on the configured SFT, but since MOB stores do not have tracker files it returns an empty list. If the store had tracker files it would work. Without a tracker file we either include every MOB file in the dir (garbage included) or scan every single hfile's metadata for MOB references. What I'm trying to say is that while I think my solution would work and solve the immediate issues, I would much prefer a centralized, easily available active MOB list and a solution built on that. [~apurtell], [~zhangduo], [~elserj], [~wchevreuil] What do you think? > Eliminate MOB renames when SFT is enabled > - > > Key: HBASE-26969 > URL: https://issues.apache.org/jira/browse/HBASE-26969 > Project: HBase > Issue Type: Task > Components: mob >Affects Versions: 2.5.0, 3.0.0-alpha-3 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-3 > > > MOB file compaction and flush still rely on renames even when SFT is > enabled. > My proposed changes are: > * when requireWritingToTmpDirFirst is false during mob flush/compact instead > of using the temp writer we should create a different writer using a > {color:#00}StoreFileWriterCreationTracker that writes directly to the mob > store folder{color} > * {color:#00}these StoreFileWriterCreationTracker should be stored in > the MobStore. 
This would require us to extend MobStore with a createWriter > and a finalizeWriter method to handle this{color} > * {color:#00}refactor {color}MobFileCleanerChore to run on the RS > instead of on Master to allow access to the > {color:#00}StoreFileWriterCreationTracker{color}s to make sure the > currently written files are not cleaned up -- This message was sent by Atlassian Jira (v8.20.7#820007)
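The RS-local cleanliness check described in the comment above can be sketched roughly as follows. This is a minimal illustration in plain Java, not HBase's actual API: the method and parameter names are invented, and the MOB file name layout is a simplified placeholder. The idea is that a MOB file is only deletable once no hfile metadata references it and no StoreFileWriterCreationTracker reports it as currently being written.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the RS-level MOB cleaner decision logic.
public class MobCleanerSketch {

  /** MOB file names embed the encoded name of the region that wrote them;
   *  "<hash>-<encodedRegionName>" is an assumed layout for this sketch. */
  static String regionFromMobFileName(String mobFileName) {
    int idx = mobFileName.lastIndexOf('-');
    return mobFileName.substring(idx + 1);
  }

  /**
   * @param mobFiles           MOB files created by a region this RS hosts
   * @param referencedMobFiles MOB file names collected from the hfile
   *                           metadata of the region's stores
   * @param filesBeingWritten  names reported by the writer creation trackers
   * @return the MOB files that are safe to archive
   */
  static List<String> deletableMobFiles(List<String> mobFiles,
      Set<String> referencedMobFiles, Set<String> filesBeingWritten) {
    return mobFiles.stream()
        .filter(f -> !referencedMobFiles.contains(f))
        .filter(f -> !filesBeingWritten.contains(f))
        .collect(Collectors.toList());
  }
}
```

As the comment notes, this per-region shortcut breaks down for merge/split parents: once the creating region is archived, the file name no longer tells you which stores might reference the file, which is why a global fallback cleaner is still needed.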
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533854#comment-17533854 ] Szabolcs Bukros commented on HBASE-26969: - Thanks [~apurtell] ! > Eliminate MOB renames when SFT is enabled > - > > Key: HBASE-26969 > URL: https://issues.apache.org/jira/browse/HBASE-26969 > Project: HBase > Issue Type: Task > Components: mob >Affects Versions: 2.5.0, 3.0.0-alpha-3 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-3 > > > MOB file compaction and flush still relies on renames even when SFT is > enabled. > My proposed changes are: > * when requireWritingToTmpDirFirst is false during mob flush/compact instead > of using the temp writer we should create a different writer using a > {color:#00}StoreFileWriterCreationTracker that writes directly to the mob > store folder{color} > * {color:#00}these StoreFileWriterCreationTracker should be stored in > the MobStore. This would requires us to extend MobStore with a createWriter > and a finalizeWriter method to handle this{color} > * {color:#00}refactor {color}MobFileCleanerChore to run on the RS > instead on Master to allow access to the > {color:#00}StoreFileWriterCreationTracker{color}s to make sure the > currently written files are not cleaned up -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533817#comment-17533817 ] Szabolcs Bukros commented on HBASE-26969: - [~apurtell] It looks like the MOB feature is currently incompatible with FileBased SFT. Without this issue fixed, the currently written/temporary/outdated/trash files in the store dir can break the MobFileCleanerChore, and the related issue shows that snapshotting a MOB-enabled table while FileBased SFT is used results in data loss. Since 2.5.0 is so close to release, this fact should be documented somewhere. I'm planning to add a more detailed description of the issues I have encountered while trying to make these features work together as soon as I can publish a PR for reference. > Eliminate MOB renames when SFT is enabled > - > > Key: HBASE-26969 > URL: https://issues.apache.org/jira/browse/HBASE-26969 > Project: HBase > Issue Type: Task > Components: mob >Affects Versions: 2.5.0, 3.0.0-alpha-3 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-3 > > > MOB file compaction and flush still rely on renames even when SFT is > enabled. > My proposed changes are: > * when requireWritingToTmpDirFirst is false during mob flush/compact instead > of using the temp writer we should create a different writer using a > {color:#00}StoreFileWriterCreationTracker that writes directly to the mob > store folder{color} > * {color:#00}these StoreFileWriterCreationTracker should be stored in > the MobStore. This would require us to extend MobStore with a createWriter > and a finalizeWriter method to handle this{color} > * {color:#00}refactor {color}MobFileCleanerChore to run on the RS > instead of on Master to allow access to the > {color:#00}StoreFileWriterCreationTracker{color}s to make sure the > currently written files are not cleaned up -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used
[ https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533807#comment-17533807 ] Szabolcs Bukros commented on HBASE-27017: - This issue was found while working on HBASE-26969. TestMobCompactionWithDefaults uses cloneSnapshot. > MOB snapshot is broken when FileBased SFT is used > - > > Key: HBASE-27017 > URL: https://issues.apache.org/jira/browse/HBASE-27017 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.5.0, 3.0.0-alpha-2 >Reporter: Szabolcs Bukros >Priority: Major > > During a snapshot, MOB regions are treated like any other region. When a > snapshot is taken and hfile references are collected, a StoreFileTracker is > created to get the current active hfile list. But the MOB region stores are > not tracked, so an empty list is returned, resulting in a broken snapshot. > When this snapshot is cloned, the resulting table will have no MOB files or > references. > The problematic code can be found here: > [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used
Szabolcs Bukros created HBASE-27017: --- Summary: MOB snapshot is broken when FileBased SFT is used Key: HBASE-27017 URL: https://issues.apache.org/jira/browse/HBASE-27017 Project: HBase Issue Type: Bug Components: mob Affects Versions: 3.0.0-alpha-2, 2.5.0 Reporter: Szabolcs Bukros During a snapshot, MOB regions are treated like any other region. When a snapshot is taken and hfile references are collected, a StoreFileTracker is created to get the current active hfile list. But the MOB region stores are not tracked, so an empty list is returned, resulting in a broken snapshot. When this snapshot is cloned, the resulting table will have no MOB files or references. The problematic code can be found here: [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled
[ https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531420#comment-17531420 ] Szabolcs Bukros commented on HBASE-26969: - [~apurtell] Please bump it. > Eliminate MOB renames when SFT is enabled > - > > Key: HBASE-26969 > URL: https://issues.apache.org/jira/browse/HBASE-26969 > Project: HBase > Issue Type: Task > Components: mob >Affects Versions: 2.5.0, 3.0.0-alpha-3 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3 > > > MOB file compaction and flush still relies on renames even when SFT is > enabled. > My proposed changes are: > * when requireWritingToTmpDirFirst is false during mob flush/compact instead > of using the temp writer we should create a different writer using a > {color:#00}StoreFileWriterCreationTracker that writes directly to the mob > store folder{color} > * {color:#00}these StoreFileWriterCreationTracker should be stored in > the MobStore. This would requires us to extend MobStore with a createWriter > and a finalizeWriter method to handle this{color} > * {color:#00}refactor {color}MobFileCleanerChore to run on the RS > instead on Master to allow access to the > {color:#00}StoreFileWriterCreationTracker{color}s to make sure the > currently written files are not cleaned up -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HBASE-26969) Eliminate MOB renames when SFT is enabled
Szabolcs Bukros created HBASE-26969: --- Summary: Eliminate MOB renames when SFT is enabled Key: HBASE-26969 URL: https://issues.apache.org/jira/browse/HBASE-26969 Project: HBase Issue Type: Task Components: mob Affects Versions: 2.5.0, 3.0.0-alpha-3 Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros MOB file compaction and flush still rely on renames even when SFT is enabled. My proposed changes are: * when requireWritingToTmpDirFirst is false during mob flush/compact, instead of using the temp writer we should create a different writer using a {color:#00}StoreFileWriterCreationTracker that writes directly to the mob store folder{color} * {color:#00}these StoreFileWriterCreationTrackers should be stored in the MobStore. This would require us to extend MobStore with a createWriter and a finalizeWriter method to handle this{color} * {color:#00}refactor {color}MobFileCleanerChore to run on the RS instead of on Master to allow access to the {color:#00}StoreFileWriterCreationTracker{color}s to make sure the currently written files are not cleaned up -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HBASE-26791) Memstore flush fencing issue for SFT
Szabolcs Bukros created HBASE-26791: --- Summary: Memstore flush fencing issue for SFT Key: HBASE-26791 URL: https://issues.apache.org/jira/browse/HBASE-26791 Project: HBase Issue Type: Bug Affects Versions: 2.6.0, 3.0.0-alpha-3 Reporter: Szabolcs Bukros The scenario is the following: # rs1 is flushing a file to S3 for region1 # rs1 loses its ZK lock # region1 gets assigned to rs2 # rs2 opens region1 # rs1 completes the flush and updates the sft file for region1 # rs2 has a different “version” of the sft file for region1 The flush should fail at the end, but the SFT file gets overwritten before that, resulting in potential data loss. Potential solutions include: * Adding a timestamp to the tracker file names. This, together with creating a new tracker file when an rs opens the region, would allow us to list the available tracker files before an update and compare the found timestamps to the one stored in memory to verify the store still owns the latest tracker file. * Using the existing timestamp in the tracker file content. This would also require us to create a new tracker file when a new rs opens the region, but instead of listing the available tracker files, we could try to load and de-serialize the last tracker file and compare the timestamp found in it to the one stored in memory. -- This message was sent by Atlassian Jira (v8.20.1#820001)
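The first proposed solution above (timestamped tracker file names) amounts to a fencing check before commit. The following is a minimal sketch of that check under stated assumptions: the TreeSet stands in for listing tracker files on the filesystem, and all class and method names are invented for illustration, not taken from the SFT implementation.

```java
import java.util.TreeSet;

// Hypothetical fencing sketch: a flush may only commit its tracker update
// if this writer still owns the newest tracker file for the region.
public class TrackerFenceSketch {
  // stand-in for the tracker file timestamps visible in the store dir
  private final TreeSet<Long> trackerTimestamps = new TreeSet<>();
  // timestamp of the tracker file this writer created when it opened the region
  private long ownedTimestamp = -1;

  /** This RS opens the region: write a fresh tracker file, remember its ts. */
  void openRegion(long now) {
    trackerTimestamps.add(now);
    ownedTimestamp = now;
  }

  /** Simulates a different RS (rs2 in the scenario) opening the same region later. */
  void foreignOpen(long now) {
    trackerTimestamps.add(now);
  }

  /** rs1's late flush must refuse to commit once a newer tracker file exists. */
  boolean canCommitFlush() {
    return !trackerTimestamps.isEmpty() && trackerTimestamps.last() == ownedTimestamp;
  }
}
```

In the scenario from the issue, rs1's `canCommitFlush()` would return false after rs2 opened region1, so the stale flush could no longer overwrite rs2's view of the store files.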
[jira] [Commented] (HBASE-26707) Reduce number of renames during bulkload
[ https://issues.apache.org/jira/browse/HBASE-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495985#comment-17495985 ] Szabolcs Bukros commented on HBASE-26707: - [~wchevreuil] Thanks a lot for your feedback and commit. Please find the branch-2 compatible PR here: https://github.com/apache/hbase/pull/4122 > Reduce number of renames during bulkload > > > Key: HBASE-26707 > URL: https://issues.apache.org/jira/browse/HBASE-26707 > Project: HBase > Issue Type: Sub-task >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > > Make sure we only do a single rename operation during bulkload when > StoreEngine does not require the use of tmp directories. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-26624) [hbase-operator-tools] Introduce a HBCK2 tool to fix the store file tracking
[ https://issues.apache.org/jira/browse/HBASE-26624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488135#comment-17488135 ] Szabolcs Bukros commented on HBASE-26624: - Hi [~zhangduo] Do you expect this to be part of HBCK2, or would a standalone operator tool like RegionsMerger be sufficient? Also, what kind of granularity are you looking for? Would a tool that can re-generate the tracker files globally or for a selected table be enough, or should we go down to the region level? > [hbase-operator-tools] Introduce a HBCK2 tool to fix the store file tracking > > > Key: HBASE-26624 > URL: https://issues.apache.org/jira/browse/HBASE-26624 > Project: HBase > Issue Type: Sub-task > Components: hbase-operator-tools, hbck2 >Reporter: Duo Zhang >Assignee: Szabolcs Bukros >Priority: Major > > We should provide a HBCK2 tool to recover the store file tracking if it is > broken. -- This message was sent by Atlassian Jira (v8.20.1#820001)
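Whatever granularity is chosen, the core of such a recovery tool is the same: rebuild the tracked-file list from what is actually on disk in a store directory. The sketch below illustrates that idea with plain `java.nio` instead of HBase's FileSystem API; the `.filelist` tracker-file prefix and all names are placeholders, not the real layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: regenerate a store's tracked-file list by listing the
// hfiles present in the store directory, which are the recoverable source of
// truth when the tracker files themselves are broken.
public class RebuildTrackerSketch {

  /** List the hfiles in a store dir; this would become the new tracker content. */
  static List<String> rebuild(Path storeDir) throws IOException {
    try (Stream<Path> files = Files.list(storeDir)) {
      return files
          .filter(Files::isRegularFile)
          // skip the tracker files themselves; ".filelist" is a placeholder name
          .filter(p -> !p.getFileName().toString().startsWith(".filelist"))
          .map(p -> p.getFileName().toString())
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

A table-level tool would simply run this per store, which is why the granularity question above is mostly about the user interface rather than the recovery logic.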
[jira] [Assigned] (HBASE-26624) [hbase-operator-tools] Introduce a HBCK2 tool to fix the store file tracking
[ https://issues.apache.org/jira/browse/HBASE-26624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros reassigned HBASE-26624: --- Assignee: Szabolcs Bukros > [hbase-operator-tools] Introduce a HBCK2 tool to fix the store file tracking > > > Key: HBASE-26624 > URL: https://issues.apache.org/jira/browse/HBASE-26624 > Project: HBase > Issue Type: Sub-task > Components: hbase-operator-tools, hbck2 >Reporter: Duo Zhang >Assignee: Szabolcs Bukros >Priority: Major > > We should provide a HBCK2 tool to recover the store file tracking if it is > broken. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HBASE-26707) Reduce number of renames during bulkload
[ https://issues.apache.org/jira/browse/HBASE-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-26707 started by Szabolcs Bukros. --- > Reduce number of renames during bulkload > > > Key: HBASE-26707 > URL: https://issues.apache.org/jira/browse/HBASE-26707 > Project: HBase > Issue Type: Sub-task >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > > Make sure we only do a single rename operation during bulkload when > StoreEngine does not require the use of tmp directories. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-26707) Reduce number of renames during bulkload
[ https://issues.apache.org/jira/browse/HBASE-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482502#comment-17482502 ] Szabolcs Bukros commented on HBASE-26707: - During implementation I found an issue with bulkLoadListener.failedBulkLoad at [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L7320] The passed param is the staged path, but the method expects the file's original location. This could lead to leaving the hfile in the staging dir after a failed bulkload and, because cleanup deletes the staging dir, losing the hfile. This is also fixed in the attached PR. > Reduce number of renames during bulkload > > > Key: HBASE-26707 > URL: https://issues.apache.org/jira/browse/HBASE-26707 > Project: HBase > Issue Type: Sub-task >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > > Make sure we only do a single rename operation during bulkload when > StoreEngine does not require the use of tmp directories. -- This message was sent by Atlassian Jira (v8.20.1#820001)
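The cleanup contract described in the comment can be illustrated with a minimal sketch. This uses plain `java.nio` paths rather than HBase's FileSystem API, and the method below is an invented stand-in for the failure callback: the point is only that the callback needs the file's ORIGINAL location, so the staged copy can be moved back before the staging dir is deleted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the failed-bulkload cleanup contract.
public class BulkLoadCleanupSketch {

  /** Move a staged hfile back to where it came from after a failed bulkload. */
  static void failedBulkLoad(Path originalPath, Path stagedPath) throws IOException {
    if (Files.exists(stagedPath)) {
      Files.createDirectories(originalPath.getParent());
      // without the original path, the file would stay in staging and be
      // deleted along with the staging dir during cleanup
      Files.move(stagedPath, originalPath, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

Passing the staged path in both arguments, as the bug described above effectively did, would make this restore a no-op and leave the hfile to die with the staging dir.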
[jira] [Created] (HBASE-26707) Reduce number of renames during bulkload
Szabolcs Bukros created HBASE-26707: --- Summary: Reduce number of renames during bulkload Key: HBASE-26707 URL: https://issues.apache.org/jira/browse/HBASE-26707 Project: HBase Issue Type: Sub-task Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros Make sure we only do a single rename operation during bulkload when StoreEngine does not require the use of tmp directories. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-26441) Add metrics for BrokenStoreFileCleaner
[ https://issues.apache.org/jira/browse/HBASE-26441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441874#comment-17441874 ] Szabolcs Bukros commented on HBASE-26441: - [~zhangduo] I would like to go back to my original Cleaner Chore PR, re-use the metrics solution from there, and match it to the finalized chore. > Add metrics for BrokenStoreFileCleaner > -- > > Key: HBASE-26441 > URL: https://issues.apache.org/jira/browse/HBASE-26441 > Project: HBase > Issue Type: Sub-task > Components: metrics >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Minor > > This is a followup for HBASE-26271. > Cleaner chores lacking visibility is a recurring issue, so I would like to add > metrics for BrokenStoreFileCleaner to have a better idea of the tasks it > performs. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26441) Add metrics for BrokenStoreFileCleaner
Szabolcs Bukros created HBASE-26441: --- Summary: Add metrics for BrokenStoreFileCleaner Key: HBASE-26441 URL: https://issues.apache.org/jira/browse/HBASE-26441 Project: HBase Issue Type: Sub-task Components: metrics Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros This is a followup for HBASE-26271. Cleaner chores lacking visibility is a recurring issue, so I would like to add metrics for BrokenStoreFileCleaner to have a better idea of the tasks it performs. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-26271) Cleanup the broken store files under data directory
[ https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441867#comment-17441867 ] Szabolcs Bukros commented on HBASE-26271: - [~zhangduo] Thanks for all the feedback and merging it. > Cleanup the broken store files under data directory > --- > > Key: HBASE-26271 > URL: https://issues.apache.org/jira/browse/HBASE-26271 > Project: HBase > Issue Type: Sub-task > Components: HFile >Reporter: Duo Zhang >Assignee: Szabolcs Bukros >Priority: Major > Fix For: HBASE-26067 > > > As for some new store file tracker implementation, we allow flush/compaction > to write directly to data directory, so if we crash in the middle, there will > be broken store files left in the data directory. > We should find a proper way to delete these broken files. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot
[ https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440521#comment-17440521 ] Szabolcs Bukros edited comment on HBASE-26286 at 11/8/21, 2:52 PM: --- [~zhangduo] {quote} IIRC, our decision on HBASE-26280 is that, a snapshot will be constructed by plain HFiles, you always need to list the directory to get all the HFiles, so I'm a bit confusing that why here we say 'snapshot with file based SFT'. Did I miss something? {quote} If I understand correctly, that discussion was about tracker files and concluded in not adding them, because the list of available hfiles in the snapshot will always be the full and correct list of storefiles, so the tracker files can be rebuilt if necessary. The TableDescriptor however can contain SFT config, both on table and cf level, if the global SFT config was overridden. {quote} Could someone explain them for me? It seems that one of them is for creating a new table and another one is for performing on an existing table? {quote} Clone creates a new table with the provided name and the TableDescriptor from the snapshot metadata. So we can freely change the SFT implementation we would like to use, because we can just override the TableDescriptor and the new table will be created with it. Restore tries to restore the state of an existing table to match the snapshot. To achieve this it deletes regions and/or hfiles present in the current table but not present in the snapshot, copies regions and/or hfiles missing from the current table over from the snapshot, and most importantly for us, it simply overwrites the current TableDescriptor with the one from the snapshot. This last step causes the problems. * Consider a use case where a cf uses file based SFT at the time of the snapshot, while the global config is still the default SFT. Later on we migrate the cf back to the default SFT. Then we have to restore the snapshot. 
The process overwrites the TableDescriptor with the one from the snapshot, and suddenly the cf will try to use file based SFT (since it used that before the snapshot), but because there is no actual SFT migration as part of the restore process, the cf folder does not have tracking files and SFT fails. This is a bug in the current implementation. * Specifying the SFT for restore has its own issues. Consider a use case where the global SFT config uses the default. We restore a table and specify that we would like to use file based SFT instead. There will be regions that exist in the current table and existed at the time of the snapshot too. A few hfiles might get added/deleted, but otherwise they remain untouched. Forcefully setting the SFT to file based as specified is possible, but there is no logic that would do the migration and build the tracker files, so the SFT would fail. Similarly, switching back to default (from a file based SFT) is possible, but restore lacks the logic to clean up the tracker files. We have multiple options here: # As [~wchevreuil] suggested, we could add a check that stops the restore process if there would be an SFT incompatibility and prompts the user to manually migrate the problematic sections first. This has the advantage of keeping the restore logic clean and making an SFT change a more conscious decision, but has the downside of being a potentially labor-intensive manual process. # We could use the SFT implementation param we are currently introducing to signal which implementation we would *prefer* to use. When there is a conflict between the current and snapshot SFT config, if the currently used implementation matches the SFT param, we can override the snapshot config. This is basically a more flexible variation of option 1. It would help the user move towards a selected SFT while keeping the restore logic clean. 
# We could add the SFT migration logic to restore and simply add the tracking files when needed or clean them up when we move away from file based SFT. This has the upside of being the most user-friendly solution, but the downside of mixing restore logic with SFT logic. # We could extend the SFT implementations to "auto migrate", meaning they clean up after themselves and prepare the necessary files for themselves. This would allow restore to just override the TableDescriptor any way it wants and let SFT deal with the required steps.
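Option 2 above reduces to a small resolution rule, sketched here in plain Java. The enum and method names are invented for illustration and are not part of HBase: keep the table's current SFT implementation (overriding the snapshot's) only when it matches the implementation the user explicitly asked for; otherwise fall back to the snapshot's config, as a plain restore would.

```java
// Hypothetical sketch of the "preferred SFT" resolution from option 2.
public class SftResolveSketch {
  enum SftImpl { DEFAULT, FILE_BASED, MIGRATION }

  /**
   * @param snapshotImpl  SFT impl recorded in the snapshot's TableDescriptor
   * @param currentImpl   SFT impl the existing table/cf currently uses
   * @param preferredImpl SFT impl the user specified for the restore
   * @return the SFT impl the restored cf should end up with
   */
  static SftImpl resolve(SftImpl snapshotImpl, SftImpl currentImpl, SftImpl preferredImpl) {
    if (snapshotImpl != currentImpl && currentImpl == preferredImpl) {
      // conflict, and the current impl is the one the user wants: keep it,
      // so no migration is needed for otherwise untouched stores
      return currentImpl;
    }
    return snapshotImpl;
  }
}
```

This keeps the restore logic free of migration code, at the cost of only ever moving a cf toward the preferred implementation when it is already there.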
[jira] [Work started] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot
[ https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-26286 started by Szabolcs Bukros. --- > Add support for specifying store file tracker when restoring or cloning > snapshot > > > Key: HBASE-26286 > URL: https://issues.apache.org/jira/browse/HBASE-26286 > Project: HBase > Issue Type: Sub-task > Components: HFile, snapshots >Reporter: Duo Zhang >Assignee: Szabolcs Bukros >Priority: Major > > As discussed in HBASE-26280. > https://issues.apache.org/jira/browse/HBASE-26280?focusedCommentId=17414894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17414894 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HBASE-26271) Cleanup the broken store files under data directory
[ https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-26271 started by Szabolcs Bukros. --- > Cleanup the broken store files under data directory > --- > > Key: HBASE-26271 > URL: https://issues.apache.org/jira/browse/HBASE-26271 > Project: HBase > Issue Type: Sub-task > Components: HFile >Reporter: Duo Zhang >Assignee: Szabolcs Bukros >Priority: Major > > As for some new store file tracker implementation, we allow flush/compaction > to write directly to data directory, so if we crash in the middle, there will > be broken store files left in the data directory. > We should find a proper way to delete these broken files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot
[ https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436018#comment-17436018 ] Szabolcs Bukros commented on HBASE-26286: - [~wchevreuil] Great point. I'll have to check how the different SFTs would handle this. > Add support for specifying store file tracker when restoring or cloning > snapshot > > > Key: HBASE-26286 > URL: https://issues.apache.org/jira/browse/HBASE-26286 > Project: HBase > Issue Type: Sub-task > Components: HFile, snapshots >Reporter: Duo Zhang >Assignee: Szabolcs Bukros >Priority: Major > > As discussed in HBASE-26280. > https://issues.apache.org/jira/browse/HBASE-26280?focusedCommentId=17414894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17414894 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot
[ https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435979#comment-17435979 ] Szabolcs Bukros commented on HBASE-26286: - [~zhangduo], [~wchevreuil], [~elserj] After checking the code, making *cloning* SFT configurable looks straightforward enough. We can freely overwrite the Descriptors and use the new SFT impl table-wide. However, the same is not true for *restore*. The store configuration that the StoreEngine and SFT impl are based on is a composite of three sources: master conf, TableDescriptor, ColumnFamilyDescriptor. We cannot change any of these without potentially affecting otherwise untouched stores, and my assumption is that we should avoid that. My suggestion would be to drop restore from the scope. If changing otherwise untouched regions should be avoided, then our Descriptor granularity is insufficient for this task. If changing regions untouched by restore is acceptable, I would argue that doing a traditional restore and using the already existing migration logic is a cleaner solution than mixing it with snapshot restore. Am I missing something? What do you think? > Add support for specifying store file tracker when restoring or cloning > snapshot > > > Key: HBASE-26286 > URL: https://issues.apache.org/jira/browse/HBASE-26286 > Project: HBase > Issue Type: Sub-task > Components: HFile, snapshots >Reporter: Duo Zhang >Assignee: Szabolcs Bukros >Priority: Major > > As discussed in HBASE-26280. > https://issues.apache.org/jira/browse/HBASE-26280?focusedCommentId=17414894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17414894 -- This message was sent by Atlassian Jira (v8.3.4#803005)
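The comment above argues that clone is the easy case because the SFT implementation is just a table-level property that can be overwritten on the cloned table's descriptor. A minimal sketch of that idea, using a plain property map as a stand-in for a TableDescriptor; the helper and its usage are illustrative, not the actual HBase clone code, though the key name mirrors HBase's `hbase.store.file-tracker.impl`:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: overwriting the store file tracker impl on a copy of the
// table properties, leaving the source table's descriptor untouched.
public class CloneSftSketch {
    static final String SFT_IMPL_KEY = "hbase.store.file-tracker.impl";

    // Returns a copy of the table properties with the SFT impl replaced.
    public static Map<String, String> withSftImpl(Map<String, String> tableProps, String sftImpl) {
        Map<String, String> copy = new HashMap<>(tableProps);
        copy.put(SFT_IMPL_KEY, sftImpl);
        return copy;
    }
}
```

Because the whole cloned table uses the new SFT impl, no per-store reconciliation is needed; restore lacks exactly this freedom, since rewriting the live table's descriptor would also affect untouched stores.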
[jira] [Commented] (HBASE-26271) Cleanup the broken store files under data directory
[ https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428394#comment-17428394 ] Szabolcs Bukros commented on HBASE-26271: - [~zhangduo] prepared a new PR, on the correct branch this time. Also added metrics for the chore and a REST endpoint to easily access those metrics. Could you please take a look? [Please find it here.|https://github.com/apache/hbase/pull/3751] > Cleanup the broken store files under data directory > --- > > Key: HBASE-26271 > URL: https://issues.apache.org/jira/browse/HBASE-26271 > Project: HBase > Issue Type: Sub-task > Components: HFile >Reporter: Duo Zhang >Assignee: Szabolcs Bukros >Priority: Major > > As for some new store file tracker implementation, we allow flush/compaction > to write directly to data directory, so if we crash in the middle, there will > be broken store files left in the data directory. > We should find a proper way to delete these broken files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26271) Cleanup the broken store files under data directory
[ https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422202#comment-17422202 ] Szabolcs Bukros commented on HBASE-26271: - [~zhangduo] Created a PR with my initial changes. Could you please take a look? [~wchevreuil] suggested that if we could list the currently written compaction targets we could add that to the check, and should not rely solely on the leftover file TTL to prevent breaking a long-running compaction. I have not found a nice way to add a generic implementation for this, so my initial solution is limited to the DirectStoreCompactor. What do you think? I'm planning to add metrics to get a clearer idea of chore performance/results and extend the API to get this info in a follow-up commit. > Cleanup the broken store files under data directory > --- > > Key: HBASE-26271 > URL: https://issues.apache.org/jira/browse/HBASE-26271 > Project: HBase > Issue Type: Sub-task > Components: HFile >Reporter: Duo Zhang >Priority: Major > > As for some new store file tracker implementation, we allow flush/compaction > to write directly to data directory, so if we crash in the middle, there will > be broken store files left in the data directory. > We should find a proper way to delete these broken files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26271) Cleanup the broken store files under data directory
[ https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418169#comment-17418169 ] Szabolcs Bukros commented on HBASE-26271: - [~zhangduo], [~elserj] 1. The approach I'm testing now would be to add a ScheduledChore on region start that periodically checks the hfiles for each store where persistent storage is enabled. Maybe allowing this to be called from shell too. I think it is safer this way. We can run it rarely enough to minimize the performance impact but make sure to keep the folder clean. 2. I would check the ModificationTime and add a massive waiting period. The period can be big enough to be safely outside a realistic compaction runtime, since we are in no hurry to archive these files. I'm not sure a more complicated solution is warranted. " we could still fail before inserting these files into store file tracker right?" My thoughts exactly. The safest solution seems to be just listing the file system. Also considering this is rs specific and we can add some jitter the impact should not be significant either. What do you think? > Cleanup the broken store files under data directory > --- > > Key: HBASE-26271 > URL: https://issues.apache.org/jira/browse/HBASE-26271 > Project: HBase > Issue Type: Sub-task > Components: HFile >Reporter: Duo Zhang >Priority: Major > > As for some new store file tracker implementation, we allow flush/compaction > to write directly to data directory, so if we crash in the middle, there will > be broken store files left in the data directory. > We should find a proper way to delete these broken files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
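The selection logic sketched in the comment above — list the store directory, skip files the tracker knows about, and only touch leftovers whose modification time is older than a generous grace period so an in-flight compaction is never broken — can be illustrated with plain `java.nio`. All names here are hypothetical stand-ins; the real HBase chore works against HDFS and the store file tracker API, not local files:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of the cleanup-chore selection step discussed above.
public class BrokenStoreFileSketch {
    /** Files in storeDir that are neither tracked nor young enough to be an in-flight write. */
    public static List<Path> selectBrokenFiles(Path storeDir, Set<Path> trackedFiles,
                                               long ttlMillis, long nowMillis) throws IOException {
        try (Stream<Path> files = Files.list(storeDir)) {
            return files
                .filter(f -> !trackedFiles.contains(f))   // not known to the store file tracker
                .filter(f -> {
                    try {
                        long mtime = Files.getLastModifiedTime(f).toMillis();
                        // old enough to be a leftover, not a compaction still being written
                        return nowMillis - mtime > ttlMillis;
                    } catch (IOException e) {
                        return false;                     // disappeared concurrently; skip it
                    }
                })
                .collect(Collectors.toList());
        }
    }
}
```

The "massive waiting period" from the comment maps to `ttlMillis`: a value far beyond any realistic compaction runtime makes the race the comment worries about (files written but not yet inserted into the tracker) harmless, at the cost of leftovers lingering longer.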
[jira] [Assigned] (HBASE-25394) Support Snapshot related operation with direct insert HFiles into data/CF directory
[ https://issues.apache.org/jira/browse/HBASE-25394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros reassigned HBASE-25394: --- Assignee: Szabolcs Bukros > Support Snapshot related operation with direct insert HFiles into data/CF > directory > --- > > Key: HBASE-25394 > URL: https://issues.apache.org/jira/browse/HBASE-25394 > Project: HBase > Issue Type: Sub-task >Reporter: Tak-Lon (Stephen) Wu >Assignee: Szabolcs Bukros >Priority: Major > > Support restore snapshot, clone snapshot with direct insert > into data directory -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25964) [HBOSS] Introducing hbase metrics to Hboss
Szabolcs Bukros created HBASE-25964: --- Summary: [HBOSS] Introducing hbase metrics to Hboss Key: HBASE-25964 URL: https://issues.apache.org/jira/browse/HBASE-25964 Project: HBase Issue Type: Improvement Components: hboss Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros Fix For: hbase-filesystem-1.0.0-alpha2 I would like to introduce hbase metrics to Hboss to allow closer monitoring of rename performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24720) Meta replicas not cleaned when disabled
[ https://issues.apache.org/jira/browse/HBASE-24720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157570#comment-17157570 ] Szabolcs Bukros commented on HBASE-24720: - Thanks for the merge and review [~psomogyi] ! > Meta replicas not cleaned when disabled > --- > > Key: HBASE-24720 > URL: https://issues.apache.org/jira/browse/HBASE-24720 > Project: HBase > Issue Type: Bug > Components: read replicas >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6 > > > The assignMetaReplicas method works kinda like this: > {code:java} > void assignMetaReplicas(){ > if (numReplicas <= 1) return; > //create if needed then assign meta replicas > unassignExcessMetaReplica(numReplicas); > } > {code} > Now this unassignExcessMetaReplica method is the one that gets rid of the > replicas we no longer need. It closes them and deletes their zNode. > Unfortunately this only happens if we decreased the replica number. If we > disabled it, by setting the replica number to 1 assignMetaReplicas returns > instantly without cleaning up the no longer needed replicas resulting in > replicas lingering around. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24720) Meta replicas not cleaned when disabled
Szabolcs Bukros created HBASE-24720: --- Summary: Meta replicas not cleaned when disabled Key: HBASE-24720 URL: https://issues.apache.org/jira/browse/HBASE-24720 Project: HBase Issue Type: Bug Components: read replicas Affects Versions: 2.2.5, 3.0.0-alpha-1, 2.3.0, 2.4.0 Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros The assignMetaReplicas method works kinda like this: {code:java} void assignMetaReplicas(){ if (numReplicas <= 1) return; //create if needed then assign meta replicas unassignExcessMetaReplica(numReplicas); } {code} Now this unassignExcessMetaReplica method is the one that gets rid of the replicas we no longer need. It closes them and deletes their zNode. Unfortunately this only happens if we decreased the replica number. If we disabled it, by setting the replica number to 1 assignMetaReplicas returns instantly without cleaning up the no longer needed replicas resulting in replicas lingering around. -- This message was sent by Atlassian Jira (v8.3.4#803005)
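The bug described above is an early return that skips the cleanup path: when the replica count is set to 1 (disabled), `unassignExcessMetaReplica` never runs. A self-contained sketch of the corrected flow, modeling replicas as a set of ids (replica 0 being the primary, so ids start at 1); this is illustrative pseudologic, not the actual assignment-manager code:

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch of the fix direction: compute the excess replicas to unassign
// BEFORE any early return, so disabling replicas also removes lingering ones.
public class MetaReplicaSketch {
    /** Returns the replica ids (1..numReplicas-1) that should remain assigned. */
    public static SortedSet<Integer> reconcile(SortedSet<Integer> existing, int numReplicas) {
        SortedSet<Integer> keep = new TreeSet<>();
        for (int id : existing) {
            if (id < numReplicas) {
                keep.add(id);          // replica still wanted
            }
            // ids >= numReplicas would be unassigned and their znodes deleted,
            // even when numReplicas <= 1 — no early return before this loop
        }
        return keep;
    }
}
```

With `numReplicas = 1` the result is empty: every previously created replica gets cleaned up, which is exactly the case the original `if (numReplicas <= 1) return;` missed.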
[jira] [Commented] (HBASE-24562) Stabilize master startup with meta replicas enabled
[ https://issues.apache.org/jira/browse/HBASE-24562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147919#comment-17147919 ] Szabolcs Bukros commented on HBASE-24562: - Thanks for the merges [~wchevreuil] ! Please find the branch-2.2 compatible PR here: https://github.com/apache/hbase/pull/1997 > Stabilize master startup with meta replicas enabled > --- > > Key: HBASE-24562 > URL: https://issues.apache.org/jira/browse/HBASE-24562 > Project: HBase > Issue Type: Improvement > Components: meta, read replicas >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0 > > > This is related to HBASE-21624 . > I created a separate ticket because in the original one a "complete solution > for meta replicas" was requested and this is not one. I'm just trying to make > master startup more stable by making assigning meta replicas asynchronous and > preventing a potential assignment failure from crashing master. > The idea is that starting master with less or even no meta replicas assigned > is preferable to not having a running master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24579) Failed SASL authentication does not result in an exception on client side
[ https://issues.apache.org/jira/browse/HBASE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141823#comment-17141823 ] Szabolcs Bukros commented on HBASE-24579: - Thanks a lot for the commits [~wchevreuil] ! Please find the branch-2.2 PR here: https://github.com/apache/hbase/pull/1951 > Failed SASL authentication does not result in an exception on client side > - > > Key: HBASE-24579 > URL: https://issues.apache.org/jira/browse/HBASE-24579 > Project: HBase > Issue Type: Bug > Components: rpc >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0 > > > When HBaseSaslRpcClient.saslConnect tries to authenticate it only reads the > input stream if the process is not complete yet. However if the > authentication failed and the process is completed the exception sent back in > the stream never gets read. > We should always try to read the input stream even if the process is complete > to make sure it was empty. -- This message was sent by Atlassian Jira (v8.3.4#803005)
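The fix idea in the issue description — always attempt to read the input stream even when SASL negotiation reports completion, so a server-side failure written to the wire is surfaced rather than silently dropped — can be sketched with a plain `DataInputStream`. The helper and the status-code framing are hypothetical; the real `HBaseSaslRpcClient` wire format differs:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Sketch only: after negotiation completes, check whether the server sent
// anything more; if so, treat a non-zero status as an authentication error.
public class SaslReadSketch {
    public static void verifyNoPendingError(DataInputStream in) throws IOException {
        if (in.available() > 0) {       // server wrote something after "complete"
            int status = in.readInt();  // hypothetical error-status framing
            if (status != 0) {
                throw new IOException("SASL authentication failed, server status=" + status);
            }
        }
    }
}
```

The point mirrors the description: skipping this read when `isComplete()` is true means the exception the server sent back "never gets read" and the client fails later with a less useful error.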
[jira] [Work started] (HBASE-24562) Stabilize master startup with meta replicas enabled
[ https://issues.apache.org/jira/browse/HBASE-24562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-24562 started by Szabolcs Bukros. --- > Stabilize master startup with meta replicas enabled > --- > > Key: HBASE-24562 > URL: https://issues.apache.org/jira/browse/HBASE-24562 > Project: HBase > Issue Type: Improvement > Components: meta, read replicas >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > > This is related to HBASE-21624 . > I created a separate ticket because in the original one a "complete solution > for meta replicas" was requested and this is not one. I'm just trying to make > master startup more stable by making assigning meta replicas asynchronous and > preventing a potential assignment failure from crashing master. > The idea is that starting master with less or even no meta replicas assigned > is preferable to not having a running master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HBASE-24579) Failed SASL authentication does not result in an exception on client side
[ https://issues.apache.org/jira/browse/HBASE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-24579 started by Szabolcs Bukros. --- > Failed SASL authentication does not result in an exception on client side > - > > Key: HBASE-24579 > URL: https://issues.apache.org/jira/browse/HBASE-24579 > Project: HBase > Issue Type: Bug > Components: rpc >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > > When HBaseSaslRpcClient.saslConnect tries to authenticate it only reads the > input stream if the process is not complete yet. However if the > authentication failed and the process is completed the exception sent back in > the stream never gets read. > We should always try to read the input stream even if the process is complete > to make sure it was empty. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24579) Failed SASL authentication does not result in an exception on client side
Szabolcs Bukros created HBASE-24579: --- Summary: Failed SASL authentication does not result in an exception on client side Key: HBASE-24579 URL: https://issues.apache.org/jira/browse/HBASE-24579 Project: HBase Issue Type: Bug Components: rpc Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros When HBaseSaslRpcClient.saslConnect tries to authenticate it only reads the input stream if the process is not complete yet. However if the authentication failed and the process is completed the exception sent back in the stream never gets read. We should always try to read the input stream even if the process is complete to make sure it was empty. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24562) Stabilize master startup with meta replicas enabled
Szabolcs Bukros created HBASE-24562: --- Summary: Stabilize master startup with meta replicas enabled Key: HBASE-24562 URL: https://issues.apache.org/jira/browse/HBASE-24562 Project: HBase Issue Type: Improvement Components: meta, read replicas Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros This is related to HBASE-21624 . I created a separate ticket because in the original one a "complete solution for meta replicas" was requested and this is not one. I'm just trying to make master startup more stable by making assigning meta replicas asynchronous and preventing a potential assignment failure from crashing master. The idea is that starting master with less or even no meta replicas assigned is preferable to not having a running master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24186) RegionMover ignores replicationId
[ https://issues.apache.org/jira/browse/HBASE-24186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110949#comment-17110949 ] Szabolcs Bukros commented on HBASE-24186: - My change relies on HBASE-21753 and it was not backported to branch-2.1 . Thanks [~ram_krish] for the revert. > RegionMover ignores replicationId > - > > Key: HBASE-24186 > URL: https://issues.apache.org/jira/browse/HBASE-24186 > Project: HBase > Issue Type: Bug > Components: read replicas >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.5 > > > When RegionMover looks up which rs hosts a region, it does this based on > startRowKey. When read replication is enabled this might not return the > expected region's data and this can prevent the moving of these regions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24186) RegionMover ignores replicationId
[ https://issues.apache.org/jira/browse/HBASE-24186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084226#comment-17084226 ] Szabolcs Bukros commented on HBASE-24186: - Correction: this does not prevent the moving of the region. The result of getServerNameForRegion() is only used in the validation of the move, so it only forces the move to repeat itself because it does not realize the move already happened. It slows down the process but does not break it. > RegionMover ignores replicationId > - > > Key: HBASE-24186 > URL: https://issues.apache.org/jira/browse/HBASE-24186 > Project: HBase > Issue Type: Bug > Components: read replicas >Affects Versions: master >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Minor > > When RegionMover looks up which rs hosts a region, it does this based on > startRowKey. When read replication is enabled this might not return the > expected region's data and this can prevent the moving of these regions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24186) RegionMover ignores replicationId
Szabolcs Bukros created HBASE-24186: --- Summary: RegionMover ignores replicationId Key: HBASE-24186 URL: https://issues.apache.org/jira/browse/HBASE-24186 Project: HBase Issue Type: Bug Components: read replicas Affects Versions: master Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros When RegionMover looks up which rs hosts a region, it does this based on startRowKey. When read replication is enabled this might not return the expected region's data and this can prevent the moving of these regions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
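The lookup flaw described above is that matching a region by start row alone can return any replica's row once read replicas exist. A sketch of the corrected match, with simplified stand-in types (not HBase's `RegionInfo`); the idea is simply that the replica id must be part of the predicate:

```java
import java.util.List;
import java.util.Optional;

// Sketch of the fix: a getServerNameForRegion()-style lookup that matches
// on both start key and replica id, instead of start key alone.
public class RegionLookupSketch {
    public static class RegionEntry {
        final String startKey; final int replicaId; final String server;
        public RegionEntry(String startKey, int replicaId, String server) {
            this.startKey = startKey; this.replicaId = replicaId; this.server = server;
        }
    }

    public static Optional<String> serverForRegion(List<RegionEntry> meta, String startKey, int replicaId) {
        return meta.stream()
            .filter(r -> r.startKey.equals(startKey) && r.replicaId == replicaId)
            .map(r -> r.server)
            .findFirst();
    }
}
```

Without the `replicaId` condition, two replicas sharing `startKey` are indistinguishable, which is why the mover's validation kept seeing the "wrong" region and retried moves that had already happened.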
[jira] [Commented] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot
[ https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067867#comment-17067867 ] Szabolcs Bukros commented on HBASE-23995: - The logs are from 2.0; I'm reasonably certain. I see in the 2.0 logs that the manifest is created while compaction is running, before it could have finished writing to the temporary hfile, so the manifest refers to the hfile references. In 2.2, where the snapshot runs after the compaction, it refers to the freshly created storefiles. > Snapshoting a splitting region results in corrupted snapshot > > > Key: HBASE-23995 > URL: https://issues.apache.org/jira/browse/HBASE-23995 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 2.0.2 >Reporter: Szabolcs Bukros >Priority: Major > > The problem seems to originate from the fact that while the region split > itself runs in a lock, the compactions following it run in separate threads. > Alternatively the use of space quota policies can prevent compaction after a > split and leads to the same issue. > In both cases the resulting snapshot will keep the split status of the parent > region, but do not keep the references to the daughter regions, because they > (splitA, splitB qualifiers) are stored separately in the meta table and do > not propagate with the snapshot. > This is important because in the freshly cloned table the CatalogJanitor will > find the parent region, realizes it is in split state, but because it cannot > find the daughter region references (haven't propagated) assumes the parent could > be cleaned up and deletes it. The archived region used in the snapshot only > has a back reference to the now also archived parent region, and if the snapshot > is deleted they both get cleaned up. Unfortunately the daughter regions only > contain hfile links, so at this point the data is lost.
> How to reproduce: > {code:java} > hbase shell < create 'test', 'cf' > (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"} > flush 'test' > split 'test' > snapshot 'test', 'testshot' > EOF > {code} > This should make sure the snapshot is made before the compaction could be > finished even with small amount of data. > {code:java} > sudo -u habse hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot > testshot -copy-to hdfs://target:8020/apps/hbase/data/ > {code} > I export the snapshot to make the usecase cleaner but deleting both the > snapshot and the original table after the cloning should have the same effect. > {code:java} > clone_snapshot 'testshot', 'test2' > delete_snapshot "testshot" > {code} > I'm not sure what would be the best way to fix this. Preventing snapshots > when a region is in split state, would make snapshot creation problematic. > Forcing to run compaction as part of the split thread would make it rather > slow. Propagating the daughter region references could prevent the deletion > of the cloned parent region and the data would not be broken anymore but I'm > not sure we have a logic in place that could pick up the pieces and finish > the split process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot
[ https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067784#comment-17067784 ] Szabolcs Bukros commented on HBASE-23995: - As Josh mentioned both Split and Snapshot uses PV2 so it should work. And since in 2.2 it does work I started to check commits missing from the old branch. HBASE-21375 looked promising, while it does not target this behavior it looked like a general improvement on the locking logic. Quickly backported and re-tested it, but unfortunately it does not solve the issue. Now that I know what to look for I could find in the log the point where the lock is passed from Split to Snapshot (hbase-master.log). {code:java} 2020-03-26 14:32:31,945 INFO [PEWorker-8] procedure2.ProcedureExecutor: Finished pid=28, state=SUCCESS; SplitTableRegionProcedure table=tab2, parent=11544264d3485f5ff700562ca6b62acb, daughterA =dcf89acf08c55f494fd93ceedd3f3445, daughterB=bf84f2e23131d9488d9c56117d374187 in 1.0010sec 2020-03-26 14:32:31,946 DEBUG [PEWorker-8] locking.LockProcedure: LOCKED pid=30, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, tableName=tab2, type=EXCLUSIVE 2020-03-26 14:32:31,948 INFO [PEWorker-8] procedure2.TimeoutExecutorThread: ADDED pid=30, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.master.locking.LockProcedure, tableName=ta b2, type=EXCLUSIVE; timeout=60, timestamp=1585233751948 2020-03-26 14:32:31,948 DEBUG [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] snapshot.SnapshotManager: Started snapshot: { ss=tabshot2 table=tab2 type=FLUSH } {code} Curiously in the rs log I can see PostOpenDeployTasks and compactions starting to run while SplitTableRegionProcedure has the lock {code:java} 2020-03-26 14:32:31,918 INFO [PostOpenDeployTasks:dcf89acf08c55f494fd93ceedd3f3445] regionserver.HRegionServer: Post open deploy tasks for tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445. 
2020-03-26 14:32:31,919 DEBUG [PostOpenDeployTasks:dcf89acf08c55f494fd93ceedd3f3445] regionserver.CompactSplit: Small Compaction requested: system; Because: Opening Region; compactionQueue=(longCompactions=0:shortCompactions=0), splitQueue=0 2020-03-26 14:32:31,921 DEBUG [regionserver/c2504-node4:16020-longCompactions-1585218367783] compactions.SortedCompactionPolicy: Selecting compaction from 1 store files, 0 compacting, 1 eligible, 100 blocking 2020-03-26 14:32:31,922 DEBUG [regionserver/c2504-node4:16020-longCompactions-1585218367783] regionserver.HStore: dcf89acf08c55f494fd93ceedd3f3445 - cf: Initiating minor compaction (all files) {code} And it only finishes at around the same time snapshot is finishing: {code:java} 2020-03-26 14:32:32,088 INFO [regionserver/c2504-node4:16020-longCompactions-1585218367783] regionserver.CompactSplit: Completed compaction region=tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445., storeName=cf, priority=99, startTime=1585233151918; duration=0sec 2020-03-26 14:32:32,091 DEBUG [regionserver/c2504-node4:16020-longCompactions-1585218367783] regionserver.CompactSplit: Status compactionQueue=(longCompactions=0:shortCompactions=0), splitQueue=0233150936.bf84f2e23131d9488d9c56117d374187. 2020-03-26 14:32:32,101 DEBUG [rs(c2504-node4.coelab.cloudera.com,16020,1585218362034)-snapshot-pool6-thread-1] snapshot.FlushSnapshotSubprocedure: ... Flush Snapshotting region tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445. completed. 2020-03-26 14:32:32,101 DEBUG [rs(c2504-node4.coelab.cloudera.com,16020,1585218362034)-snapshot-pool6-thread-1] snapshot.FlushSnapshotSubprocedure: Closing snapshot operation on tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445. 2020-03-26 14:32:32,102 DEBUG [member: 'c2504-node4.coelab.cloudera.com,16020,1585218362034' subprocedure-pool2-thread-1] snapshot.RegionServerSnapshotManager: Completed 1/2 local region snapshots. 
2020-03-26 14:32:32,102 DEBUG [member: 'c2504-node4.coelab.cloudera.com,16020,1585218362034' subprocedure-pool2-thread-1] snapshot.RegionServerSnapshotManager: Completed 2/2 local region snapshots. {code} > Snapshoting a splitting region results in corrupted snapshot > > > Key: HBASE-23995 > URL: https://issues.apache.org/jira/browse/HBASE-23995 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 2.0.2 >Reporter: Szabolcs Bukros >Priority: Major > > The problem seems to originate from the fact that while the region split > itself runs in a lock, the compactions following it run in separate threads. > Alternatively the use of space quota policies can prevent compaction after a > split and leads to the same issue. > In both cases the resulting snapshot will keep the split status of the parent > region, but do not keep t
[jira] [Comment Edited] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot
[ https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060757#comment-17060757 ] Szabolcs Bukros edited comment on HBASE-23995 at 3/17/20, 9:19 AM: --- After the split, the daughter regions only have hfile links to the storefiles in the parent. Even the CatalogJanitor leaves these parents alone, deleting them only after the compaction is done and they are no longer referenced from the daughters. The title might not be precise or well chosen. What I wanted to say is that snapshotting a state where the split was done but the compaction was not (this is what I clumsily called "splitting") results in a structure where the daughter regions have no data, just links to a parent, being saved and exportable. However, not every necessary piece of info is exported with it (the daughter references from the parent are missing), and this leads to an issue where, in the cloned table, the parent region that actually contains the data is archived and then deleted minutes after the cloning is done, resulting in losing the exported data. was (Author: bszabolcs): After the split the daughter regions only have hfile links to the storefile in the parent. Even the CatalogJanitor leaves these parents alone, only deleting it after the compaction is done and are no longer referenced from daughters. The title might not be precise or well choosen. What I wanted to say is that snapshoting a state where the split was done, but compaction was not (this is what I clumsily called "splitting") results in a structure where the daughter regions has no data just links to a parent is saved and can be exported. However not every necessary info is exported with it (daughter references from parent are missing) and this leads to an issue where in the cloned table the parent, that actually contains the data is archived then deleted in minutes after the cloning is done, resulting in loosing the exported data.
> Snapshoting a splitting region results in corrupted snapshot > > > Key: HBASE-23995 > URL: https://issues.apache.org/jira/browse/HBASE-23995 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 2.0.2 >Reporter: Szabolcs Bukros >Priority: Major > > The problem seems to originate from the fact that while the region split > itself runs in a lock, the compactions following it run in separate threads. > Alternatively the use of space quota policies can prevent compaction after a > split and leads to the same issue. > In both cases the resulting snapshot will keep the split status of the parent > region, but do not keep the references to the daughter regions, because they > (splitA, splitB qualifiers) are stored separately in the meta table and do > not propagate with the snapshot. > This is important because the in the freshly cloned table CatalogJanitor will > find the parent region, realizes it is in split state, but because it can not > find the daughter region references (haven't propagated) assumes parent could > be cleaned up and deletes it. The archived region used in the snaphost only > has back reference to the now also archived parent region and if the snapshot > is deleted they both gets cleaned up. Unfortunately the daughter regions only > contains hfile links, so at this point the data is lost. > How to reproduce: > {code:java} > hbase shell < create 'test', 'cf' > (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"} > flush 'test' > split 'test' > snapshot 'test', 'testshot' > EOF > {code} > This should make sure the snapshot is made before the compaction could be > finished even with small amount of data. > {code:java} > sudo -u habse hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot > testshot -copy-to hdfs://target:8020/apps/hbase/data/ > {code} > I export the snapshot to make the usecase cleaner but deleting both the > snapshot and the original table after the cloning should have the same effect. 
> {code:java} > clone_snapshot 'testshot', 'test2' > delete_snapshot "testshot" > {code} > I'm not sure what would be the best way to fix this. Preventing snapshots > when a region is in split state, would make snapshot creation problematic. > Forcing to run compaction as part of the split thread would make it rather > slow. Propagating the daughter region references could prevent the deletion > of the cloned parent region and the data would not be broken anymore but I'm > not sure we have a logic in place that could pick up the pieces and finish > the split process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot
[ https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060757#comment-17060757 ] Szabolcs Bukros commented on HBASE-23995: - After the split, the daughter regions only have hfile links to the storefiles in the parent. Even the CatalogJanitor leaves these parents alone, deleting them only after the compaction is done and they are no longer referenced from the daughters. The title might not be precise or well chosen. What I wanted to say is that snapshotting a state where the split was done but the compaction was not (this is what I clumsily called "splitting") results in a structure where the daughter regions have no data, just links to a parent, being saved and exportable. However, not every necessary piece of info is exported with it (the daughter references from the parent are missing), and this leads to an issue where, in the cloned table, the parent that actually contains the data is archived and then deleted minutes after the cloning is done, resulting in losing the exported data. > Snapshoting a splitting region results in corrupted snapshot > > > Key: HBASE-23995 > URL: https://issues.apache.org/jira/browse/HBASE-23995 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 2.0.2 >Reporter: Szabolcs Bukros >Priority: Major > > The problem seems to originate from the fact that while the region split > itself runs in a lock, the compactions following it run in separate threads. > Alternatively the use of space quota policies can prevent compaction after a > split and leads to the same issue. > In both cases the resulting snapshot will keep the split status of the parent > region, but do not keep the references to the daughter regions, because they > (splitA, splitB qualifiers) are stored separately in the meta table and do > not propagate with the snapshot.
> This is important because the in the freshly cloned table CatalogJanitor will > find the parent region, realizes it is in split state, but because it can not > find the daughter region references (haven't propagated) assumes parent could > be cleaned up and deletes it. The archived region used in the snaphost only > has back reference to the now also archived parent region and if the snapshot > is deleted they both gets cleaned up. Unfortunately the daughter regions only > contains hfile links, so at this point the data is lost. > How to reproduce: > {code:java} > hbase shell < create 'test', 'cf' > (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"} > flush 'test' > split 'test' > snapshot 'test', 'testshot' > EOF > {code} > This should make sure the snapshot is made before the compaction could be > finished even with small amount of data. > {code:java} > sudo -u habse hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot > testshot -copy-to hdfs://target:8020/apps/hbase/data/ > {code} > I export the snapshot to make the usecase cleaner but deleting both the > snapshot and the original table after the cloning should have the same effect. > {code:java} > clone_snapshot 'testshot', 'test2' > delete_snapshot "testshot" > {code} > I'm not sure what would be the best way to fix this. Preventing snapshots > when a region is in split state, would make snapshot creation problematic. > Forcing to run compaction as part of the split thread would make it rather > slow. Propagating the daughter region references could prevent the deletion > of the cloned parent region and the data would not be broken anymore but I'm > not sure we have a logic in place that could pick up the pieces and finish > the split process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
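The failure mode described above (daughter regions that hold only links to a parent's store file, plus a janitor that deletes a split parent with no visible daughter references) can be modeled in a few lines. This is a standalone sketch with invented names, not HBase code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the data-loss chain in the cloned table: a daughter
// region holds only a link to the parent's store file. If the clone
// lacks the parent's daughter references, a janitor pass deletes the
// parent file and the daughter's link dangles.
public class SplitSnapshotLossSketch {
  Set<String> files = new HashSet<>();                      // real store files
  Map<String, String> links = new HashMap<>();              // daughter -> parent file
  Map<String, Set<String>> daughterRefs = new HashMap<>();  // parent file -> daughters

  boolean readable(String daughter) {
    return files.contains(links.get(daughter));
  }

  // Janitor: delete split parents that have no daughter references.
  void janitorPass(String parentFile) {
    if (daughterRefs.getOrDefault(parentFile, Set.of()).isEmpty()) {
      files.remove(parentFile);
    }
  }

  public static void main(String[] args) {
    SplitSnapshotLossSketch clone = new SplitSnapshotLossSketch();
    clone.files.add("parent/hfile1");
    clone.links.put("daughterA", "parent/hfile1");
    // daughterRefs were NOT propagated by the snapshot -> map stays empty.
    System.out.println(clone.readable("daughterA")); // true
    clone.janitorPass("parent/hfile1");
    System.out.println(clone.readable("daughterA")); // false -- data lost
  }
}
```

Propagating the `daughterRefs` entry with the snapshot would make `janitorPass` a no-op here, which is the mitigation the comment hints at.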
[jira] [Commented] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot
[ https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060717#comment-17060717 ] Szabolcs Bukros commented on HBASE-23995: - Hi [~zhangduo], thanks for your reply! I tested and reproduced the issue on 2.0.2, but based on a quick comparison with master I would say not much has changed and the issue should be present there too. If I understand correctly, procedure locks do not help because the compactions run in separate threads. SplitTableRegionProcedure does the splitting, creates a ThreadPoolExecutor for the compactions, and releases the locks while the compactions run in the background, making the snapshot possible. -- This message was sent by Atlassian Jira (v8.3.4#803005)
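The timing described in this comment (split under a lock, compactions handed to a thread pool, lock released before they finish) can be illustrated with a standalone sketch. Class and member names here are invented and this is not HBase code; a latch stands in for the slow compaction so the window is deterministic:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model of the SplitTableRegionProcedure timing: the "split"
// completes and its lock is released while the follow-up "compaction"
// still runs in a background pool, so a "snapshot" taken in between
// observes the split-but-not-compacted state.
public class SplitRaceSketch {
  static volatile boolean splitDone;
  static volatile boolean compactionDone;

  static String snapshotState() {
    return "split=" + splitDone + ",compacted=" + compactionDone;
  }

  static String runScenario() throws InterruptedException {
    splitDone = false;
    compactionDone = false;
    ExecutorService compactionPool = Executors.newSingleThreadExecutor();
    CountDownLatch slowCompaction = new CountDownLatch(1);

    // The "split" runs under a lock and hands the compaction to the pool.
    synchronized (SplitRaceSketch.class) {
      splitDone = true;
      compactionPool.submit(() -> {
        try {
          slowCompaction.await(); // compaction still in flight
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        compactionDone = true;
      });
    } // lock released here; the compaction has not completed yet

    // A "snapshot" taken in this window sees the inconsistent state.
    String observed = snapshotState();

    slowCompaction.countDown();
    compactionPool.shutdown();
    compactionPool.awaitTermination(5, TimeUnit.SECONDS);
    return observed;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runScenario()); // split=true,compacted=false
  }
}
```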
[jira] [Updated] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot
[ https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros updated HBASE-23995: Description: The problem seems to originate from the fact that while the region split itself runs in a lock, the compactions following it run in separate threads. Alternatively, the use of space quota policies can prevent compaction after a split and lead to the same issue. In both cases the resulting snapshot will keep the split status of the parent region, but not the references to the daughter regions, because they (the splitA, splitB qualifiers) are stored separately in the meta table and do not propagate with the snapshot. This is important because in the freshly cloned table the CatalogJanitor will find the parent region, realize it is in split state, but because it cannot find the daughter region references (they haven't propagated) assume the parent can be cleaned up, and delete it. The archived region used in the snapshot only has a back reference to the now also archived parent region, and if the snapshot is deleted they both get cleaned up. Unfortunately the daughter regions only contain HFile links, so at this point the data is lost. How to reproduce: {code:java} hbase shell <<EOF create 'test', 'cf' (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"} flush 'test' split 'test' snapshot 'test', 'testshot' EOF {code} This should make sure the snapshot is taken before the compaction can finish, even with a small amount of data. {code:java} sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot testshot -copy-to hdfs://target:8020/apps/hbase/data/ {code} I export the snapshot to make the use case cleaner, but deleting both the snapshot and the original table after the cloning should have the same effect. {code:java} clone_snapshot 'testshot', 'test2' delete_snapshot "testshot" {code} I'm not sure what would be the best way to fix this. Preventing snapshots when a region is in split state would make snapshot creation problematic. Forcing the compaction to run as part of the split thread would make it rather slow. Propagating the daughter region references could prevent the deletion of the cloned parent region and the data would no longer be broken, but I'm not sure we have logic in place that could pick up the pieces and finish the split process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot
Szabolcs Bukros created HBASE-23995: --- Summary: Snapshoting a splitting region results in corrupted snapshot Key: HBASE-23995 URL: https://issues.apache.org/jira/browse/HBASE-23995 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 2.0.2 Reporter: Szabolcs Bukros The problem seems to originate from the fact that while the region split itself runs in a lock, the compactions following it run in separate threads. Alternatively, the use of space quota policies can prevent compaction after a split and lead to the same issue. In both cases the resulting snapshot will keep the split status of the parent region, but not the references to the daughter regions, because they (the splitA, splitB qualifiers) are stored separately in the meta table and do not propagate with the snapshot. This is important because in the freshly cloned table the CatalogJanitor will find the parent region, realize it is in split state, but because it cannot find the daughter region references (they haven't propagated) assume the parent can be cleaned up, and delete it. The archived region used in the snapshot only has a back reference to the now also archived parent region, and if the snapshot is deleted they both get cleaned up. Unfortunately the daughter regions only contain HFile links, so at this point the data is lost. How to reproduce: {code:java} hbase shell <<EOF create 'test', 'cf' (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"} flush 'test' split 'test' snapshot 'test', 'testshot' EOF {code}
[jira] [Assigned] (HBASE-23891) Add an option to Actions to filter out meta RS
[ https://issues.apache.org/jira/browse/HBASE-23891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros reassigned HBASE-23891: --- Assignee: Szabolcs Bukros > Add an option to Actions to filter out meta RS > -- > > Key: HBASE-23891 > URL: https://issues.apache.org/jira/browse/HBASE-23891 > Project: HBase > Issue Type: Sub-task > Components: integration tests >Affects Versions: 3.0.0 >Reporter: Tamas Adami >Assignee: Szabolcs Bukros >Priority: Minor > Fix For: 3.0.0, 2.3.0, 2.2.3 > > > Add an option to Actions to be able to filter the meta RS out. > Some ITs rely on the meta RS and hit timeout errors if this RS is killed (e.g. > IntegrationTestTimeBoundedRequestsWithRegionReplicas). > For the time being there is no option for removing the meta server from the > list of servers to kill, or for configuring these actions properly. > The following chaos monkey actions are affected: > GracefulRollingRestartRsAction, RollingBatchSuspendResumeRsAction -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-18326) Fix and reenable TestMasterProcedureWalLease
[ https://issues.apache.org/jira/browse/HBASE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016955#comment-17016955 ] Szabolcs Bukros commented on HBASE-18326: - Test got deleted in HBASE-23326. Can we close this ticket or should we try to re-introduce the test? > Fix and reenable TestMasterProcedureWalLease > > > Key: HBASE-18326 > URL: https://issues.apache.org/jira/browse/HBASE-18326 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Michael Stack >Priority: Blocker > Fix For: 3.0.0, 2.3.0 > > > Fix and reenable flakey important test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23601) OutputSink.WriterThread exception gets stuck and repeated indefinietly
[ https://issues.apache.org/jira/browse/HBASE-23601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014307#comment-17014307 ] Szabolcs Bukros commented on HBASE-23601: - New PR #1028 has been created for branch-2. > OutputSink.WriterThread exception gets stuck and repeated indefinietly > -- > > Key: HBASE-23601 > URL: https://issues.apache.org/jira/browse/HBASE-23601 > Project: HBase > Issue Type: Bug > Components: read replicas >Affects Versions: 2.2.2 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.1.9, 2.2.4 > > > When a WriterThread runs into an exception (e.g. NotServingRegionException), > the exception is stored in the controller. It is never removed and cannot be > overwritten either. > > {code:java} > public void run() { > try { > doRun(); > } catch (Throwable t) { > LOG.error("Exiting thread", t); > controller.writerThreadError(t); > } > }{code} > Because of this, every time PipelineController.checkForErrors() is called, the > same old exception is rethrown. > > For example, in RegionReplicaReplicationEndpoint.replicate there is a while > loop that does the actual replicating. Every time it loops, it calls > checkForErrors(), catches the rethrown exception, logs it, but does nothing > about it. This results in ~2GB log files in ~5min in my experience. > > My proposal would be to clean up the stored exception when it reaches > RegionReplicaReplicationEndpoint.replicate and make sure we restart the > WriterThread that died throwing it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23591) Negative memStoreSizing
[ https://issues.apache.org/jira/browse/HBASE-23591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012853#comment-17012853 ] Szabolcs Bukros commented on HBASE-23591: - [~anoop.hbase] You are right, we would lose the file ref... Thanks for pointing it out! What do you think about a solution where we send the last x (maybe 10?) HStoreFile paths in the FlushDescriptor instead of just the latest one? In the StoreFileManager the files are ordered by seqID, so grabbing the last few is easy. We would have to make sure to filter out store files already listed in the store of the replica region in replayFlush too. This way the first successful flush would add all the potentially missing refs. > Negative memStoreSizing > --- > > Key: HBASE-23591 > URL: https://issues.apache.org/jira/browse/HBASE-23591 > Project: HBase > Issue Type: Bug > Components: read replicas >Reporter: Szabolcs Bukros >Priority: Major > Fix For: 2.2.2 > > > After a flush on the replica region the memStoreSizing becomes negative: > {code:java} > 2019-12-17 08:31:59,983 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: > COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplicati > on" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" > flush_sequence_number: 41392 store_flushes { family_name: "f1" > store_home_dir: "f1" flush_output: "3c48a23eac784a348a18e10e337d80a2" } > store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: > "9a5283ec95694667b4ead2398af5f01e" } store_flushes { family_name: "f3" > store_home_dir: "f3" flush_output: "e6f25e6b0eca4d22af15d0626d0f8759" } > region_name: > "IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee." 
> 2019-12-17 08:31:59,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > 0beaae111b0f6e98bfde31ba35be5408 : Received a flush commit marker with > seqId:41392 and a previous prepared snapshot was found > 2019-12-17 08:31:59,993 INFO org.apache.hadoop.hbase.regionserver.HStore: > Region: 0beaae111b0f6e98bfde31ba35be5408 added > hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f1/3c48a23eac784a348a18e10e337d80a2, > entries=32445, sequenceid=41392, filesize=27.6 M > 2019-12-17 08:32:00,016 INFO org.apache.hadoop.hbase.regionserver.HStore: > Region: 0beaae111b0f6e98bfde31ba35be5408 added > hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f2/9a5283ec95694667b4ead2398af5f01e, > entries=12264, sequenceid=41392, filesize=10.9 M > 2019-12-17 08:32:00,121 INFO org.apache.hadoop.hbase.regionserver.HStore: > Region: 0beaae111b0f6e98bfde31ba35be5408 added > hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f3/e6f25e6b0eca4d22af15d0626d0f8759, > entries=32379, sequenceid=41392, filesize=27.5 M > 2019-12-17 08:32:00,122 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > CustomLog decrMemStoreSize. Current: dataSize=135810071, > getHeapSize=148400960, getOffHeapSize=0, getCellsCount=167243 delta: > dataSizeDelta=155923644, heapSizeDelta=170112320, offHeapSizeDelta=0, > cellsCountDelta=188399 > 2019-12-17 08:32:00,122 ERROR org.apache.hadoop.hbase.regionserver.HRegion: > Asked to modify this region's > (IntegrationTestRegionReplicaReplication,,1576599911697_0001.0beaae111b0f6e98bfde31ba35be54 > 08.) memStoreSizing to a negative value which is incorrect. 
Current > memStoreSizing=135810071, delta=-155923644 > java.lang.Exception > at > org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1323) > at > org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1316) > at > org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1303) > at > org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5194) > at > org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5025) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2232) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) > at > org.apache.hadoop.hb
[jira] [Commented] (HBASE-23601) OutputSink.WriterThread exception gets stuck and repeated indefinietly
[ https://issues.apache.org/jira/browse/HBASE-23601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012557#comment-17012557 ] Szabolcs Bukros commented on HBASE-23601: - [~stack] Thanks for the merge! This doesn't need a master patch. The code there was heavily rewritten and as far as I can tell it's not affected by this issue. I'll check why it fails on branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
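The stuck-exception behaviour reported in HBASE-23601, and the proposed fix (consume the stored error and restart the dead writer), can be shown with a standalone toy model. Names are simplified and this is not the actual HBase controller code:

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the OutputSink problem: the first error a writer thread
// reports is stored and never cleared, so every later checkForErrors()
// call rethrows the same stale exception. The sketched fix hands the
// error out exactly once via getAndClearError().
public class StuckErrorSketch {
  private final AtomicReference<Throwable> error = new AtomicReference<>();

  void writerThreadError(Throwable t) {
    error.compareAndSet(null, t); // first error wins, never cleared
  }

  // Current behaviour: rethrows the same stored error on every call.
  void checkForErrors() throws Throwable {
    Throwable t = error.get();
    if (t != null) throw t;
  }

  // Sketch of the fix: consume the error once so the caller can log it,
  // restart the writer thread, and move on.
  Throwable getAndClearError() {
    return error.getAndSet(null);
  }

  static int countRethrows(int loops) {
    StuckErrorSketch c = new StuckErrorSketch();
    c.writerThreadError(new RuntimeException("NotServingRegionException (simulated)"));
    int rethrown = 0;
    for (int i = 0; i < loops; i++) {
      try { c.checkForErrors(); } catch (Throwable t) { rethrown++; }
    }
    return rethrown; // same error surfaces on every loop iteration
  }

  static int countConsumed(int loops) {
    StuckErrorSketch c = new StuckErrorSketch();
    c.writerThreadError(new RuntimeException("simulated"));
    int seen = 0;
    for (int i = 0; i < loops; i++) {
      if (c.getAndClearError() != null) seen++; // handled once, then gone
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(countRethrows(5)); // 5 -- logged every iteration
    System.out.println(countConsumed(5)); // 1 -- handled once, then cleared
  }
}
```

The repeated rethrow in `countRethrows` is what produced the ~2GB log files described in the issue; the `getAndClearError` variant bounds it to one log entry per actual failure.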
[jira] [Commented] (HBASE-23591) Negative memStoreSizing
[ https://issues.apache.org/jira/browse/HBASE-23591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011763#comment-17011763 ] Szabolcs Bukros commented on HBASE-23591: - [~anoop.hbase] Basically yes. HBASE-23589 was the root cause and I haven't seen this problem after the fix. But cleaner code would be nice and would prevent potential future issues. "Then when the subsequent flushes happen on the same CFs and the replay WAL marker reaches the replica regions, how will that get handled?" It won't get handled. We use the old snapshot for the subsequent flush. As far as I can tell this would cause no problems; it would only mean the memstore won't be empty after the flush and would require another flush sooner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
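The "ship the last x HStoreFile paths" proposal from the earlier comment can be sketched as a standalone model. This is illustrative only, with invented names; the real change would live in the FlushDescriptor protobuf and the replica's replayFlush path:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the proposal: the primary keeps the last N committed
// store file paths (they are already ordered by seqId) and ships all
// of them with each flush; the replica adds only the paths it does not
// already reference, so the first successful replay backfills any
// refs missed by earlier failed replays.
public class RecentFlushFilesSketch {
  static final int N = 10; // "last x (maybe 10?)" from the comment
  private final Deque<String> recent = new ArrayDeque<>();

  // Primary side: record a newly committed file, trimming to N.
  void committed(String path) {
    recent.addLast(path);
    if (recent.size() > N) recent.removeFirst();
  }

  List<String> descriptorPaths() { return new ArrayList<>(recent); }

  // Replica side: keep only shipped paths not already in the store.
  static List<String> missingOn(Set<String> replicaFiles, List<String> shipped) {
    List<String> missing = new ArrayList<>();
    for (String p : shipped) {
      if (!replicaFiles.contains(p)) missing.add(p);
    }
    return missing;
  }

  public static void main(String[] args) {
    RecentFlushFilesSketch primary = new RecentFlushFilesSketch();
    primary.committed("f1/aaa");
    primary.committed("f1/bbb"); // replica missed this flush
    primary.committed("f1/ccc");
    Set<String> replica = new HashSet<>(List.of("f1/aaa", "f1/ccc"));
    // The next shipped descriptor lets the replica backfill f1/bbb.
    System.out.println(missingOn(replica, primary.descriptorPaths())); // [f1/bbb]
  }
}
```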
[jira] [Commented] (HBASE-23591) Negative memStoreSizing
[ https://issues.apache.org/jira/browse/HBASE-23591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010807#comment-17010807 ] Szabolcs Bukros commented on HBASE-23591: - [~anoop.hbase], [~zhangduo], [~stack] I did some further investigation and would like to hear your thoughts. I used the otherwise-fixed issue (HBASE-23589) to break replication and make it easier to max out the memstore. But if the commit marker were missing, or some other issue had prevented the flush from finishing, the result would be the same. The issue: we try to replay a flush for column families f1 and f3. The FlushDescriptor is incorrect and both families contain the wrong committed file list. This means we would get an exception in replayFlush when we try to call getStoreFileInfo. (This is fixed in HBASE-23589.) This exception is only caught at the very end of replayWALFlushCommitMarker and there are no steps to handle it or clean up. So writestate.flushing remains true and prepareFlushResult still has a value. The next time we try to replay a flush, for f2: replayWALFlushStartMarker does nothing because writestate.flushing is true. replayFlushInStores also does nothing because the prepareFlushResult exists but contains no context for f2. Unfortunately replayWALFlushCommitMarker isn't aware that no flush was made and at the end still calls decrMemStoreSize with data that has nothing to do with the current column family. The current code doesn't handle the exception well and ignores the fact that we would need different prepared data for different CFs. My first proposal would be to drastically simplify replayWALFlushCommitMarker. 
Something like this:
{code:java}
if (prepareFlushResult != null
    && flush.getFlushSequenceNumber() == prepareFlushResult.flushOpSeqId) {
  try {
    replayFlushInStores(flush, prepareFlushResult, true);
    this.decrMemStoreSize(prepareFlushResult.totalFlushableSize.getMemStoreSize());
  } catch (Exception ex) {
    // log exception
    throw ex; // maybe only re-throw if not FileNotFoundException
  } finally {
    this.prepareFlushResult = null;
    writestate.flushing = false;
  }
} else {
  // ... log ...
  this.prepareFlushResult = null;
  writestate.flushing = false;
}
{code}
So unless we find the correct prepared data, we clean up prepareFlushResult and skip the flush. Same if we see an exception. On the upside this would result in a more stable replica. On the downside it would also mean fewer successful flushes, so more memory usage and more flush attempts. I do not see any negative consequence besides the performance loss. It could be improved by checking whether the prepareFlushResult uses the same CFs as the FlushDescriptor, and still doing the flush if the seqId is newer than expected. But I'm not sure that's necessary. What do you think? 
[jira] [Commented] (HBASE-23589) FlushDescriptor contains non-matching family/output combinations
[ https://issues.apache.org/jira/browse/HBASE-23589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006678#comment-17006678 ] Szabolcs Bukros commented on HBASE-23589: - [~binlijin] Thanks for the merge! > FlushDescriptor contains non-matching family/output combinations > > > Key: HBASE-23589 > URL: https://issues.apache.org/jira/browse/HBASE-23589 > Project: HBase > Issue Type: Bug > Components: read replicas >Affects Versions: 2.2.2, 2.1.8 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Critical > Fix For: 3.0.0, 2.3.0, 2.2.3, 2.1.9 > > > Flushing the active region creates the following files: > {code:java} > 2019-12-13 08:00:20,866 INFO org.apache.hadoop.hbase.regionserver.HStore: > Added > hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/dab4d1cc01e44773bad7bdb5d2e33b6c, > entries=49128, sequenceid > =70688, filesize=41.4 M > 2019-12-13 08:00:20,897 INFO org.apache.hadoop.hbase.regionserver.HStore: > Added > hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f3/ecc50f33085042f7bd2397253b896a3a, > entries=5, sequenceid > =70688, filesize=42.3 M > {code} > On the read replica region when we try to replay the flush we see the > following: > {code:java} > 2019-12-13 08:00:21,279 WARN org.apache.hadoop.hbase.regionserver.HRegion: > bfa9cdb0ab13d60b389df6621ab316d1 : At least one of the store files in flush: > action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplication" > encoded_region_name: "20af2eb8929408f26d0b3b81e6b86d47" > flush_sequence_number: 70688 store_flushes { family_name: "f2" > store_home_dir: "f2" flush_output: "ecc50f33085042f7bd2397253b896a3a" } > store_flushes { family_name: "f3" store_home_dir: "f3" flush_output: > "dab4d1cc01e44773bad7bdb5d2e33b6c" } region_name: > 
"IntegrationTestRegionReplicaReplication,,1576252065847.20af2eb8929408f26d0b3b81e6b86d47." > doesn't exist any more. Skip loading the file(s) > java.io.FileNotFoundException: HFileLink > locations=[hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a, > > hdfs://replica-1:8020/hbase/.tmp/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a, > > hdfs://replica-1:8020/hbase/mobdir/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a, > > hdfs://replica-1:8020/hbase/archive/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a] > at > org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:415) > at > org.apache.hadoop.hbase.util.ServerRegionReplicaUtil.getStoreFileInfo(ServerRegionReplicaUtil.java:135) > at > org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFileInfo(HRegionFileSystem.java:311) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.replayFlush(HStore.java:2414) > at > org.apache.hadoop.hbase.regionserver.HRegion.replayFlushInStores(HRegion.java:5310) > at > org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5184) > at > org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5018) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2229) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) > at > 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) > {code} > As we can see the flush_outputs got mixed up. > > The issue is caused by HRegion.internalFlushCacheAndCommit. The code assumes > "{color:#808080}stores.values() and storeFlushCtxs have same order{color}" > which no longer seems to be true. -- This message was sent by Atlassian Jira (v8.3.4#803005)
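A minimal, self-contained sketch of the failure mode (illustrative only, not HBase code; the class and method names are hypothetical): pairing families with flush outputs by parallel position silently mismatches them as soon as the two collections disagree on iteration order, which is exactly how f2's output ended up attributed to f3 above. Keying outputs by family name removes the ordering assumption entirely.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch, not HBase code: the flush path pairs column families
// with flush outputs by iterating two collections in parallel and assuming
// they share an order. When the orders diverge, outputs get swapped.
class FlushPairing {

  // Fragile pairing: position i of 'families' is matched with position i of
  // 'outputs'. Correct only while both collections iterate in the same order.
  static Map<String, String> pairByPosition(List<String> families, List<String> outputs) {
    Map<String, String> paired = new LinkedHashMap<>();
    for (int i = 0; i < families.size(); i++) {
      paired.put(families.get(i), outputs.get(i));
    }
    return paired;
  }

  // Order-independent pairing: each output is recorded under the family it
  // was flushed for, so store iteration order no longer matters.
  static Map<String, String> pairByFamily(List<String> families, Map<String, String> outputsByFamily) {
    Map<String, String> paired = new LinkedHashMap<>();
    for (String family : families) {
      paired.put(family, outputsByFamily.get(family));
    }
    return paired;
  }

  public static void main(String[] args) {
    // The active region flushed f2 -> dab4..., f3 -> ecc5..., but the
    // descriptor was assembled with the outputs in the opposite order.
    List<String> families = Arrays.asList("f2", "f3");
    List<String> outputsInWrongOrder = Arrays.asList("ecc50f33", "dab4d1cc");
    System.out.println(pairByPosition(families, outputsInWrongOrder)); // mismatched pairing
  }
}
```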
[jira] [Updated] (HBASE-23601) OutputSink.WriterThread exception gets stuck and repeated indefinitely
[ https://issues.apache.org/jira/browse/HBASE-23601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros updated HBASE-23601: Affects Version/s: 2.2.2 > OutputSink.WriterThread exception gets stuck and repeated indefinitely > -- > > Key: HBASE-23601 > URL: https://issues.apache.org/jira/browse/HBASE-23601 > Project: HBase > Issue Type: Bug > Components: read replicas >Affects Versions: 2.2.2 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 2.2.3 > > > When a WriterThread runs into an exception (e.g. NotServingRegionException), > the exception is stored in the controller. It is never removed and cannot be > overwritten either. > > {code:java} > public void run() { > try { > doRun(); > } catch (Throwable t) { > LOG.error("Exiting thread", t); > controller.writerThreadError(t); > } > }{code} > Because of this, every time PipelineController.checkForErrors() is called the > same old exception is rethrown. > > For example, in RegionReplicaReplicationEndpoint.replicate there is a while > loop that does the actual replicating. Every time it loops, it calls > checkForErrors(), catches the rethrown exception, logs it, but does nothing > about it. This results in ~2GB log files in ~5min in my experience. > > My proposal would be to clean up the stored exception when it reaches > RegionReplicaReplicationEndpoint.replicate and make sure we restart the > WriterThread that died throwing it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
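A sketch of the proposed cleanup (names are illustrative, not the actual HBase API): keep at most one stored writer-thread error, but have the consumer take-and-clear it instead of peeking at it, so a single failure is surfaced once rather than rethrown on every loop iteration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch, not the real PipelineController: the error slot holds
// at most one Throwable, and reading it clears it, so replicate() can log the
// failure once and restart the dead WriterThread instead of re-logging the
// same exception indefinitely.
class WriterErrorSlot {
  private final AtomicReference<Throwable> error = new AtomicReference<>();

  // Called from a dying WriterThread; the first error wins, as in the
  // current code.
  void writerThreadError(Throwable t) {
    error.compareAndSet(null, t);
  }

  // Called from the replication loop: returns the error exactly once and
  // clears the slot, leaving the caller free to act on it.
  Throwable takeError() {
    return error.getAndSet(null);
  }
}
```

With this shape, the while loop in replicate() would call takeError() each pass; on a non-null result it logs the failure a single time and respawns the writer, instead of catching the same rethrown exception forever.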
[jira] [Created] (HBASE-23601) OutputSink.WriterThread exception gets stuck and repeated indefinitely
Szabolcs Bukros created HBASE-23601: --- Summary: OutputSink.WriterThread exception gets stuck and repeated indefinitely Key: HBASE-23601 URL: https://issues.apache.org/jira/browse/HBASE-23601 Project: HBase Issue Type: Bug Components: read replicas Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros Fix For: 2.2.2 When a WriterThread runs into an exception (e.g. NotServingRegionException), the exception is stored in the controller. It is never removed and cannot be overwritten either. {code:java} public void run() { try { doRun(); } catch (Throwable t) { LOG.error("Exiting thread", t); controller.writerThreadError(t); } }{code} Because of this, every time PipelineController.checkForErrors() is called the same old exception is rethrown. For example, in RegionReplicaReplicationEndpoint.replicate there is a while loop that does the actual replicating. Every time it loops, it calls checkForErrors(), catches the rethrown exception, logs it, but does nothing about it. This results in ~2GB log files in ~5min in my experience. My proposal would be to clean up the stored exception when it reaches RegionReplicaReplicationEndpoint.replicate and make sure we restart the WriterThread that died throwing it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23591) Negative memStoreSizing
[ https://issues.apache.org/jira/browse/HBASE-23591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros updated HBASE-23591: Description: After a flush on the replica region the memStoreSizing becomes negative: {code:java} 2019-12-17 08:31:59,983 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplicati on" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" flush_sequence_number: 41392 store_flushes { family_name: "f1" store_home_dir: "f1" flush_output: "3c48a23eac784a348a18e10e337d80a2" } store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: "9a5283ec95694667b4ead2398af5f01e" } store_flushes { family_name: "f3" store_home_dir: "f3" flush_output: "e6f25e6b0eca4d22af15d0626d0f8759" } region_name: "IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee." 2019-12-17 08:31:59,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 0beaae111b0f6e98bfde31ba35be5408 : Received a flush commit marker with seqId:41392 and a previous prepared snapshot was found 2019-12-17 08:31:59,993 INFO org.apache.hadoop.hbase.regionserver.HStore: Region: 0beaae111b0f6e98bfde31ba35be5408 added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f1/3c48a23eac784a348a18e10e337d80a2, entries=32445, sequenceid=41392, filesize=27.6 M 2019-12-17 08:32:00,016 INFO org.apache.hadoop.hbase.regionserver.HStore: Region: 0beaae111b0f6e98bfde31ba35be5408 added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f2/9a5283ec95694667b4ead2398af5f01e, entries=12264, sequenceid=41392, filesize=10.9 M 2019-12-17 08:32:00,121 INFO org.apache.hadoop.hbase.regionserver.HStore: Region: 0beaae111b0f6e98bfde31ba35be5408 added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f3/e6f25e6b0eca4d22af15d0626d0f8759, entries=32379, sequenceid=41392, filesize=27.5 M 2019-12-17 08:32:00,122 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: CustomLog decrMemStoreSize. Current: dataSize=135810071, getHeapSize=148400960, getOffHeapSize=0, getCellsCount=167243 delta: dataSizeDelta=155923644, heapSizeDelta=170112320, offHeapSizeDelta=0, cellsCountDelta=188399 2019-12-17 08:32:00,122 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Asked to modify this region's (IntegrationTestRegionReplicaReplication,,1576599911697_0001.0beaae111b0f6e98bfde31ba35be54 08.) memStoreSizing to a negative value which is incorrect. Current memStoreSizing=135810071, delta=-155923644 java.lang.Exception at org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1323) at org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1316) at org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1303) at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5194) at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5025) at org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143) at org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2232) at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) {code} I added some custom logging to the snapshot logic to be able to see snapshot sizes: {code:java} 2019-12-17 08:31:56,900 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegion: 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: START_FLUSH table_name: "IntegrationTestRegionReplicaReplication" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" flush_sequence_number: 41392 store_flushes { family_name: "f1" store_home_dir: "f1" } store_flushes { family_name: "f2" store_home_dir: "f2" } store_flushes { family_name: "f3" store_home_dir: "f3" } region_name: "IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee." 2019-12-17 08:31:56,900 INFO org.apache.hadoop.hbase.regionserver.HRegion: Flushing 0beaae111b0f6e98bfde31ba35be5408 3/3 column families, dataSize=126.49 MB heapSize=138.24 MB 2019-12-17 08:31:56,900 WARN org.apache.hadoop.hbase.regionserver.DefaultMemStore: Snapshot called again without clearing previous.
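A toy model of the arithmetic behind the negative size (purely illustrative, not HBase's real MemStoreSizing accounting): the region's tracked size only shrinks at flush-commit time, so if a second snapshot is taken while a previous one was never cleared, the commit decrements by the sum of both snapshots, which exceeds what the region still tracks.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model, not HBase code: demonstrates how "Snapshot called again without
// clearing previous" leads to a decrement delta larger than the current
// tracked size, i.e. a negative memStoreSizing.
class SizingToy {
  long dataSize;
  final Deque<Long> snapshots = new ArrayDeque<>();

  void add(long bytes) {
    dataSize += bytes;
  }

  void snapshot() {
    if (!snapshots.isEmpty()) {
      System.out.println("Snapshot called again without clearing previous.");
    }
    snapshots.addLast(dataSize); // captures everything currently tracked
  }

  // Flush commit: decrement by every outstanding snapshot's delta. A stale,
  // never-cleared snapshot makes delta > dataSize, driving the size negative.
  long commitAll() {
    long delta = 0;
    while (!snapshots.isEmpty()) {
      delta += snapshots.removeLast();
    }
    dataSize -= delta;
    return dataSize;
  }
}
```

With 20 bytes snapshotted, never committed, then 135 more written and snapshotted again, the commit delta is 175 against a tracked size of 155, the same shape as delta=155923644 against memStoreSizing=135810071 in the log above.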
[jira] [Created] (HBASE-23591) Negative memStoreSizing
Szabolcs Bukros created HBASE-23591: --- Summary: Negative memStoreSizing Key: HBASE-23591 URL: https://issues.apache.org/jira/browse/HBASE-23591 Project: HBase Issue Type: Bug Components: read replicas Reporter: Szabolcs Bukros Fix For: 2.2.2 After a flush on the replica region the memStoreSizing becomes negative: {code:java} 2019-12-17 08:31:59,983 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplicati on" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" flush_sequence_number: 41392 store_flushes { family_name: "f1" store_home_dir: "f1" flush_output: "3c48a23eac784a348a18e10e337d80a2" } store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: "9a5283ec95694667b4ead2398af5f01e" } store_flushes { family_name: "f3" store_home_dir: "f3" flush_output: "e6f25e6b0eca4d22af15d0626d0f8759" } region_name: "IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee." 
2019-12-17 08:31:59,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 0beaae111b0f6e98bfde31ba35be5408 : Received a flush commit marker with seqId:41392 and a previous prepared snapshot was found 2019-12-17 08:31:59,993 INFO org.apache.hadoop.hbase.regionserver.HStore: Region: 0beaae111b0f6e98bfde31ba35be5408 added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f1/3c48a23eac784a348a18e10e337d80a2, entries=32445, sequenceid=41392, filesize=27.6 M 2019-12-17 08:32:00,016 INFO org.apache.hadoop.hbase.regionserver.HStore: Region: 0beaae111b0f6e98bfde31ba35be5408 added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f2/9a5283ec95694667b4ead2398af5f01e, entries=12264, sequenceid=41392, filesize=10.9 M 2019-12-17 08:32:00,121 INFO org.apache.hadoop.hbase.regionserver.HStore: Region: 0beaae111b0f6e98bfde31ba35be5408 added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f3/e6f25e6b0eca4d22af15d0626d0f8759, entries=32379, sequenceid=41392, filesize=27.5 M 2019-12-17 08:32:00,122 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: CustomLog decrMemStoreSize. Current: dataSize=135810071, getHeapSize=148400960, getOffHeapSize=0, getCellsCount=167243 delta: dataSizeDelta=155923644, heapSizeDelta=170112320, offHeapSizeDelta=0, cellsCountDelta=188399 2019-12-17 08:32:00,122 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Asked to modify this region's (IntegrationTestRegionReplicaReplication,,1576599911697_0001.0beaae111b0f6e98bfde31ba35be54 08.) memStoreSizing to a negative value which is incorrect. 
Current memStoreSizing=135810071, delta=-155923644 java.lang.Exception at org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1323) at org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1316) at org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1303) at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5194) at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5025) at org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143) at org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2232) at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) {code} I added some custom logging to the snapshot logic to be able to see snapshot sizes: {code:java} 2019-12-17 08:31:56,900 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: START_FLUSH table_name: "IntegrationTestRegionReplicaReplication" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" flush_sequence_number: 41392 store_flushes { family_name: "f1" store_home_dir: "f1" } store_flushes { family_name: "f2" store_home_dir: "f2" } store_flushes { family_name: "f3" store_home_dir: "f3" } region_name: "IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee." 2019-12-17 08:31:56,900 INFO org.apache.hadoop.hbase.regionserver.HRegion: Flushing 0beaae111b0f6e98bfde31ba35be5408 3/3 column famil
[jira] [Updated] (HBASE-23589) FlushDescriptor contains non-matching family/output combinations
[ https://issues.apache.org/jira/browse/HBASE-23589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros updated HBASE-23589: Description: Flushing the active region creates the following files: {code:java} 2019-12-13 08:00:20,866 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/dab4d1cc01e44773bad7bdb5d2e33b6c, entries=49128, sequenceid =70688, filesize=41.4 M 2019-12-13 08:00:20,897 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f3/ecc50f33085042f7bd2397253b896a3a, entries=5, sequenceid =70688, filesize=42.3 M {code} On the read replica region when we try to replay the flush we see the following: {code:java} 2019-12-13 08:00:21,279 WARN org.apache.hadoop.hbase.regionserver.HRegion: bfa9cdb0ab13d60b389df6621ab316d1 : At least one of the store files in flush: action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplication" encoded_region_name: "20af2eb8929408f26d0b3b81e6b86d47" flush_sequence_number: 70688 store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: "ecc50f33085042f7bd2397253b896a3a" } store_flushes { family_name: "f3" store_home_dir: "f3" flush_output: "dab4d1cc01e44773bad7bdb5d2e33b6c" } region_name: "IntegrationTestRegionReplicaReplication,,1576252065847.20af2eb8929408f26d0b3b81e6b86d47." doesn't exist any more. 
Skip loading the file(s) java.io.FileNotFoundException: HFileLink locations=[hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a, hdfs://replica-1:8020/hbase/.tmp/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a, hdfs://replica-1:8020/hbase/mobdir/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a, hdfs://replica-1:8020/hbase/archive/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a] at org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:415) at org.apache.hadoop.hbase.util.ServerRegionReplicaUtil.getStoreFileInfo(ServerRegionReplicaUtil.java:135) at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFileInfo(HRegionFileSystem.java:311) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.replayFlush(HStore.java:2414) at org.apache.hadoop.hbase.regionserver.HRegion.replayFlushInStores(HRegion.java:5310) at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5184) at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5018) at org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143) at org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2229) at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) {code} As we can see the flush_outputs got mixed up. 
The issue is caused by HRegion.internalFlushCacheAndCommit. The code assumes "{color:#808080}stores.values() and storeFlushCtxs have same order{color}" which no longer seems to be true. was: Flushing the active region creates the following files: {code:java} 2019-12-13 08:00:20,866 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/dab4d1cc01e44773bad7bdb5d2e33b6c, entries=49128, sequenceid =70688, filesize=41.4 M 2019-12-13 08:00:20,897 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f3/ecc50f33085042f7bd2397253b896a3a, entries=5, sequenceid =70688, filesize=42.3 M {code} On the read replica region when we try to replay the flush we see the following: {code:java} 2019-12-13 08:00:21,279 WARN org.apache.hadoop.hbase.regionserver.HRegion: bfa9cdb0ab13d60b389df6621ab316d1 : At least one of the store files in flush: action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplication" encoded_region_name: "20af2eb8929408f26d0b3b81e6b86d47" flush_sequence_number: 70688 sto
[jira] [Created] (HBASE-23589) FlushDescriptor contains non-matching family/output combinations
Szabolcs Bukros created HBASE-23589: --- Summary: FlushDescriptor contains non-matching family/output combinations Key: HBASE-23589 URL: https://issues.apache.org/jira/browse/HBASE-23589 Project: HBase Issue Type: Bug Components: read replicas Affects Versions: 2.2.2 Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros Flushing the active region creates the following files: {code:java} 2019-12-13 08:00:20,866 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/dab4d1cc01e44773bad7bdb5d2e33b6c, entries=49128, sequenceid =70688, filesize=41.4 M 2019-12-13 08:00:20,897 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f3/ecc50f33085042f7bd2397253b896a3a, entries=5, sequenceid =70688, filesize=42.3 M {code} On the read replica region when we try to replay the flush we see the following: {code:java} 2019-12-13 08:00:21,279 WARN org.apache.hadoop.hbase.regionserver.HRegion: bfa9cdb0ab13d60b389df6621ab316d1 : At least one of the store files in flush: action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplication" encoded_region_name: "20af2eb8929408f26d0b3b81e6b86d47" flush_sequence_number: 70688 store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: "ecc50f33085042f7bd2397253b896a3a" } store_flushes { family_name: "f3" store_home_dir: "f3" flush_output: "dab4d1cc01e44773bad7bdb5d2e33b6c" } region_name: "IntegrationTestRegionReplicaReplication,,1576252065847.20af2eb8929408f26d0b3b81e6b86d47." doesn't exist any more. 
Skip loading the file(s) java.io.FileNotFoundException: HFileLink locations=[hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a, hdfs://replica-1:8020/hbase/.tmp/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a, hdfs://replica-1:8020/hbase/mobdir/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a, hdfs://replica-1:8020/hbase/archive/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a] at org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:415) at org.apache.hadoop.hbase.util.ServerRegionReplicaUtil.getStoreFileInfo(ServerRegionReplicaUtil.java:135) at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFileInfo(HRegionFileSystem.java:311) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.replayFlush(HStore.java:2414) at org.apache.hadoop.hbase.regionserver.HRegion.replayFlushInStores(HRegion.java:5310) at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5184) at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5018) at org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143) at org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2229) at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) {code} As you can see the flush_outputs are mixed up. 
The issue is caused by HRegion.internalFlushCacheAndCommit. The code assumes "{color:#808080}stores.values() and storeFlushCtxs have same order{color}" which no longer seems to be true. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23566) Fix package/packet terminology problem in chaos monkeys
Szabolcs Bukros created HBASE-23566: --- Summary: Fix package/packet terminology problem in chaos monkeys Key: HBASE-23566 URL: https://issues.apache.org/jira/browse/HBASE-23566 Project: HBase Issue Type: Improvement Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros There is a terminology problem in some of the network-related chaos monkey actions. The universally understood technical term for a unit of network data is packet, not "package". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23085) Network and Data related Actions
[ https://issues.apache.org/jira/browse/HBASE-23085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992721#comment-16992721 ] Szabolcs Bukros commented on HBASE-23085: - [~apurtell] you are absolutely right, thanks for noticing. I should be able to create a PR later this week if that works for you. > Network and Data related Actions > > > Key: HBASE-23085 > URL: https://issues.apache.org/jira/browse/HBASE-23085 > Project: HBase > Issue Type: Sub-task > Components: integration tests >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Minor > Fix For: 3.0.0, 2.3.0, 2.2.3 > > > Add additional actions to: > * manipulate network packets with tc (reorder, lose,...) > * add CPU load > * fill the disk > * corrupt or delete regionserver data files > Create new monkey factories for the new actions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23352) Allow chaos monkeys to access cmd line params, and improve FillDiskCommandAction
Szabolcs Bukros created HBASE-23352: --- Summary: Allow chaos monkeys to access cmd line params, and improve FillDiskCommandAction Key: HBASE-23352 URL: https://issues.apache.org/jira/browse/HBASE-23352 Project: HBase Issue Type: Improvement Components: integration tests Affects Versions: 2.2.2 Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros When integration tests are run through the hbase CLI, the properties passed as cmd line params do not reach the chaos monkeys. It is possible to define a property file, but it would be more flexible if we could also pick up properties from the command line. Also, I would like to improve FillDiskCommandAction to stop the remote process if the call times out before it could finish, or if it was run without a size parameter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
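A sketch of the proposed configuration layering (illustrative only; the class name and property keys are hypothetical, not the actual ChaosMonkey code): command-line properties are merged on top of the optional monkey property file, so both sources reach the monkeys and the command line wins on conflicts.

```java
import java.util.Properties;

// Illustrative sketch, not HBase code: layer command-line properties over the
// monkey property file so that both configuration sources are visible.
class MonkeyProps {
  static Properties merge(Properties fromFile, Properties fromCmdLine) {
    Properties merged = new Properties();
    merged.putAll(fromFile);    // defaults from the property file
    merged.putAll(fromCmdLine); // command line overrides on conflict
    return merged;
  }
}
```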
[jira] [Reopened] (HBASE-23085) Network and Data related Actions
[ https://issues.apache.org/jira/browse/HBASE-23085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros reopened HBASE-23085: - Backport commit to branch-2 and branch-2.2 > Network and Data related Actions > > > Key: HBASE-23085 > URL: https://issues.apache.org/jira/browse/HBASE-23085 > Project: HBase > Issue Type: Sub-task > Components: integration tests >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Minor > Fix For: 3.0.0 > > > Add additional actions to: > * manipulate network packets with tc (reorder, lose,...) > * add CPU load > * fill the disk > * corrupt or delete regionserver data files > Create new monkey factories for the new actions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23085) Network and Data related Actions
Szabolcs Bukros created HBASE-23085: --- Summary: Network and Data related Actions Key: HBASE-23085 URL: https://issues.apache.org/jira/browse/HBASE-23085 Project: HBase Issue Type: Sub-task Components: integration tests Reporter: Szabolcs Bukros Assignee: Szabolcs Bukros Add additional actions to: * manipulate network packets with tc (reorder, lose,...) * add CPU load * fill the disk * corrupt or delete regionserver data files Create new monkey factories for the new actions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
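A sketch of the kind of tc invocations such actions boil down to (illustrative only; the actual HBase action classes and chosen parameters may differ, and the device name is an assumption): netem can reorder or drop packets on a host's interface, and a matching delete undoes the change.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not the actual chaos monkey code: build plain
// tc/netem command lines for the network effects mentioned in the issue.
class NetemCommands {
  // Delay packets and reorder 25% of them (netem reordering needs a delay).
  static List<String> reorder(String device) {
    return Arrays.asList("tc", "qdisc", "add", "dev", device,
        "root", "netem", "delay", "10ms", "reorder", "25%");
  }

  // Drop 5% of outgoing packets.
  static List<String> loss(String device) {
    return Arrays.asList("tc", "qdisc", "add", "dev", device,
        "root", "netem", "loss", "5%");
  }

  // Undo whatever netem qdisc the action installed.
  static List<String> revert(String device) {
    return Arrays.asList("tc", "qdisc", "del", "dev", device, "root");
  }
}
```

An action would run one of these on the target host (e.g. over SSH), sleep for the configured duration, then run revert(), so the cluster is left clean even if the test aborts.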
[jira] [Assigned] (HBASE-22982) Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart
[ https://issues.apache.org/jira/browse/HBASE-22982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros reassigned HBASE-22982: --- Assignee: Szabolcs Bukros > Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart > - > > Key: HBASE-22982 > URL: https://issues.apache.org/jira/browse/HBASE-22982 > Project: HBase > Issue Type: Sub-task > Components: integration tests >Affects Versions: 3.0.0 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Minor > > * Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume > a ratio of region servers. > * Add a Chaos Monkey action to simulate a rolling restart including > graceful_stop like functionality that unloads the regions from the server > before a restart and then places it under load again afterwards. > * Add these actions to the relevant monkeys -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22982) Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart
[ https://issues.apache.org/jira/browse/HBASE-22982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szabolcs Bukros updated HBASE-22982: Description: * Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume a ratio of region servers. * Add a Chaos Monkey action to simulate a rolling restart including graceful_stop like functionality that unloads the regions from the server before a restart and then places it under load again afterwards. * Add these actions to the relevant monkeys was: * Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume a ratio of region servers. * Add a Chaos Monkey action to simulate a rolling restart including graceful_stop like functionality that unloads the regions from the server before a restart and then places it under load again afterwards. > Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart > - > > Key: HBASE-22982 > URL: https://issues.apache.org/jira/browse/HBASE-22982 > Project: HBase > Issue Type: Sub-task > Components: integration tests >Affects Versions: 3.0.0 >Reporter: Szabolcs Bukros >Priority: Minor > > * Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume > a ratio of region servers. > * Add a Chaos Monkey action to simulate a rolling restart including > graceful_stop like functionality that unloads the regions from the server > before a restart and then places it under load again afterwards. > * Add these actions to the relevant monkeys -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HBASE-22982) Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart
Szabolcs Bukros created HBASE-22982: --- Summary: Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart Key: HBASE-22982 URL: https://issues.apache.org/jira/browse/HBASE-22982 Project: HBase Issue Type: Sub-task Components: integration tests Affects Versions: 3.0.0 Reporter: Szabolcs Bukros * Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume a ratio of region servers. * Add a Chaos Monkey action to simulate a rolling restart including graceful_stop like functionality that unloads the regions from the server before a restart and then places it under load again afterwards. -- This message was sent by Atlassian Jira (v8.3.2#803003)
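The suspend/resume mechanism sketched as code (illustrative only; the real action class and its plumbing are not shown here): "hanging" a region server is delivering SIGSTOP to its JVM process, which freezes it without killing it, and SIGCONT resumes it exactly where it stopped.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not the actual chaos monkey code: command lines the
// action would execute on the region server host to hang and resume the
// process with the given pid.
class SuspendResume {
  static List<String> suspend(long pid) {
    return Arrays.asList("kill", "-STOP", Long.toString(pid));
  }

  static List<String> resume(long pid) {
    return Arrays.asList("kill", "-CONT", Long.toString(pid));
  }
}
```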