[jira] [Resolved] (HBASE-28464) Make replication ZKWatcher config customizable in extensions

2024-04-24 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros resolved HBASE-28464.
-
Resolution: Implemented

Implemented by HBASE-28529

> Make replication ZKWatcher config customizable in extensions
> 
>
> Key: HBASE-28464
> URL: https://issues.apache.org/jira/browse/HBASE-28464
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>  Labels: pull-request-available
>
> The ZKWatcher in HBaseReplicationEndpoint always uses the source cluster's 
> ZooKeeper client config when connecting to the target cluster's zk. Those 
> might not match. I would like to make the used ZKClientConfig customizable 
> for replication extensions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28464) Make replication ZKWatcher config customizable in extensions

2024-03-28 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-28464:
---

 Summary: Make replication ZKWatcher config customizable in 
extensions
 Key: HBASE-28464
 URL: https://issues.apache.org/jira/browse/HBASE-28464
 Project: HBase
  Issue Type: Improvement
  Components: Replication
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


The ZKWatcher in HBaseReplicationEndpoint always uses the source cluster's 
ZooKeeper client config when connecting to the target cluster's zk. Those might 
not match. I would like to make the used ZKClientConfig customizable for 
replication extensions.
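
To illustrate the kind of customization this is about, here is a minimal, purely 
hypothetical sketch of a per-peer ZooKeeper client config an extension might want 
to supply; the actual hook HBase ends up exposing may look different.

{code:java}
import org.apache.zookeeper.client.ZKClientConfig;

// Illustrative only: the settings below show why the source cluster's ZooKeeper
// client config cannot simply be reused for the peer cluster; the real extension
// point exposed by HBase may differ from this sketch.
public final class PeerZkClientConfigExample {
  public static ZKClientConfig forPeerCluster() {
    ZKClientConfig zkConf = new ZKClientConfig();
    // The target cluster may not use SASL even if the source cluster does.
    zkConf.setProperty("zookeeper.sasl.client", "false");
    // Hypothetical peer-specific JAAS login context name.
    zkConf.setProperty(ZKClientConfig.LOGIN_CONTEXT_NAME_KEY, "PeerZkClient");
    return zkConf;
  }
}
{code}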



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27493) Allow namespace admins to clone snapshots created by them

2023-01-18 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678222#comment-17678222
 ] 

Szabolcs Bukros commented on HBASE-27493:
-

[~psomogyi] Added the release notes. Thanks a lot for the merge!

> Allow namespace admins to clone snapshots created by them
> -
>
> Key: HBASE-27493
> URL: https://issues.apache.org/jira/browse/HBASE-27493
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-3, 2.5.1
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> Creating a snapshot requires table admin permissions. But cloning it requires 
> global admin permissions unless the user owns the snapshot and wants to 
> recreate the original table the snapshot was based on using the same table 
> name. This puts unnecessary load on the few people having global admin 
> permissions. I would like to relax this rule a bit and allow the owner of the 
> snapshot to clone it into any namespace where they have admin permissions 
> regardless of the table name used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27493) Allow namespace admins to clone snapshots created by them

2023-01-18 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros updated HBASE-27493:

Release Note: Allow namespace admins to clone snapshots created by them to 
any table inside their namespace, not just re-create the old table

> Allow namespace admins to clone snapshots created by them
> -
>
> Key: HBASE-27493
> URL: https://issues.apache.org/jira/browse/HBASE-27493
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-3, 2.5.1
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> Creating a snapshot requires table admin permissions. But cloning it requires 
> global admin permissions unless the user owns the snapshot and wants to 
> recreate the original table the snapshot was based on using the same table 
> name. This puts unnecessary load on the few people having global admin 
> permissions. I would like to relax this rule a bit and allow the owner of the 
> snapshot to clone it into any namespace where they have admin permissions 
> regardless of the table name used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27493) Allow namespace admins to clone snapshots created by them

2022-11-18 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-27493:
---

 Summary: Allow namespace admins to clone snapshots created by them
 Key: HBASE-27493
 URL: https://issues.apache.org/jira/browse/HBASE-27493
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 2.5.1, 3.0.0-alpha-3
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


Creating a snapshot requires table admin permissions. But cloning it requires 
global admin permissions unless the user owns the snapshot and wants to 
recreate the original table the snapshot was based on using the same table 
name. This puts unnecessary load on the few people having global admin 
permissions. I would like to relax this rule a bit and allow the owner of the 
snapshot to clone it into any namespace where they have admin permissions 
regardless of the table name used.
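
As a rough illustration of the relaxed rule, here is a minimal sketch; all types 
and helper methods below are hypothetical stand-ins, not HBase's actual 
AccessController code.

{code:java}
// Hypothetical sketch of the relaxed clone-permission rule described above.
final class SnapshotClonePermissionSketch {

  static final class SnapshotInfo {
    final String owner;
    SnapshotInfo(String owner) { this.owner = owner; }
  }

  // Stand-in permission lookups; a real implementation would consult the ACLs.
  static boolean isGlobalAdmin(String user) { return false; }
  static boolean hasNamespaceAdmin(String user, String namespace) { return true; }

  static boolean mayCloneSnapshot(String user, SnapshotInfo snapshot, String targetNamespace) {
    if (isGlobalAdmin(user)) {
      return true; // unchanged: global admins may clone any snapshot
    }
    // Relaxed rule: the snapshot owner may clone into any namespace they
    // administer, regardless of the table name used for the clone.
    return user.equals(snapshot.owner) && hasNamespaceAdmin(user, targetNamespace);
  }
}
{code}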



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27154) Backport missing MOB related changes to branch-2

2022-08-17 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580795#comment-17580795
 ] 

Szabolcs Bukros commented on HBASE-27154:
-

[~ndimiduk] The PR is already up.

> Backport missing MOB related changes to branch-2
> 
>
> Key: HBASE-27154
> URL: https://issues.apache.org/jira/browse/HBASE-27154
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.6.0
>Reporter: Szabolcs Bukros
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.5.0
>
>
> While trying to backport https://issues.apache.org/jira/browse/HBASE-26969 to 
> branch-2 I have found that multiple major MOB related changes are missing. 
> This change is required for FileBased SFT correctness so the changes it 
> depends on should be backported first. Also any improvement to MOB stability 
> is usually welcomed.
> The missing changes I have found so far:
> https://issues.apache.org/jira/browse/HBASE-22749
> https://issues.apache.org/jira/browse/HBASE-23723
> https://issues.apache.org/jira/browse/HBASE-24163
> There is also a docs change describing the new MOB functionality. But 
> considering that the book is always generated based on master I think it is 
> safe to skip backporting it.
> https://issues.apache.org/jira/browse/HBASE-23198
> I'm planning to backport these changes one by one until we reach a state 
> where HBASE-26969  can be backported too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-08-17 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580779#comment-17580779
 ] 

Szabolcs Bukros edited comment on HBASE-26969 at 8/17/22 1:10 PM:
--

[~apurtell] [~huaxiangsun] Please find 
[#4712|https://github.com/apache/hbase/pull/4712] with the branch-2 backport.
Please note I made some additional changes in this PR:
 *   [#4617|https://github.com/apache/hbase/pull/4617] left 2 test classes in 
the code that the original commit deleted and I had to delete them here.
 * HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the 
review cycles, instead of backporting it separately I included the changes it 
contained in this PR. Please let me know if that is not acceptable and I will 
prepare a separate PR for that backport.


was (Author: bszabolcs):
[~apurtell] [~huaxiangsun] Please find 
[#4712|https://github.com/apache/hbase/pull/4712] with the branch-2 backport.
Please note I made some additional changes:
 *   [#4617|https://github.com/apache/hbase/pull/4617] left 2 test classes in 
the code that the original commit deleted and I had to delete them here.
 * HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the 
review cycles, instead of backporting it separately I included the changes it 
contained in this PR. Please let me know if that is not acceptable and I will 
prepare a separate PR for that backport.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-4
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer, using a 
> StoreFileWriterCreationTracker, that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on the Master, to 
> allow access to the StoreFileWriterCreationTrackers, to make sure the 
> currently written files are not cleaned up
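
As a rough, hypothetical sketch of the createWriter/finalizeWriter idea in the 
proposal quoted above (names and structure are illustrative, not the committed 
API):

{code:java}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only: shows how tracking writers that write directly into
// the mob store folder could replace the tmp-dir-plus-rename flow.
final class MobStoreWriterTrackingSketch {

  // Files currently being written directly into the mob store folder.
  private final Set<Path> filesUnderCreation = ConcurrentHashMap.newKeySet();

  // Called when a mob flush/compaction starts writing a file (no tmp dir, no rename).
  Path createWriter(Path mobStoreDir, String fileName) {
    Path target = new Path(mobStoreDir, fileName);
    filesUnderCreation.add(target); // tracked so the cleaner chore will skip it
    return target;
  }

  // Called once the writer is closed and the file is committed.
  void finalizeWriter(Path target) {
    filesUnderCreation.remove(target);
  }

  // The RS-side cleaner chore consults this to avoid deleting half-written files.
  Set<Path> getFilesUnderCreation() {
    return Collections.unmodifiableSet(filesUnderCreation);
  }
}
{code}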



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-08-17 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580779#comment-17580779
 ] 

Szabolcs Bukros edited comment on HBASE-26969 at 8/17/22 1:09 PM:
--

[~apurtell] [~huaxiangsun] Please find 
[#4712|https://github.com/apache/hbase/pull/4712] with the branch-2 backport.
Please note I made some additional changes:
 *   [#4617|https://github.com/apache/hbase/pull/4617] left 2 test classes in 
the code that the original commit deleted and I had to delete them here.
 * HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the 
review cycles, instead of backporting it separately I included the changes it 
contained in this PR. Please let me know if that is not acceptable and I will 
prepare a separate PR for that backport.


was (Author: bszabolcs):
[~apurtell] [~huaxiangsun] Please find #4712 with the branch-2 backport.
Please note I made some additional changes:
 *   [#4617|https://github.com/apache/hbase/pull/4617] left 2 test classes in 
the code that the original commit deleted and I had to delete them here.
 * HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the 
review cycles, instead of backporting it separately I included the changes it 
contained in this PR. Please let me know if that is not acceptable and I will 
prepare a separate PR for that backport.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-4
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer, using a 
> StoreFileWriterCreationTracker, that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on the Master, to 
> allow access to the StoreFileWriterCreationTrackers, to make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-08-17 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580779#comment-17580779
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

[~apurtell] [~huaxiangsun] Please find #4712 with the branch-2 backport.
Please note I made some additional changes:
 *   [#4617|https://github.com/apache/hbase/pull/4617] left 2 test classes in 
the code that the original commit deleted and I had to delete them here.
 * HBASE-25970 is another dependency I missed in HBASE-27154. To shorten the 
review cycles, instead of backporting it separately I included the changes it 
contained in this PR. Please let me know if that is not acceptable and I will 
prepare a separate PR for that backport.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-4
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer, using a 
> StoreFileWriterCreationTracker, that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on the Master, to 
> allow access to the StoreFileWriterCreationTrackers, to make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27204) BlockingRpcClient will hang for 20 seconds when SASL is enabled after finishing negotiation

2022-07-19 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568597#comment-17568597
 ] 

Szabolcs Bukros commented on HBASE-27204:
-

[~apurtell] Please revert HBASE-24579.

I have done some of the investigation I should have done 2 years ago and found 
that not reading the potential error message might not be limited to PLAIN SASL. 
Based on my understanding of the code this could happen with GSS too. 
GssKrb5Client, after evaluating the final handshake challenge, can send a 
gssOutToken back to the server just after setting "completed" to true. Then 
GssKrb5Server tries to evaluate the response in doHandshake2, where it either 
fails with an exception or returns null, basically producing the same issue we 
have with PLAIN SASL. Because the client is already complete, the potential 
response is never read.

I think a potential fix would have 3 parts.
 * ServerRpcConnection.saslReadAndProcess could be changed to always return a 
response even if replyToken is null, maybe just an empty byte array. This would 
make the communication consistent by allowing us to always check the stream for 
a response.
 * HBaseSaslRpcClient.saslConnect could then be extended to track whether 
"readStatus" was called after a response was written. If the client is complete 
but we are still waiting for a response, we could call "readStatus".
 * Netty. Considering ServerRpcConnection.saslReadAndProcess is shared between 
the implementations, I assume the issue is present in Netty too, but I do not 
understand that code well enough to propose a solution.

What do you think?
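
A simplified, hypothetical sketch of the client-side half of this proposal; the 
wire format and helper methods below are illustrative, not the actual 
HBaseSaslRpcClient or ServerRpcConnection code.

{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;

// Illustrative sketch of "always read the server's final status, even after the
// SaslClient reports completion", assuming the server always writes a reply.
final class SaslClientLoopSketch {

  // Hypothetical status frame: 0 means success, anything else precedes an error.
  static void readStatus(DataInputStream in) throws IOException {
    int status = in.readInt();
    if (status != 0) {
      throw new IOException("server reported SASL error: " + in.readUTF());
    }
  }

  // Hypothetical response frame: length-prefixed token, possibly empty.
  static byte[] readResponse(DataInputStream in) throws IOException {
    byte[] token = new byte[in.readInt()];
    in.readFully(token);
    return token;
  }

  static void saslConnect(SaslClient saslClient, DataInputStream in, DataOutputStream out)
      throws IOException, SaslException {
    byte[] token = saslClient.hasInitialResponse()
        ? saslClient.evaluateChallenge(new byte[0]) : new byte[0];
    boolean readAfterLastWrite = true;

    while (true) {
      if (token != null) {
        out.writeInt(token.length);
        out.write(token);
        out.flush();
        readAfterLastWrite = false;
      }
      if (saslClient.isComplete()) {
        break;
      }
      readStatus(in);                       // success marker or an error from the server
      byte[] challenge = readResponse(in);  // with the proposal, always present (may be empty)
      readAfterLastWrite = true;
      token = saslClient.evaluateChallenge(challenge);
    }

    // The SaslClient can complete right after sending a token (e.g. the final GSS
    // handshake step). If the server now always replies, that reply may still
    // carry an error, so check for it instead of silently assuming success.
    if (!readAfterLastWrite) {
      readStatus(in);
    }
  }
}
{code}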

> BlockingRpcClient will hang for 20 seconds when SASL is enabled after 
> finishing negotiation
> ---
>
> Key: HBASE-27204
> URL: https://issues.apache.org/jira/browse/HBASE-27204
> Project: HBase
>  Issue Type: Bug
>  Components: rpc, sasl, security
>Reporter: Duo Zhang
>Assignee: Andrew Kyle Purtell
>Priority: Critical
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Found this when implementing HBASE-27185. When running TestSecureIPC, if 
> BlockingRpcClient is used, the tests spend much more time compared to 
> NettyRpcClient.
> The problem is that, for normal Kerberos authentication, the last step is the 
> client sending a reply to the server, so after the server receives the last 
> token, it will not write anything back but expects the client to send the 
> connection header.
> In HBASE-24579, for reading the error message, we added a readReply after the 
> SaslClient indicates that the negotiation is completed. But as said above, 
> for normal cases, we will not write anything back from the server side, so the 
> client will hang there and only throw an exception when the timeout is 
> reached, which is 20 seconds.
> This nearly makes the BlockingRpcClient unusable when SASL is enabled, as it 
> will hang for 20 seconds when connecting...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27204) BlockingRpcClient will hang for 20 seconds when SASL is enabled after finishing negotiation

2022-07-15 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567319#comment-17567319
 ] 

Szabolcs Bukros commented on HBASE-27204:
-

[~zhangduo] I agree, my solution is bad. Not trying to defend it, just wanted 
to add some context.

> BlockingRpcClient will hang for 20 seconds when SASL is enabled after 
> finishing negotiation
> ---
>
> Key: HBASE-27204
> URL: https://issues.apache.org/jira/browse/HBASE-27204
> Project: HBase
>  Issue Type: Bug
>  Components: rpc, sasl, security
>Reporter: Duo Zhang
>Priority: Critical
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Found this when implementing HBASE-27185. When running TestSecureIPC, if 
> BlockingRpcClient is used, the tests spend much more time compared to 
> NettyRpcClient.
> The problem is that, for normal Kerberos authentication, the last step is the 
> client sending a reply to the server, so after the server receives the last 
> token, it will not write anything back but expects the client to send the 
> connection header.
> In HBASE-24579, for reading the error message, we added a readReply after the 
> SaslClient indicates that the negotiation is completed. But as said above, 
> for normal cases, we will not write anything back from the server side, so the 
> client will hang there and only throw an exception when the timeout is 
> reached, which is 20 seconds.
> This nearly makes the BlockingRpcClient unusable when SASL is enabled, as it 
> will hang for 20 seconds when connecting...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27204) BlockingRpcClient will hang for 20 seconds when SASL is enabled after finishing negotiation

2022-07-15 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567308#comment-17567308
 ] 

Szabolcs Bukros commented on HBASE-27204:
-

[~zhangduo] We were experimenting with a custom RPC client based on the 
blocking RPC client that would also support PLAIN auth when we encountered the 
issue. Basically I saw that the PLAIN client sets "completed = true" 
in the getInitialResponse() call and, because of this, skips the rest of the 
logic in the method. This means that if the authentication or the connection 
fails, the potential error message is never read and the application just 
assumes everything is all right. The SaslClientAuthenticationProvider is 
pluggable with BlockingRpcConnection too, meaning this could happen there as 
well, and I wanted to provide a fix that would prevent it. Unfortunately I had 
not fully grasped the issue and the consequences of my "fix".

> BlockingRpcClient will hang for 20 seconds when SASL is enabled after 
> finishing negotiation
> ---
>
> Key: HBASE-27204
> URL: https://issues.apache.org/jira/browse/HBASE-27204
> Project: HBase
>  Issue Type: Bug
>  Components: rpc, sasl, security
>Reporter: Duo Zhang
>Priority: Critical
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Found this when implementing HBASE-27185. When running TestSecureIPC, if 
> BlockingRpcClient is used, the tests spend much more time compared to 
> NettyRpcClient.
> The problem is that, for normal Kerberos authentication, the last step is the 
> client sending a reply to the server, so after the server receives the last 
> token, it will not write anything back but expects the client to send the 
> connection header.
> In HBASE-24579, for reading the error message, we added a readReply after the 
> SaslClient indicates that the negotiation is completed. But as said above, 
> for normal cases, we will not write anything back from the server side, so the 
> client will hang there and only throw an exception when the timeout is 
> reached, which is 20 seconds.
> This nearly makes the BlockingRpcClient unusable when SASL is enabled, as it 
> will hang for 20 seconds when connecting...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27154) Backport missing MOB related changes to branch-2

2022-07-13 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566256#comment-17566256
 ] 

Szabolcs Bukros commented on HBASE-27154:
-

Thanks a lot for your help [~apurtell] !

> Backport missing MOB related changes to branch-2
> 
>
> Key: HBASE-27154
> URL: https://issues.apache.org/jira/browse/HBASE-27154
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.6.0
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>
> While trying to backport https://issues.apache.org/jira/browse/HBASE-26969 to 
> branch-2 I have found that multiple major MOB related changes are missing. 
> This change is required for FileBased SFT correctness so the changes it 
> depends on should be backported first. Also any improvement to MOB stability 
> is usually welcomed.
> The missing changes I have found so far:
> https://issues.apache.org/jira/browse/HBASE-22749
> https://issues.apache.org/jira/browse/HBASE-23723
> https://issues.apache.org/jira/browse/HBASE-24163
> There is also a docs change describing the new MOB functionality. But 
> considering that the book is always generated based on master I think it is 
> safe to skip backporting it.
> https://issues.apache.org/jira/browse/HBASE-23198
> I'm planning to backport these changes one by one until we reach a state 
> where HBASE-26969  can be backported too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-07-12 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565318#comment-17565318
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

[~apurtell] branch-2 is missing some very important MOB related commits that 
this change relies on, and we have to backport those first. This is tracked in 
HBASE-27154. The first and most complex backport is done and the rest should be 
easier, but I have deadlines coming up and will not be able to continue this for 
a few weeks. If you could handle the remaining commits, or find someone who can, 
I would be happy to prepare a backport for this change.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer, using a 
> StoreFileWriterCreationTracker, that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on the Master, to 
> allow access to the StoreFileWriterCreationTrackers, to make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27154) Backport missing MOB related changes to branch-2

2022-06-27 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559143#comment-17559143
 ] 

Szabolcs Bukros commented on HBASE-27154:
-

PR for HBASE-22749 :
https://github.com/apache/hbase/pull/4581

> Backport missing MOB related changes to branch-2
> 
>
> Key: HBASE-27154
> URL: https://issues.apache.org/jira/browse/HBASE-27154
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.6.0
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>
> While trying to backport https://issues.apache.org/jira/browse/HBASE-26969 to 
> branch-2 I have found that multiple major MOB related changes are missing. 
> This change is required for FileBased SFT correctness so the changes it 
> depends on should be backported first. Also any improvement to MOB stability 
> is usually welcomed.
> The missing changes I have found so far:
> https://issues.apache.org/jira/browse/HBASE-22749
> https://issues.apache.org/jira/browse/HBASE-23723
> https://issues.apache.org/jira/browse/HBASE-24163
> There is also a docs change describing the new MOB functionality. But 
> considering that the book is always generated based on master I think it is 
> safe to skip backporting it.
> https://issues.apache.org/jira/browse/HBASE-23198
> I'm planning to backport these changes one by one until we reach a state 
> where HBASE-26969  can be backported too.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-27154) Backport missing MOB related changes to branch-2

2022-06-23 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-27154:
---

 Summary: Backport missing MOB related changes to branch-2
 Key: HBASE-27154
 URL: https://issues.apache.org/jira/browse/HBASE-27154
 Project: HBase
  Issue Type: Bug
  Components: mob
Affects Versions: 2.6.0
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


While trying to backport https://issues.apache.org/jira/browse/HBASE-26969 to 
branch-2 I have found that multiple major MOB related changes are missing. 

This change is required for FileBased SFT correctness so the changes it depends 
on should be backported first. Also any improvement to MOB stability is usually 
welcomed.

The missing changes I have found so far:
https://issues.apache.org/jira/browse/HBASE-22749
https://issues.apache.org/jira/browse/HBASE-23723
https://issues.apache.org/jira/browse/HBASE-24163

There is also a docs change describing the new MOB functionality. But 
considering that the book is always generated based on master I think it is 
safe to skip backporting it.
https://issues.apache.org/jira/browse/HBASE-23198

I'm planning to backport these changes one by one until we reach a state where 
HBASE-26969  can be backported too.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-31 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544550#comment-17544550
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

{quote}
So p1 and p2 can have references to p even when p is no longer in meta?
{quote}

Not exactly. In this scenario p no longer exists, in or outside of meta. It is 
not referenced either. The mobfile "_p" exists and is 
referenced from p1 and p2. It was created by p, but it exists outside the data 
folder in an entirely different structure under /hbase/mobdir. It is not and 
never was part of p. It was only referenced from p.

 

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer, using a 
> StoreFileWriterCreationTracker, that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on the Master, to 
> allow access to the StoreFileWriterCreationTrackers, to make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-27069) Hbase SecureBulkload permission regression

2022-05-31 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544177#comment-17544177
 ] 

Szabolcs Bukros commented on HBASE-27069:
-

You are right, that is a regression. Thanks for the fix!

> Hbase SecureBulkload permission regression
> --
>
> Key: HBASE-27069
> URL: https://issues.apache.org/jira/browse/HBASE-27069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> HBASE-26707 has introduced a bug, where setting the permission of the bulk 
> loaded HFile to 777 is made conditional.
> However, as discussed in HBASE-15790, that permission is essential for 
> HBase's correct operation.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-26 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542548#comment-17542548
 ] 

Szabolcs Bukros edited comment on HBASE-26969 at 5/26/22 4:10 PM:
--

{quote}the master cleaner could check only files from regions not online on any 
RS
{quote}
That would not be enough. Consider the following scenario. Region p creates a 
mobfile with a name of "_p". While region p is online, the RS 
cleaner can identify that "_p" belongs to this region and can 
clean it up if it is no longer referenced from said region. Now let's split 
region p. We have 2 new regions p1 and p2, and p is archived, maybe even deleted 
altogether. Both p1 and p2 are online and contain references to the 
"_p" mobfile, but we have no way of knowing we should search 
these regions for references. So the master cleaner has to read every single 
hfile to find the references in p1 and p2.

The mobfiles keep their name until a major compaction runs.

 


was (Author: bszabolcs):
> the master cleaner could check only files from regions not online on any RS

That would not be enough. Consider the following scenario. Region p creates a 
mobfile with a name of "_p". While region p is online, the RS 
cleaner can identify that "_p" belongs to this region and can 
clean it up if it is no longer referenced from said region. Now let's split 
region p. We have 2 new regions p1 and p2, and p is archived, maybe even deleted 
altogether. Both p1 and p2 are online and contain references to the 
"_p" mobfile, but we have no way of knowing we should search 
these regions for references. So the master cleaner has to read every single 
hfile to find the references in p1 and p2.

The mobfiles keep their name until a major compaction runs.

 

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer, using a 
> StoreFileWriterCreationTracker, that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on the Master, to 
> allow access to the StoreFileWriterCreationTrackers, to make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-26 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542548#comment-17542548
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

> the master cleaner could check only files from regions not online on any RS

That would not be enough. Consider the following scenario. Region p creates a 
mobfile with a name of "_p". While region p is online, the RS 
cleaner can identify that "_p" belongs to this region and can 
clean it up if it is no longer referenced from said region. Now let's split 
region p. We have 2 new regions p1 and p2, and p is archived, maybe even deleted 
altogether. Both p1 and p2 are online and contain references to the 
"_p" mobfile, but we have no way of knowing we should search 
these regions for references. So the master cleaner has to read every single 
hfile to find the references in p1 and p2.

The mobfiles keep their name until a major compaction runs.

 

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer, using a 
> StoreFileWriterCreationTracker, that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on the Master, to 
> allow access to the StoreFileWriterCreationTrackers, to make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-23 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541025#comment-17541025
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

[~zhangduo] I'm sorry, I might have misspoken. We do not need an extra SFT. 
Having the references in hfile metadata is sufficient. It's just slow and 
clunky. But it works. I hoped a better way of storing this data could be found, 
but as you have pointed out that is not necessary.

SFT is only linked to this issue because SFT and removing renames are 
thematically connected: I'm relying on some tools/solutions added to support 
SFT, and since removing renames makes things more complicated, instead of 
changing the default behavior the idea was to only remove them when SFT, which 
already removed the other renames, is enabled.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer, using a 
> StoreFileWriterCreationTracker, that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on the Master, to 
> allow access to the StoreFileWriterCreationTrackers, to make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used

2022-05-23 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros reassigned HBASE-27017:
---

Assignee: Szabolcs Bukros

> MOB snapshot is broken when FileBased SFT is used
> -
>
> Key: HBASE-27017
> URL: https://issues.apache.org/jira/browse/HBASE-27017
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-2
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>
> During snapshot MOB regions are treated like any other region. When a 
> snapshot is taken and hfile references are collected a StoreFileTracker is 
> created to get the current active hfile list. But the MOB region stores are 
> not tracked so an empty list is returned, resulting in a broken snapshot. 
> When this snapshot is cloned the resulting table will have no MOB files or 
> references.
> The problematic code can be found here:
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used

2022-05-23 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541018#comment-17541018
 ] 

Szabolcs Bukros commented on HBASE-27017:
-

[~zhangduo] If you look at it like this, then you are absolutely right :)

> MOB snapshot is broken when FileBased SFT is used
> -
>
> Key: HBASE-27017
> URL: https://issues.apache.org/jira/browse/HBASE-27017
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-2
>Reporter: Szabolcs Bukros
>Priority: Major
>
> During snapshot MOB regions are treated like any other region. When a 
> snapshot is taken and hfile references are collected a StoreFileTracker is 
> created to get the current active hfile list. But the MOB region stores are 
> not tracked so an empty list is returned, resulting in a broken snapshot. 
> When this snapshot is cloned the resulting table will have no MOB files or 
> references.
> The problematic code can be found here:
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used

2022-05-23 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541008#comment-17541008
 ] 

Szabolcs Bukros commented on HBASE-27017:
-

[~zhangduo] That would not work well after we remove the renames. It would mean 
the snapshot would also contain any incomplete or broken files currently in 
the dir. Copying trash around is a non-issue, because we only ever read the 
referenced mob files, but if a snapshot is made during a write operation that 
later fails and the file is removed, we would end up with a snapshot referencing 
a missing file.

> MOB snapshot is broken when FileBased SFT is used
> -
>
> Key: HBASE-27017
> URL: https://issues.apache.org/jira/browse/HBASE-27017
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-2
>Reporter: Szabolcs Bukros
>Priority: Major
>
> During snapshot MOB regions are treated like any other region. When a 
> snapshot is taken and hfile references are collected a StoreFileTracker is 
> created to get the current active hfile list. But the MOB region stores are 
> not tracked so an empty list is returned, resulting in a broken snapshot. 
> When this snapshot is cloned the resulting table will have no MOB files or 
> references.
> The problematic code can be found here:
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-23 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541004#comment-17541004
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

[~wchevreuil] 

> Or is it that we lingering region dirs even after the parent SPLIT/MERGE got 
> removed from META?

Yes, these regions are cleaned up by a chore and linger after they are no longer 
used. But this does not really matter. The problem is that when we cannot easily 
identify which region should contain references to a given MOB file, we have 
absolutely no way to tell and have to read every single hfile's metadata to 
check for references.

[~zhangduo] 

> we do not use SFT at all the MOB regions

That is true. We do not use SFT. But, in line with similar changes, the renames 
would only be eliminated if SFT is enabled, not by default. Also, it relies on 
the WriterCreationTracker, which is mostly an SFT tool.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer, using a 
> StoreFileWriterCreationTracker, that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on the Master, to 
> allow access to the StoreFileWriterCreationTrackers, to make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-16 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537787#comment-17537787
 ] 

Szabolcs Bukros edited comment on HBASE-26969 at 5/16/22 8:45 PM:
--

{quote}I guess if we want to compact the mob files, we always need to compact 
the normal files which references the mob files so we can update the references 
in the metadata?
{quote}
Yes, that is how it works.
{quote}The mob files should have a different name prefix or under a different 
directory?
{quote}
They have a different directory structure. 
"/mobdir/data/default/table_name/a0209c070c85d1e4d500af8ba33c3c02/cf"

They are stored fully separated. Please note these regions only contain mob 
files and are fully independent from the referencing regions. A single mob 
region could theoretically contain every MOB file in hbase regardless of where 
it is referenced from. Also their naming convention is different. For us the 
only important thing is that it ends with "_", so something like this: 
"0cc175b9c0f1b6a831c399e2697726612022050314ecf20b51674cd6bd647bfb2d88b1ff_b593e96e821ba6211d8a4b101a88"
{quote}So at least for loading, there will be no problem
{quote}
That's true. Reads are very straightforward.
{quote}I think the only problem here is how do we clean up the half written mob 
files, I think the logic is mainly the same with what we have now, get all the 
mob refs from all the normal storefiles, to construct the base list, and then 
get all the mob files which are currently being written, all MOB files besides 
them are the ones should be deleted.
{quote}
That is part of the problem, yes. To have access to the half-written mob file 
list the cleaner has to run on the RS. But each RS only has access to its own 
half-written mob file list, so each can only clean a subset of the existing mob 
files. To be precise, if a mob file name ends with the name of a region that is 
hosted on the current RS, then the cleaner can decide whether it can be archived 
or not. Unfortunately, with merges and splits regions get archived, so after a 
point there will be mob files containing names of regions not hosted on any RS, 
and none of the cleaners running on RSes could clean these up. So we need one 
more cleaner specifically for these (I put it on the Master to replace the 
original cleaner), which has to read every available hfile to make sure we have 
every active mob reference and is able to decide whether a mob file created by a 
since-archived region can be archived or not.


was (Author: bszabolcs):
{quote}I guess if we want to compact the mob files, we always need to compact 
the normal files which references the mob files so we can update the references 
in the metadata?
{quote}
Yes, that is how it works.
{quote}The mob files should have a different name prefix or under a different 
directory?
{quote}
They have a different directory structure. 
"/mobdir/data/default/table_name/a0209c070c85d1e4d500af8ba33c3c02/cf"

They are stored fully separated. Please note these regions only contain mob 
files and are fully independent from the referencing regions. A single mob 
region could theoretically contain every MOB file in hbase regardless of where 
it is referenced from. Also their naming convention is different. For us the 
only important thing is that it ends with "_", so something like this: 
"0cc175b9c0f1b6a831c399e2697726612022050314ecf20b51674cd6bd647bfb2d88b1ff_b593e96e821ba6211d8a4b101a88"
{quote}So at least for loading, there will be no problem
{quote}
That's true. Reads are very straightforward.
{quote}I think the only problem here is how do we clean up the half written mob 
files, I think the logic is mainly the same with what we have now, get all the 
mob refs from all the normal storefiles, to construct the base list, and then 
get all the mob files which are currently being written, all MOB files besides 
them are the ones should be deleted.
{quote}
That is part of the problem, yes. To have access to the half-written mob file 
list the cleaner has to run on the RS. But each RS only has access to its own 
half-written mob file list, so each can only clean a subset of the existing mob 
files. To be precise, if a mob file name ends with the name of a region that is 
hosted on the current RS, then the cleaner can decide whether it can be archived 
or not. Unfortunately, with merges and splits regions get archived, so after a 
point there will be mob files containing names of regions not hosted on any RS, 
and none of the cleaners running on RSes could clean these up. So we need one 
more cleaner specifically for these (I put it on the Master to replace the 
original cleaner), which has to read every available hfile to make sure we have 
every active mob reference and is able to decide whether a mob file created by a 
since-archived region can be archived or not.

> Eliminate MOB renames when SFT is enabled
> 

[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-16 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537787#comment-17537787
 ] 

Szabolcs Bukros edited comment on HBASE-26969 at 5/16/22 8:45 PM:
--

{quote}I guess if we want to compact the mob files, we always need to compact 
the normal files which references the mob files so we can update the references 
in the metadata?
{quote}
Yes, that is how it works.
{quote}The mob files should have a different name prefix or under a different 
directory?
{quote}
They have a different directory structure. 
"/mobdir/data/default/table_name/a0209c070c85d1e4d500af8ba33c3c02/cf"

They are stored fully separated. Please note these regions only contain mob 
files and are fully independent from the referencing regions. A single mob 
region could theoretically contain every MOB file in hbase regardless of where 
it is referenced from. Also their naming convention is different. For us the 
only important thing is that it ends with "_", so something like this: 
"0cc175b9c0f1b6a831c399e2697726612022050314ecf20b51674cd6bd647bfb2d88b1ff_b593e96e821ba6211d8a4b101a88"
{quote}So at least for loading, there will be no problem
{quote}
That's true. Reads are very straightforward.
{quote}I think the only problem here is how do we clean up the half written mob 
files, I think the logic is mainly the same with what we have now, get all the 
mob refs from all the normal storefiles, to construct the base list, and then 
get all the mob files which are currently being written, all MOB files besides 
them are the ones should be deleted.
{quote}
That is part of the problem, yes. To have access to the half-written mob file 
list the cleaner has to run on the RS. But each RS only has access to its own 
half-written mob file list, so each can only clean a subset of the existing mob 
files. To be precise, if a mob file name ends with the name of a region that is 
hosted on the current RS, then the cleaner can decide whether it can be archived 
or not. Unfortunately, with merges and splits regions get archived, so after a 
point there will be mob files containing names of regions not hosted on any RS, 
and none of the cleaners running on RSes could clean these up. So we need one 
more cleaner specifically for these (I put it on the Master to replace the 
original cleaner), which has to read every available hfile to make sure we have 
every active mob reference and is able to decide whether a mob file created by a 
since-archived region can be archived or not.


was (Author: bszabolcs):
{quote}I guess if we want to compact the mob files, we always need to compact 
the normal files which references the mob files so we can update the references 
in the metadata?
{quote}
Yes, that is how it works.
{quote}The mob files should have a different name prefix or under a different 
directory?
{quote}
They have a different directory structure. 
"/mobdir/data/default/table_name/a0209c070c85d1e4d500af8ba33c3c02/cf"

They are stored fully separated. Please note these regions only contain mob 
files and are fully independent from the referencing regions. A single mob 
region could theoretically contain every MOB file in hbase regardless of where 
it is referenced from. Also their naming convention is different. For us the 
only important thing is that it ends with "_", so something like this: 
"0cc175b9c0f1b6a831c399e2697726612022050314ecf20b51674cd6bd647bfb2d88b1ff_b593e96e821ba6211d8a4b101a88"
{quote}So at least for loading, there will be no problem
{quote}
That's true. Reads are very straightforward.
{quote}I think the only problem here is how do we clean up the half written mob 
files, I think the logic is mainly the same with what we have now, get all the 
mob refs from all the normal storefiles, to construct the base list, and then 
get all the mob files which are currently being written, all MOB files besides 
them are the ones should be deleted.
{quote}
That is part of the problem, yes. To have access to the half-written mob file 
list the cleaner has to run on the RS. But each RS only has access to its own 
half-written mob file list, so each can only clean a subset of the existing mob 
files. To be precise, if a mob file name ends with the name of a region that is 
hosted on the current RS, then the cleaner can decide whether it can be archived 
or not. Unfortunately, with merges and splits regions get archived, so after a 
point there will be mob files containing names of regions not hosted on any RS, 
and none of the cleaners running on RSes could clean these up. So we need one 
more cleaner specifically for these (I put it on the Master to replace the 
original cleaner), which has to read every available hfile to make sure we have 
every active mob reference and is able to decide whether a mob file created by a 
since-archived region can be archived or not.

> Eliminate MOB renames when SFT is enabled

[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-16 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537787#comment-17537787
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

{{{quote}}}

I guess if we want to compact the mob files, we always need to compact the 
normal files which references the mob files so we can update the references in 
the metadata?

{quote}

Yes, that is how it works.

{{{quote}}}

The mob files should have a different name prefix or under a different 
directory?

{{{}{quote}{}}}{{{}{}}}

They have a different directory structure. 
"/mobdir/data/default/table_name/a0209c070c85d1e4d500af8ba33c3c02/cf"

They are stored fully separated. Please note these regions only contain mob 
files and are fully independent from the referencing regions. A single mob 
region could theoretically contain every MOB file in hbase regardless of where 
it is referenced from. Also their naming convention is different. For us the 
only important thing is that it ends with "_", so something like this: 
"0cc175b9c0f1b6a831c399e2697726612022050314ecf20b51674cd6bd647bfb2d88b1ff_b593e96e821ba6211d8a4b101a88"

{quote}

So at least for loading, there will be no problem

{quote}

That's true. Reads are very straightforward.

{quote}

I think the only problem here is how do we clean up the half written mob files, 
I think the logic is mainly the same with what we have now, get all the mob 
refs from all the normal storefiles, to construct the base list, and then get 
all the mob files which are currently being written, all MOB files besides them 
are the ones should be deleted.

{quote}

That is part of the problem, yes. To have access to the half-written mob file 
list the cleaner has to run on the RS. But each RS only has access to its own 
half-written mob file list, so each can only clean a subset of the existing 
mob files. To be precise, if a mob file name ends with a region's name that is 
hosted on the current RS, then the cleaner can decide if it can be archived or 
not. Unfortunately, with merges and splits regions get archived, so after a 
point there will be mob files containing names of regions not hosted on any RS 
and none of the cleaners running on RSes could clean these up. So we need one 
more cleaner specifically for these (I put it on master to replace the original 
cleaner), which has to read every available hfile to make sure we have every 
active mob reference and is able to decide if a mob file created by a 
since-archived region can be archived or not.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer using a 
> StoreFileWriterCreationTracker that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on Master, to 
> allow access to the StoreFileWriterCreationTrackers and make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-13 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536589#comment-17536589
 ] 

Szabolcs Bukros edited comment on HBASE-26969 at 5/13/22 11:57 AM:
---

[~zhangduo] thank you for taking time and reading this!
As far as I understand when a storeFile has data that is stored in a mob file 
then that storeFile's metadata will have a reference to these mob files. So 
when a scan request tries to read the data it knows which mob file to check. 
This is the only tracking we have.
{code:java}
storeFile.getMetadataValue(HStoreFile.MOB_FILE_REFS); {code}
For cleanup the chore has to know which mob files are currently actively 
referenced. To get this list, the chore checks the metadata of every single 
storeFile HBase has in a MOB-enabled CF, and collects the references from 
them. It just iterates through the /data folder table by table.


was (Author: bszabolcs):
[~zhangduo] thank you for taking time and reading this!
As far as I understand when a storeFile has data that is stored in a mob file 
then that storeFile's metadata will have a reference to these mob files. So 
when a scan request tries to read the data it knows which mob file to check. 
This is the only tracking we have.
{code:java}
storeFile.getMetadataValue(HStoreFile.MOB_FILE_REFS); {code}
For cleanup the chore have to know which mob files are currently actively 
referenced. To get this lit, the chore check's the metadata of every single 
storeFile hbase have in a mob enabled CF, and collects the references from 
them. It just iterates through the /data folder table by table.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer using a 
> StoreFileWriterCreationTracker that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on Master, to 
> allow access to the StoreFileWriterCreationTrackers and make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-13 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536589#comment-17536589
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

[~zhangduo] thank you for taking time and reading this!
As far as I understand when a storeFile has data that is stored in a mob file 
then that storeFile's metadata will have a reference to these mob files. So 
when a scan request tries to read the data it knows which mob file to check. 
This is the only tracking we have.
{code:java}
storeFile.getMetadataValue(HStoreFile.MOB_FILE_REFS); {code}
For cleanup the chore has to know which mob files are currently actively 
referenced. To get this list, the chore checks the metadata of every single 
storeFile HBase has in a MOB-enabled CF, and collects the references from 
them. It just iterates through the /data folder table by table.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Sub-task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer using a 
> StoreFileWriterCreationTracker that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on Master, to 
> allow access to the StoreFileWriterCreationTrackers and make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-10 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534360#comment-17534360
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

I would like to start by stating that this issue grew bigger than just removing 
the renames and exposed multiple issues in the MOB-SFT interaction.

I have uploaded a draft PR containing my changes. I intend to use it as a 
reference to show the issues when it comes to using MOB on FileBased SFT.

My main problem was that while MOB files were already tracked in the hfile 
metadata, the "single source of truth" is widely distributed and not easily 
available.

Both the WriterCreationTracker and the StoreFileTracker are RS-based data, and 
the MOB cleaner needs them to work reliably when FileBased SFT is used. Exposing 
this data and allowing the Master to request it from the RSes, collect it and 
run the cleaner based on it, while technically possible, looked less than 
optimal. It would result in a single cluster-wide spike that we should try to 
avoid, and considering the delay that certain RSes could have (uneven load, GC 
pauses, etc.) the data could already be outdated by the time the collection is 
done. So instead I tried to move the cleaner to the RSes. This solution also 
had its drawbacks.

MOB file names contain the encoded name of the region that created them, so the 
RS hosting that specific region can check its hfiles for references and can 
clean a file up if it does not find anything. The problem comes with merge/split 
parent regions. When the parent region is archived, the new region's hfiles will 
still hold references to the old MOB files, but now the only way to tell whether 
the old MOB file is referenced or not is to check every single hfile in every 
store belonging to the same column family, because we cannot tell based on its 
name where it could be referenced from. Like the old cleaner did. So while I 
moved the MOB cleaner to the RS level and reduced its scope to only clean up 
MOB files belonging to regions hosted by that RS, I had to leave a "global" MOB 
cleaner running on Master to deal with MOB files created by archived regions 
but potentially still being referenced. And I think this is very ugly.

This whole process could have been significantly simpler if we had tracker 
files in MOB stores, but then we would have TWO competing sources of truth: 
the tracker files and the hfile metadata.

HBASE-27017 is a related issue where the snapshot code tries to get the active 
MOB files based on the configured SFT, but since MOB stores do not have tracker 
files it returns an empty list. If the store had tracker files it would work. 
Without a tracker file we either include every MOB file in the dir (garbage 
included) or scan every single hfile's metadata for MOB references.

What I'm trying to say is that while I think my solution would work and solve 
the immediate issues, I would much prefer to have a centralized, easily 
available list of active MOB files and to create a solution based on that.

[~apurtell] ,[~zhangduo],[~elserj] ,[~wchevreuil] What do you think?

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer using a 
> StoreFileWriterCreationTracker that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on Master, to 
> allow access to the StoreFileWriterCreationTrackers and make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-09 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533854#comment-17533854
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

Thanks [~apurtell] !

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer using a 
> StoreFileWriterCreationTracker that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on Master, to 
> allow access to the StoreFileWriterCreationTrackers and make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-09 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533817#comment-17533817
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

[~apurtell] It looks like the MOB feature is currently incompatible with 
FileBased SFT. Without this issue fixed, the currently 
written/temporary/outdated/trash files in the store dir can break the 
MobFileCleanerChore, and the related issue shows that snapshotting a MOB-enabled 
table while FileBased SFT is used results in data loss. Since 2.5.0 is so 
close to release, this fact should be documented somewhere.
I'm planning to add a more detailed description of the issues I have 
encountered while trying to make these features work together as soon as I can 
publish a PR for reference.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer using a 
> StoreFileWriterCreationTracker that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on Master, to 
> allow access to the StoreFileWriterCreationTrackers and make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used

2022-05-09 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533807#comment-17533807
 ] 

Szabolcs Bukros commented on HBASE-27017:
-

This issue was found while working on HBASE-26969. 
TestMobCompactionWithDefaults uses cloneSnapshot.

> MOB snapshot is broken when FileBased SFT is used
> -
>
> Key: HBASE-27017
> URL: https://issues.apache.org/jira/browse/HBASE-27017
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-2
>Reporter: Szabolcs Bukros
>Priority: Major
>
> During snapshot MOB regions are treated like any other region. When a 
> snapshot is taken and hfile references are collected a StoreFileTracker is 
> created to get the current active hfile list. But the MOB region stores are 
> not tracked so an empty list is returned, resulting in a broken snapshot. 
> When this snapshot is cloned the resulting table will have no MOB files or 
> references.
> The problematic code can be found here:
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used

2022-05-09 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-27017:
---

 Summary: MOB snapshot is broken when FileBased SFT is used
 Key: HBASE-27017
 URL: https://issues.apache.org/jira/browse/HBASE-27017
 Project: HBase
  Issue Type: Bug
  Components: mob
Affects Versions: 3.0.0-alpha-2, 2.5.0
Reporter: Szabolcs Bukros


During snapshot MOB regions are treated like any other region. When a snapshot 
is taken and hfile references are collected a StoreFileTracker is created to 
get the current active hfile list. But the MOB region stores are not tracked so 
an empty list is returned, resulting in a broken snapshot. When this snapshot 
is cloned the resulting table will have no MOB files or references.

The problematic code can be found here:

[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-05-03 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531420#comment-17531420
 ] 

Szabolcs Bukros commented on HBASE-26969:
-

[~apurtell] Please bump it.

> Eliminate MOB renames when SFT is enabled
> -
>
> Key: HBASE-26969
> URL: https://issues.apache.org/jira/browse/HBASE-26969
> Project: HBase
>  Issue Type: Task
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> MOB file compaction and flush still rely on renames even when SFT is 
> enabled.
> My proposed changes are:
>  * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
> of using the temp writer we should create a different writer using a 
> StoreFileWriterCreationTracker that writes directly to the mob store folder
>  * these StoreFileWriterCreationTrackers should be stored in the MobStore. 
> This would require us to extend MobStore with a createWriter and a 
> finalizeWriter method to handle this
>  * refactor MobFileCleanerChore to run on the RS instead of on Master, to 
> allow access to the StoreFileWriterCreationTrackers and make sure the 
> currently written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-26969) Eliminate MOB renames when SFT is enabled

2022-04-22 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-26969:
---

 Summary: Eliminate MOB renames when SFT is enabled
 Key: HBASE-26969
 URL: https://issues.apache.org/jira/browse/HBASE-26969
 Project: HBase
  Issue Type: Task
  Components: mob
Affects Versions: 2.5.0, 3.0.0-alpha-3
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


MOB file compaction and flush still rely on renames even when SFT is enabled.

My proposed changes are:
 * when requireWritingToTmpDirFirst is false during mob flush/compact, instead 
of using the temp writer we should create a different writer using a 
StoreFileWriterCreationTracker that writes directly to the mob store folder
 * these StoreFileWriterCreationTrackers should be stored in the MobStore. This 
would require us to extend MobStore with a createWriter and a finalizeWriter 
method to handle this
 * refactor MobFileCleanerChore to run on the RS instead of on Master, to allow 
access to the StoreFileWriterCreationTrackers and make sure the currently 
written files are not cleaned up



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-26791) Memstore flush fencing issue for SFT

2022-03-02 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-26791:
---

 Summary: Memstore flush fencing issue for SFT
 Key: HBASE-26791
 URL: https://issues.apache.org/jira/browse/HBASE-26791
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.6.0, 3.0.0-alpha-3
Reporter: Szabolcs Bukros


The scenario is the following:
 # rs1 is flushing file to S3 for region1
 # rs1 loses ZK lock
 # region1 gets assigned to rs2
 # rs2 opens region1
 # rs1 completes flush and updates sft file for region1
 # rs2 has a different “version” of the sft file for region1

The flush should fail at the end, but the SFT file gets overwritten before 
that, resulting in potential data loss.

 

Potential solutions include:
 * Adding a timestamp to the tracker file names. This, combined with creating a 
new tracker file when an RS opens the region, would allow us to list the 
available tracker files before an update and compare the found timestamps to 
the one stored in memory to verify that the store still owns the latest tracker 
file.
 * Using the existing timestamp in the tracker file content. This would also 
require us to create a new tracker file when a new RS opens the region, but 
instead of listing the available tracker files, we could try to load and 
de-serialize the last tracker file and compare the timestamp found in it to the 
one stored in memory (see the sketch below).
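
A hypothetical sketch of the fencing check behind the second option (the helper 
and the exception are made up for illustration; this is not the real 
StoreFileTracker API):
{code:java}
import java.io.IOException;

// Hypothetical fencing sketch: before overwriting the tracker file, re-read the
// newest tracker file's timestamp and abort if another RS wrote a newer one since
// this RS opened the region. Not the real StoreFileTracker API.
abstract class FencedTrackerSketch {

  /** Timestamp of the tracker file written when this RS opened the region. */
  private final long ownedTimestamp;

  FencedTrackerSketch(long ownedTimestamp) {
    this.ownedTimestamp = ownedTimestamp;
  }

  /** Loads the newest tracker file from the filesystem and returns its timestamp. */
  abstract long loadLatestTrackerTimestamp() throws IOException;

  /** Called right before the tracker file would be overwritten at the end of a flush. */
  void ensureStillOwner() throws IOException {
    long latest = loadLatestTrackerTimestamp();
    if (latest > ownedTimestamp) {
      // rs2 has opened the region since rs1 started the flush: fail instead of
      // overwriting rs2's view of the store files.
      throw new IOException("Fenced: newer tracker file found (" + latest + " > "
          + ownedTimestamp + ")");
    }
  }
}
{code}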



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26707) Reduce number of renames during bulkload

2022-02-22 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495985#comment-17495985
 ] 

Szabolcs Bukros commented on HBASE-26707:
-

[~wchevreuil] Thanks a lot for your feedback and commit. Please find the 
branch-2 compatible PR here: 
https://github.com/apache/hbase/pull/4122

> Reduce number of renames during bulkload
> 
>
> Key: HBASE-26707
> URL: https://issues.apache.org/jira/browse/HBASE-26707
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>
> Make sure we only do a single rename operation during bulkload when 
> StoreEngine does not require the use of tmp directories.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26624) [hbase-operator-tools] Introduce a HBCK2 tool to fix the store file tracking

2022-02-07 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488135#comment-17488135
 ] 

Szabolcs Bukros commented on HBASE-26624:
-

Hi  [~zhangduo] 

Do you expect this to be part of HBCK2, or would a standalone operator tool 
like RegionsMerger be sufficient?

Also, what kind of granularity are you looking for? Would a tool that can 
re-generate the tracker files globally or for a selected table be enough, or 
should we go down to the region level?

> [hbase-operator-tools] Introduce a HBCK2 tool to fix the store file tracking
> 
>
> Key: HBASE-26624
> URL: https://issues.apache.org/jira/browse/HBASE-26624
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase-operator-tools, hbck2
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
>
> We should provide a HBCK2 tool to recover the store file tracking if it is 
> broken.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HBASE-26624) [hbase-operator-tools] Introduce a HBCK2 tool to fix the store file tracking

2022-01-26 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros reassigned HBASE-26624:
---

Assignee: Szabolcs Bukros

> [hbase-operator-tools] Introduce a HBCK2 tool to fix the store file tracking
> 
>
> Key: HBASE-26624
> URL: https://issues.apache.org/jira/browse/HBASE-26624
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase-operator-tools, hbck2
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
>
> We should provide a HBCK2 tool to recover the store file tracking if it is 
> broken.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (HBASE-26707) Reduce number of renames during bulkload

2022-01-26 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-26707 started by Szabolcs Bukros.
---
> Reduce number of renames during bulkload
> 
>
> Key: HBASE-26707
> URL: https://issues.apache.org/jira/browse/HBASE-26707
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>
> Make sure we only do a single rename operation during bulkload when 
> StoreEngine does not require the use of tmp directories.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26707) Reduce number of renames during bulkload

2022-01-26 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482502#comment-17482502
 ] 

Szabolcs Bukros commented on HBASE-26707:
-

During implementation I have found an issue with 
bulkLoadListener.failedBulkLoad at 
[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L7320]

The passed param is the staged path, but the method expects the file's original 
location. This could lead to leaving the hfile in the staging dir after a failed 
bulkload and, because cleanup deletes the staging dir, losing the hfile.

This is also fixed in the attached PR.
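
A simplified illustration of the failure path (the listener interface below is 
hypothetical and heavily reduced; these are not the actual HRegion.BulkLoadListener 
signatures):
{code:java}
import java.io.IOException;

// Simplified, hypothetical illustration of the failure path described above.
interface BulkLoadListenerSketch {
  /** Stages the source file and returns the staged path. */
  String prepareBulkLoad(String srcPath) throws IOException;

  /** Moves the file back to its original location after a failed load. */
  void failedBulkLoad(String srcPath) throws IOException;
}

final class BulkLoadSketch {
  static void bulkLoad(BulkLoadListenerSketch listener, String srcPath) throws IOException {
    String stagedPath = listener.prepareBulkLoad(srcPath);
    try {
      commit(stagedPath); // load the staged file into the region
    } catch (IOException e) {
      // The fix: hand back the original source path, not the staged path, so the
      // hfile is restored before the staging directory is cleaned up.
      listener.failedBulkLoad(srcPath);
      throw e;
    }
  }

  private static void commit(String stagedPath) throws IOException {
    // placeholder for the actual commit into the store
  }
}
{code}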

> Reduce number of renames during bulkload
> 
>
> Key: HBASE-26707
> URL: https://issues.apache.org/jira/browse/HBASE-26707
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>
> Make sure we only do a single rename operation during bulkload when 
> StoreEngine does not require the use of tmp directories.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26707) Reduce number of renames during bulkload

2022-01-25 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-26707:
---

 Summary: Reduce number of renames during bulkload
 Key: HBASE-26707
 URL: https://issues.apache.org/jira/browse/HBASE-26707
 Project: HBase
  Issue Type: Sub-task
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


Make sure we only do a single rename operation during bulkload when StoreEngine 
does not require the use of tmp directories.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26441) Add metrics for BrokenStoreFileCleaner

2021-11-10 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441874#comment-17441874
 ] 

Szabolcs Bukros commented on HBASE-26441:
-

[~zhangduo] I would like to go back to my original Cleaner Chore PR and re-use 
the metrics solution from there and match it to the finalized chore.

> Add metrics for BrokenStoreFileCleaner
> --
>
> Key: HBASE-26441
> URL: https://issues.apache.org/jira/browse/HBASE-26441
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Minor
>
> This is a followup for HBASE-26271.
> Cleaner chores lacking visibility is a recurring issue, so I would like to add 
> metrics for BrokenStoreFileCleaner to have a better idea of the tasks it 
> performs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26441) Add metrics for BrokenStoreFileCleaner

2021-11-10 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-26441:
---

 Summary: Add metrics for BrokenStoreFileCleaner
 Key: HBASE-26441
 URL: https://issues.apache.org/jira/browse/HBASE-26441
 Project: HBase
  Issue Type: Sub-task
  Components: metrics
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


This is a followup for HBASE-26271.
Cleaner chores lacking visibility is a recurring issue, so I would like to add 
metrics for BrokenStoreFileCleaner to have a better idea of the tasks it 
performs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26271) Cleanup the broken store files under data directory

2021-11-10 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441867#comment-17441867
 ] 

Szabolcs Bukros commented on HBASE-26271:
-

[~zhangduo]  Thanks for all the feedback and merging it.

> Cleanup the broken store files under data directory
> ---
>
> Key: HBASE-26271
> URL: https://issues.apache.org/jira/browse/HBASE-26271
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: HBASE-26067
>
>
> As for some new store file tracker implementation, we allow flush/compaction 
> to write directly to data directory, so if we crash in the middle, there will 
> be broken store files left in the data directory.
> We should find a proper way to delete these broken files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot

2021-11-08 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440521#comment-17440521
 ] 

Szabolcs Bukros edited comment on HBASE-26286 at 11/8/21, 2:52 PM:
---

[~zhangduo]

{quote}
IIRC, our decision on HBASE-26280 is that, a snapshot will be constructed by 
plain HFiles, you always need to list the directory to get all the HFiles, so 
I'm a bit confusing that why here we say 'snapshot with file based SFT'. Did I 
miss something?
{quote}

If I understand correctly, that discussion was about tracker files and concluded 
that we should not add them, because the list of available hfiles in the 
snapshot will always be the full and correct list of storefiles, so the tracker 
files can be rebuilt if necessary. The TableDescriptor however can contain SFT 
config, both on table and CF level, if the global SFT config was overridden.

{quote}
Could someone explain them for me? It seems that one of them is for creating a 
new table and another one is for performing on an existing table?
{quote}

Clone creates a new table with the provided name and the TableDescriptor from 
the snapshot metadata. So we can freely change the SFT implementation we would 
like to use, because we can just override the TableDescriptor and the new table 
will be created with it.

Restore tries to restore the state of an existing table to match the snapshot. 
To achieve this it deletes regions and/or hfiles present in the current table 
but not present in the snapshot, copies over from the snapshot any regions 
and/or hfiles missing from the current table and, most importantly for us, 
simply overwrites the current TableDescriptor with the one from the snapshot.

This last step causes the problems.
 * Consider a use case where a CF uses file based SFT at the time of the 
snapshot, while the global config is still the default SFT. Later on we migrate 
the CF back to the default SFT. Then we have to restore the snapshot. The 
process overwrites the TableDescriptor with the one from the snapshot and 
suddenly the CF will try to use file based SFT (since it used that before the 
snapshot), but because there is no actual SFT migration as part of the restore 
process the CF folder does not have tracking files and SFT fails. This is a bug 
in the current implementation.
 * Specifying the SFT for restore has its own issues. Consider a use case where 
the global SFT config uses the default. We restore a table and specify we would 
like to use file based SFT instead. There will be regions that exist in the 
current table and existed at the time of the snapshot too. A few hfiles might 
get added/deleted, but otherwise they remain untouched. Forcefully setting the 
SFT to file based as specified is possible, but there is no logic that would do 
the migration and build the tracker files, so the SFT would fail. Similarly, 
switching back to default (from a file based SFT) is possible but restore lacks 
the logic to clean up the tracker files.

We have multiple options here:
 # As [~wchevreuil] suggested, we could add a check that stops the restore 
process if there would be an SFT incompatibility and prompts the user to 
manually migrate the problematic sections first (a rough sketch of such a check 
follows below). This has the advantage of keeping the restore logic clean and 
making an SFT change a more conscious decision, but has the downside of being a 
potentially labor-intensive manual process.
 # We could use the SFT implementation param we are currently introducing to 
signal which implementation we would *prefer* to use. When there is a conflict 
between the current and the snapshot SFT config, if the currently used 
implementation matches the SFT param, we can override the snapshot config. This 
is basically a slightly more flexible variation of point 1. It would help the 
user move towards a selected SFT while keeping the restore logic clean.
 # We could add the SFT migration logic to restore and simply add the tracking 
files when needed or clean them up when we move away from file based SFT. It 
has the upside of being the most user-friendly solution, but it has the 
downside of mixing restore logic with SFT logic.
 # We could extend the SFT implementations to "auto migrate", meaning they clean 
up after themselves and prepare the necessary files for themselves. This would 
allow restore to just override the TableDescriptor any way it wants and let SFT 
deal with the required steps.
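
A rough sketch of what the check in option 1 could look like (the property key 
and the map-based "descriptor" access are assumptions for illustration, not the 
exact HBase API):
{code:java}
import java.util.Map;

// Rough sketch of option 1: refuse to restore when the snapshot and the current
// table disagree on the SFT implementation. The property key and the map-based
// descriptor access are illustrative assumptions only.
final class RestoreSftCheckSketch {

  /** Assumed name of the per-table SFT override; treat as illustrative. */
  static final String TRACKER_IMPL_KEY = "hbase.store.file-tracker.impl";

  static void checkCompatible(Map<String, String> currentTableConf,
      Map<String, String> snapshotTableConf) {
    String current = currentTableConf.getOrDefault(TRACKER_IMPL_KEY, "DEFAULT");
    String snapshot = snapshotTableConf.getOrDefault(TRACKER_IMPL_KEY, "DEFAULT");
    if (!current.equals(snapshot)) {
      throw new IllegalStateException("Snapshot was taken with SFT '" + snapshot
          + "' but the table currently uses '" + current
          + "'; please migrate the table manually before restoring.");
    }
  }
}
{code}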


was (Author: bszabolcs):
[~zhangduo] 

{quote}
IIRC, our decision on HBASE-26280 is that, a snapshot will be constructed by 
plain HFiles, you always need to list the directory to get all the HFiles, so 
I'm a bit confusing that why here we say 'snapshot with file based SFT'. Did I 
miss something?\{quote}

If I understand correctly that discussion was about tracker files and concluded 
in not adding them because the list of available hfiles in the snapshot will 
always be the full and correct list of st

[jira] [Commented] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot

2021-11-08 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440521#comment-17440521
 ] 

Szabolcs Bukros commented on HBASE-26286:
-

[~zhangduo] 

{quote}
IIRC, our decision on HBASE-26280 is that, a snapshot will be constructed by 
plain HFiles, you always need to list the directory to get all the HFiles, so 
I'm a bit confusing that why here we say 'snapshot with file based SFT'. Did I 
miss something?
{quote}

If I understand correctly that discussion was about tracker files and concluded 
in not adding them because the list of available hfiles in the snapshot will 
always be the full and correct list of storefiles so the tracker files can be 
rebuilt if necessary. The TableDescriptor however can contain SFT config, both 
on table and cf level if the global SFT config was overridden.

 

{quote}
Could someone explain them for me? It seems that one of them is for creating a 
new table and another one is for performing on an existing table?
{quote}

Clone creates a new table with the provided name and the TableDescriptor from 
the snapshot metadata. So we can freely change the SFT implementation we would 
like to use, because we can just override the TableDescriptor and the new table 
will be created with it.

Restore tries to restore the state of an existing table to match the snapshot. 
To achieve this it deletes regions and/or hfiles present in the current table 
but not present in the snapshot, copies over from the snapshot any regions 
and/or hfiles missing from the current table and, most importantly for us, 
simply overwrites the current TableDescriptor with the one from the snapshot.

This last step causes the problems.
 * Consider a usecase where a cf uses file based SFT at the time of snapshot, 
while the global config is still the default SFT. Later on we migrate the cf 
back to the default SFT. Then we have to restore the snapshot. The process 
overwrites the TableDescriptor with the one from snapshot and suddenly the cf 
will try to use file based SFT (since it used that before the snapshot) but 
because there is no actual SFT migration as part of the restore process the cf 
folder does not have tracking files and SFT fails. This is a bug in the current 
implementation.
 * Specifying the SFT for restore has its own issues. Consider a use case where 
the global SFT config uses the default. We restore a table and specify we would 
like to use file based SFT instead. There will be regions that exist in the 
current table and existed at the time of the snapshot too. A few hfiles might 
get added/deleted, but otherwise they remain untouched. Forcefully setting the 
SFT to file based as specified is possible, but there is no logic that would do 
the migration and build the tracker files, so the SFT would fail. Similarly 
switching back to default (from a file based SFT) is possible but restore lacks 
the logic to clean up the tracker files.

We have multiple options here:
 # As [~wchevreuil] suggested we could add a check that stops the restore 
process if there would be an SFT incompatibility and would prompt the user to 
manually migrate the problematic sections first. This has the advantage of 
keeping the restore logic clean and making an SFT change a more conscious 
decision. But has the downside of being a potentially labor intensive manual 
process.
 # We could use the SFT implementation param we are currently introducing to 
signal which implementation we would *prefer* to use. When there is a conflict 
in the current and snapshot SFT config, if the currently used implementation 
matches the SFT param, we can override the snapshot config. This is basically a 
bit more flexible variation of point 1. It would help the user move 
towards a selected SFT while keeping the restore logic clean.
 # We could add the SFT migration logic to restore and simply add the tracking 
files when needed or clean them up when we move away from file based SFT. It 
has the upside of being the most user friendly solution, but it has the 
downside of mixing restore logic with SFT logic.
 # We could extend the SFT implementations to "auto migrate" meaning clean up 
after themselves and prepare necessary files for themselves. This would allow 
restore to just override the TableDescriptor any way it wants and let SFT deal 
with the required steps.

> Add support for specifying store file tracker when restoring or cloning 
> snapshot
> 
>
> Key: HBASE-26286
> URL: https://issues.apache.org/jira/browse/HBASE-26286
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile, snapshots
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
>
> As discussed in HBASE-26280.
> https://issues.apache.org/jira/browse/HBA

[jira] [Work started] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot

2021-11-02 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-26286 started by Szabolcs Bukros.
---
> Add support for specifying store file tracker when restoring or cloning 
> snapshot
> 
>
> Key: HBASE-26286
> URL: https://issues.apache.org/jira/browse/HBASE-26286
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile, snapshots
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
>
> As discussed in HBASE-26280.
> https://issues.apache.org/jira/browse/HBASE-26280?focusedCommentId=17414894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17414894



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-26271) Cleanup the broken store files under data directory

2021-11-02 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-26271 started by Szabolcs Bukros.
---
> Cleanup the broken store files under data directory
> ---
>
> Key: HBASE-26271
> URL: https://issues.apache.org/jira/browse/HBASE-26271
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
>
> As for some new store file tracker implementation, we allow flush/compaction 
> to write directly to data directory, so if we crash in the middle, there will 
> be broken store files left in the data directory.
> We should find a proper way to delete these broken files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot

2021-10-29 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436018#comment-17436018
 ] 

Szabolcs Bukros commented on HBASE-26286:
-

[~wchevreuil] Great point. I'll have to check how the different SFTs would 
handle this.

> Add support for specifying store file tracker when restoring or cloning 
> snapshot
> 
>
> Key: HBASE-26286
> URL: https://issues.apache.org/jira/browse/HBASE-26286
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile, snapshots
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
>
> As discussed in HBASE-26280.
> https://issues.apache.org/jira/browse/HBASE-26280?focusedCommentId=17414894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17414894



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26286) Add support for specifying store file tracker when restoring or cloning snapshot

2021-10-29 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435979#comment-17435979
 ] 

Szabolcs Bukros commented on HBASE-26286:
-

[~zhangduo], [~wchevreuil], [~elserj] 

After checking the code, making *cloning* SFT configurable looks 
straightforward enough. We can freely overwrite the Descriptors and use the new 
SFT impl table-wide. However, the same is not true for *restore*. The store 
configuration that the StoreEngine and SFT impl are based on is a composite of 
3 sources: master conf, TableDescriptor, ColumnFamilyDescriptor. We cannot 
change any of these without potentially affecting otherwise untouched stores, 
and my assumption is that we should avoid that.

My suggestion would be to drop restore from the scope, because if changing 
otherwise untouched regions should be avoided, then our Descriptor granularity 
is insufficient for this task. If changing regions untouched by restore is 
acceptable, I would argue that doing a traditional restore and using the already 
existing migration logic is a cleaner solution than mixing it with snapshot 
restore.

Am I missing something? What do you think?

> Add support for specifying store file tracker when restoring or cloning 
> snapshot
> 
>
> Key: HBASE-26286
> URL: https://issues.apache.org/jira/browse/HBASE-26286
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile, snapshots
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
>
> As discussed in HBASE-26280.
> https://issues.apache.org/jira/browse/HBASE-26280?focusedCommentId=17414894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17414894



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26271) Cleanup the broken store files under data directory

2021-10-13 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428394#comment-17428394
 ] 

Szabolcs Bukros commented on HBASE-26271:
-

[~zhangduo] prepared a new PR, on the correct branch this time. Also added 
metrics for the chore and a REST endpoint to easily access those metrics. Could 
you please take a look? [Please find it 
here.|https://github.com/apache/hbase/pull/3751]

> Cleanup the broken store files under data directory
> ---
>
> Key: HBASE-26271
> URL: https://issues.apache.org/jira/browse/HBASE-26271
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile
>Reporter: Duo Zhang
>Assignee: Szabolcs Bukros
>Priority: Major
>
> As for some new store file tracker implementation, we allow flush/compaction 
> to write directly to data directory, so if we crash in the middle, there will 
> be broken store files left in the data directory.
> We should find a proper way to delete these broken files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26271) Cleanup the broken store files under data directory

2021-09-29 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422202#comment-17422202
 ] 

Szabolcs Bukros commented on HBASE-26271:
-

[~zhangduo] Created a PR with my initial changes. Could you please take a look?

[~wchevreuil] suggested that if we could list the currently written compaction 
targets we could add that to the check and should not solely rely on the 
leftover file ttl to prevent breaking a long-running compaction. I have not 
found a nice way to add a generic implementation for this so my initial 
solution is limited to the {color:#00}DirectStoreCompactor. What do you 
think?
{color}

{color:#00}I'm planning to add metrics to get a clearer idea of chore 
performance/results and extend the api to get this info in a followup 
commit.{color}

> Cleanup the broken store files under data directory
> ---
>
> Key: HBASE-26271
> URL: https://issues.apache.org/jira/browse/HBASE-26271
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile
>Reporter: Duo Zhang
>Priority: Major
>
> As for some new store file tracker implementation, we allow flush/compaction 
> to write directly to data directory, so if we crash in the middle, there will 
> be broken store files left in the data directory.
> We should find a proper way to delete these broken files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26271) Cleanup the broken store files under data directory

2021-09-21 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418169#comment-17418169
 ] 

Szabolcs Bukros commented on HBASE-26271:
-

[~zhangduo], [~elserj] 

1. The approach I'm testing now would be to add a ScheduledChore on region 
start that periodically checks the hfiles for each store where persistent 
storage is enabled. Maybe allowing this to be called from shell too. I think it 
is safer this way. We can run it rarely enough to minimize the performance 
impact but still make sure to keep the folder clean.
2. I would check the ModificationTime and add a massive waiting period (see the 
sketch below). The period can be big enough to be safely outside a realistic 
compaction runtime, since we are in no hurry to archive these files. I'm not 
sure a more complicated solution is warranted.
"we could still fail before inserting these files into store file tracker 
right?" My thoughts exactly. The safest solution seems to be just listing the 
file system. Also, considering this is RS-specific and we can add some jitter, 
the impact should not be significant either.
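
A minimal sketch of the modification-time guard from point 2 (the TTL value is 
illustrative, not a tested default):
{code:java}
import org.apache.hadoop.fs.FileStatus;

// Minimal sketch of the modification-time guard from point 2 above: a file is only
// considered for cleanup once it is older than a generous TTL, so a file still being
// written by a long-running compaction is never touched. The TTL is illustrative.
final class BrokenFileAgeSketch {

  /** Far above any realistic compaction runtime, e.g. 12 hours. */
  static final long MIN_AGE_MS = 12L * 60 * 60 * 1000;

  static boolean oldEnoughToInspect(FileStatus file) {
    long age = System.currentTimeMillis() - file.getModificationTime();
    return age > MIN_AGE_MS;
  }
}
{code}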

What do you think?

> Cleanup the broken store files under data directory
> ---
>
> Key: HBASE-26271
> URL: https://issues.apache.org/jira/browse/HBASE-26271
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile
>Reporter: Duo Zhang
>Priority: Major
>
> As for some new store file tracker implementation, we allow flush/compaction 
> to write directly to data directory, so if we crash in the middle, there will 
> be broken store files left in the data directory.
> We should find a proper way to delete these broken files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-25394) Support Snapshot related operation with direct insert HFiles into data/CF directory

2021-07-16 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros reassigned HBASE-25394:
---

Assignee: Szabolcs Bukros

> Support Snapshot related operation with direct insert HFiles into data/CF 
> directory
> ---
>
> Key: HBASE-25394
> URL: https://issues.apache.org/jira/browse/HBASE-25394
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Tak-Lon (Stephen) Wu
>Assignee: Szabolcs Bukros
>Priority: Major
>
> Support restore snapshot, clone snapshot with direct insert 
> into data directory



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25964) [HBOSS] Introducing hbase metrics to Hboss

2021-06-02 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-25964:
---

 Summary: [HBOSS] Introducing hbase metrics to Hboss
 Key: HBASE-25964
 URL: https://issues.apache.org/jira/browse/HBASE-25964
 Project: HBase
  Issue Type: Improvement
  Components: hboss
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros
 Fix For: hbase-filesystem-1.0.0-alpha2


I would like to introduce hbase metrics to Hboss to allow closer monitoring of 
rename performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24720) Meta replicas not cleaned when disabled

2020-07-14 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157570#comment-17157570
 ] 

Szabolcs Bukros commented on HBASE-24720:
-

Thanks for the merge and review [~psomogyi] !

> Meta replicas not cleaned when disabled
> ---
>
> Key: HBASE-24720
> URL: https://issues.apache.org/jira/browse/HBASE-24720
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6
>
>
> The assignMetaReplicas method works kinda like this:
> {code:java}
> void assignMetaReplicas(){
> if (numReplicas <= 1) return;
> //create if needed then assign meta replicas
> unassignExcessMetaReplica(numReplicas);
> }
> {code}
> Now this unassignExcessMetaReplica method is the one that gets rid of the 
> replicas we no longer need. It closes them and deletes their zNode.  
> Unfortunately this only happens if we decreased the replica number. If we 
> disabled it, by setting the replica number to 1 assignMetaReplicas returns 
> instantly without cleaning up the no longer needed replicas resulting in 
> replicas lingering around.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24720) Meta replicas not cleaned when disabled

2020-07-13 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-24720:
---

 Summary: Meta replicas not cleaned when disabled
 Key: HBASE-24720
 URL: https://issues.apache.org/jira/browse/HBASE-24720
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Affects Versions: 2.2.5, 3.0.0-alpha-1, 2.3.0, 2.4.0
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


The assignMetaReplicas method works kinda like this:
{code:java}
void assignMetaReplicas() {
  if (numReplicas <= 1) return;

  // create if needed then assign meta replicas

  unassignExcessMetaReplica(numReplicas);
}
{code}
Now this unassignExcessMetaReplica method is the one that gets rid of the 
replicas we no longer need. It closes them and deletes their zNode. 
Unfortunately this only happens if we decreased the replica number. If we 
disabled the feature by setting the replica number to 1, assignMetaReplicas 
returns instantly without cleaning up the no-longer-needed replicas, resulting 
in replicas lingering around.
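
One possible shape of the fix, sketched for illustration (this is not the 
committed patch; the helpers simply mirror the simplified method above):
{code:java}
// Illustrative sketch only: prune excess replicas even when the configured count
// drops back to 1, i.e. when the feature is disabled.
final class MetaReplicaCleanupSketch {
  private final int numReplicas;

  MetaReplicaCleanupSketch(int numReplicas) {
    this.numReplicas = numReplicas;
  }

  void assignMetaReplicas() {
    if (numReplicas > 1) {
      createAndAssignReplicas(numReplicas);
    }
    // Always prune replicas beyond the configured count, so setting the count back
    // to 1 also removes the replicas that were assigned earlier.
    unassignExcessMetaReplica(numReplicas);
  }

  private void createAndAssignReplicas(int n) { /* stub: create and assign */ }

  private void unassignExcessMetaReplica(int n) { /* stub: close replicas >= n, delete zNodes */ }
}
{code}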



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24562) Stabilize master startup with meta replicas enabled

2020-06-29 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147919#comment-17147919
 ] 

Szabolcs Bukros commented on HBASE-24562:
-

Thanks for the merges [~wchevreuil] ! Please find the branch-2.2 compatible PR 
here:

https://github.com/apache/hbase/pull/1997

> Stabilize master startup with meta replicas enabled
> ---
>
> Key: HBASE-24562
> URL: https://issues.apache.org/jira/browse/HBASE-24562
> Project: HBase
>  Issue Type: Improvement
>  Components: meta, read replicas
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0
>
>
> This is related to HBASE-21624 . 
> I created a separate ticket because in the original one a "complete solution 
> for meta replicas" was requested and this is not one. I'm just trying to make 
> master startup more stable by making assigning meta replicas asynchronous and 
> preventing a potential assignment failure from crashing master.
> The idea is that starting master with less or even no meta replicas assigned 
> is preferable to not having a running master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24579) Failed SASL authentication does not result in an exception on client side

2020-06-22 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141823#comment-17141823
 ] 

Szabolcs Bukros commented on HBASE-24579:
-

Thanks a lot for the commits [~wchevreuil] ! Please find the branch-2.2 PR 
here: https://github.com/apache/hbase/pull/1951

> Failed SASL authentication does not result in an exception on client side
> -
>
> Key: HBASE-24579
> URL: https://issues.apache.org/jira/browse/HBASE-24579
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0
>
>
> When HBaseSaslRpcClient.saslConnect tries to authenticate it only reads the 
> input stream if the process is not complete yet. However if the 
> authentication failed and the process is completed the exception sent back in 
> the stream never gets read.
> We should always try to read the input stream even if the process is complete 
> to make sure it was empty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-24562) Stabilize master startup with meta replicas enabled

2020-06-17 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-24562 started by Szabolcs Bukros.
---
> Stabilize master startup with meta replicas enabled
> ---
>
> Key: HBASE-24562
> URL: https://issues.apache.org/jira/browse/HBASE-24562
> Project: HBase
>  Issue Type: Improvement
>  Components: meta, read replicas
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>
> This is related to HBASE-21624 . 
> I created a separate ticket because in the original one a "complete solution 
> for meta replicas" was requested and this is not one. I'm just trying to make 
> master startup more stable by making assigning meta replicas asynchronous and 
> preventing a potential assignment failure from crashing master.
> The idea is that starting master with less or even no meta replicas assigned 
> is preferable to not having a running master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-24579) Failed SASL authentication does not result in an exception on client side

2020-06-17 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-24579 started by Szabolcs Bukros.
---
> Failed SASL authentication does not result in an exception on client side
> -
>
> Key: HBASE-24579
> URL: https://issues.apache.org/jira/browse/HBASE-24579
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>
> When HBaseSaslRpcClient.saslConnect tries to authenticate it only reads the 
> input stream if the process is not complete yet. However if the 
> authentication failed and the process is completed the exception sent back in 
> the stream never gets read.
> We should always try to read the input stream even if the process is complete 
> to make sure it was empty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24579) Failed SASL authentication does not result in an exception on client side

2020-06-17 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-24579:
---

 Summary: Failed SASL authentication does not result in an 
exception on client side
 Key: HBASE-24579
 URL: https://issues.apache.org/jira/browse/HBASE-24579
 Project: HBase
  Issue Type: Bug
  Components: rpc
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


When HBaseSaslRpcClient.saslConnect tries to authenticate it only reads the 
input stream if the process is not complete yet. However if the authentication 
failed and the process is completed the exception sent back in the stream never 
gets read.

We should always try to read the input stream even if the process is complete 
to make sure it was empty.
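
To illustrate the idea (this is only a sketch of a standard javax.security.sasl 
handshake loop, where saslClient is a SaslClient and inStream/outStream are the 
connection's DataInputStream/DataOutputStream; readStatus() is a hypothetical 
helper for reading the server's status/exception, not an actual 
HBaseSaslRpcClient method):
{code:java}
// Sketch of the proposed behavior.
while (!saslClient.isComplete()) {
  readStatus(inStream); // throws if the server already reported a SASL failure
  int len = inStream.readInt();
  byte[] challenge = new byte[len];
  inStream.readFully(challenge);
  byte[] response = saslClient.evaluateChallenge(challenge);
  if (response != null) {
    outStream.writeInt(response.length);
    outStream.write(response);
    outStream.flush();
  }
}
// Even when the client side considers the handshake complete, check whether the
// server pushed back a final status (e.g. an authentication failure) instead of
// silently dropping it.
if (inStream.available() > 0) {
  readStatus(inStream);
}
{code}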



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24562) Stabilize master startup with meta replicas enabled

2020-06-15 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-24562:
---

 Summary: Stabilize master startup with meta replicas enabled
 Key: HBASE-24562
 URL: https://issues.apache.org/jira/browse/HBASE-24562
 Project: HBase
  Issue Type: Improvement
  Components: meta, read replicas
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


This is related to HBASE-21624 . 

I created a separate ticket because in the original one a "complete solution 
for meta replicas" was requested and this is not one. I'm just trying to make 
master startup more stable by making assigning meta replicas asynchronous and 
preventing a potential assignment failure from crashing master.

The idea is that starting master with less or even no meta replicas assigned is 
preferable to not having a running master.
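
As a rough sketch of the approach (illustrative names only, not the actual 
HMaster fields or methods):
{code:java}
// Sketch: assign meta replicas off the critical startup path and never let a
// failure here crash the master.
ExecutorService metaReplicaAssigner = Executors.newSingleThreadExecutor(
  new ThreadFactoryBuilder().setNameFormat("meta-replica-assign-%d").setDaemon(true).build());

metaReplicaAssigner.submit(() -> {
  try {
    assignMetaReplicas();
  } catch (Throwable t) {
    // Starting with fewer (or no) meta replicas is preferable to not having a master at all.
    LOG.warn("Failed to assign meta replicas, continuing startup without them", t);
  }
});
{code}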



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24186) RegionMover ignores replicationId

2020-05-19 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110949#comment-17110949
 ] 

Szabolcs Bukros commented on HBASE-24186:
-

My change relies on HBASE-21753 and it was not backported to branch-2.1 . 
Thanks [~ram_krish] for the revert.

> RegionMover ignores replicationId
> -
>
> Key: HBASE-24186
> URL: https://issues.apache.org/jira/browse/HBASE-24186
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.5
>
>
> When RegionMover looks up which rs hosts a region, it does this based on 
> startRowKey. When read replication is enabled this might not return the 
> expected region's data and this can prevent the moving of these regions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24186) RegionMover ignores replicationId

2020-04-15 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084226#comment-17084226
 ] 

Szabolcs Bukros commented on HBASE-24186:
-

Correction:

This does not prevent the moving of the region; the result of 
getServerNameForRegion() is only used in the validation of the move, so it only 
forces the move to repeat itself because it does not realize the move already 
happened. So it just slows the process down but does not break it.

> RegionMover ignores replicationId
> -
>
> Key: HBASE-24186
> URL: https://issues.apache.org/jira/browse/HBASE-24186
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: master
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Minor
>
> When RegionMover looks up which rs hosts a region, it does this based on 
> startRowKey. When read replication is enabled this might not return the 
> expected region's data and this can prevent the moving of these regions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24186) RegionMover ignores replicationId

2020-04-14 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-24186:
---

 Summary: RegionMover ignores replicationId
 Key: HBASE-24186
 URL: https://issues.apache.org/jira/browse/HBASE-24186
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Affects Versions: master
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


When RegionMover looks up which rs hosts a region, it does this based on 
startRowKey. When read replication is enabled this might not return the 
expected region's data and this can prevent the moving of these regions.
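
For illustration, a replica-aware lookup could look roughly like this (plain 
client API, not the actual RegionMover code; whether replica locations are 
returned by the meta lookup depends on the HBase version, see HBASE-21753):
{code:java}
// Sketch: resolve the server hosting a specific region (including a replica)
// by matching the encoded region name instead of only the start row key.
private ServerName getServerNameForRegion(Connection conn, TableName table, RegionInfo region)
    throws IOException {
  try (RegionLocator locator = conn.getRegionLocator(table)) {
    for (HRegionLocation loc : locator.getAllRegionLocations()) {
      if (loc.getRegion().getEncodedName().equals(region.getEncodedName())) {
        return loc.getServerName();
      }
    }
  }
  return null;
}
{code}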



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot

2020-03-26 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067867#comment-17067867
 ] 

Szabolcs Bukros commented on HBASE-23995:
-

The logs are from 2.0.

I'm reasonably certain. I can see in the 2.0 logs that the manifest is created 
while the compaction is running, before it could have finished writing to the 
temporary hfile. Because of this the manifest would refer to the hfile 
references. In 2.2, where the snapshot runs after the compaction, it refers 
to the freshly created storefiles.

> Snapshoting a splitting region results in corrupted snapshot
> 
>
> Key: HBASE-23995
> URL: https://issues.apache.org/jira/browse/HBASE-23995
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.0.2
>Reporter: Szabolcs Bukros
>Priority: Major
>
> The problem seems to originate from the fact that while the region split 
> itself runs in a lock, the compactions following it run in separate threads. 
> Alternatively the use of space quota policies can prevent compaction after a 
> split and leads to the same issue.
> In both cases the resulting snapshot will keep the split status of the parent 
> region, but do not keep the references to the daughter regions, because they 
> (splitA, splitB qualifiers) are stored separately in the meta table and do 
> not propagate with the snapshot.
> This is important because in the freshly cloned table CatalogJanitor will 
> find the parent region, realize it is in split state, but because it can not 
> find the daughter region references (they haven't propagated) it assumes the 
> parent can be cleaned up and deletes it. The archived region used in the 
> snapshot only has a back reference to the now also archived parent region, 
> and if the snapshot is deleted they both get cleaned up. Unfortunately the 
> daughter regions only contain hfile links, so at this point the data is lost.
> How to reproduce:
> {code:java}
> hbase shell <<EOF
> create 'test', 'cf'
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> flush 'test'
> split 'test'
> snapshot 'test', 'testshot'
> EOF
> {code}
> This should make sure the snapshot is made before the compaction could be 
> finished even with small amount of data.
> {code:java}
> sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 
> testshot -copy-to hdfs://target:8020/apps/hbase/data/
> {code}
> I export the snapshot to make the usecase cleaner but deleting both the 
> snapshot and the original table after the cloning should have the same effect.
> {code:java}
> clone_snapshot 'testshot', 'test2'
> delete_snapshot "testshot"
> {code}
> I'm not sure what would be the best way to fix this. Preventing snapshots 
> when a region is in split state, would make snapshot creation problematic. 
> Forcing to run compaction as part of the split thread would make it rather 
> slow. Propagating the daughter region references could prevent the deletion 
> of the cloned parent region and the data would not be broken anymore but I'm 
> not sure we have a logic in place that could pick up the pieces and finish 
> the split process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot

2020-03-26 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067784#comment-17067784
 ] 

Szabolcs Bukros commented on HBASE-23995:
-

As Josh mentioned, both Split and Snapshot use PV2, so it should work. And since 
in 2.2 it does work, I started to check commits missing from the old branch. 
HBASE-21375 looked promising: while it does not target this behavior, it looked 
like a general improvement on the locking logic. I quickly backported and 
re-tested it, but unfortunately it does not solve the issue.

Now that I know what to look for, I can find the point in the log where the 
lock is passed from Split to Snapshot (hbase-master.log).

 
{code:java}
2020-03-26 14:32:31,945 INFO  [PEWorker-8] procedure2.ProcedureExecutor: 
Finished pid=28, state=SUCCESS; SplitTableRegionProcedure table=tab2, 
parent=11544264d3485f5ff700562ca6b62acb, daughterA
=dcf89acf08c55f494fd93ceedd3f3445, daughterB=bf84f2e23131d9488d9c56117d374187 
in 1.0010sec
2020-03-26 14:32:31,946 DEBUG [PEWorker-8] locking.LockProcedure: LOCKED 
pid=30, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, 
tableName=tab2, type=EXCLUSIVE
2020-03-26 14:32:31,948 INFO  [PEWorker-8] procedure2.TimeoutExecutorThread: 
ADDED pid=30, state=WAITING_TIMEOUT, locked=true; 
org.apache.hadoop.hbase.master.locking.LockProcedure, tableName=ta
b2, type=EXCLUSIVE; timeout=60, timestamp=1585233751948
2020-03-26 14:32:31,948 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] 
snapshot.SnapshotManager: Started snapshot: { ss=tabshot2 table=tab2 type=FLUSH 
}
{code}
Curiously in the rs log I can see PostOpenDeployTasks and compactions starting 
to run while SplitTableRegionProcedure has the lock
{code:java}
2020-03-26 14:32:31,918 INFO  
[PostOpenDeployTasks:dcf89acf08c55f494fd93ceedd3f3445] 
regionserver.HRegionServer: Post open deploy tasks for 
tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445.
2020-03-26 14:32:31,919 DEBUG 
[PostOpenDeployTasks:dcf89acf08c55f494fd93ceedd3f3445] 
regionserver.CompactSplit: Small Compaction requested: system; Because: Opening 
Region; compactionQueue=(longCompactions=0:shortCompactions=0), splitQueue=0
2020-03-26 14:32:31,921 DEBUG 
[regionserver/c2504-node4:16020-longCompactions-1585218367783] 
compactions.SortedCompactionPolicy: Selecting compaction from 1 store files, 0 
compacting, 1 eligible, 100 blocking
2020-03-26 14:32:31,922 DEBUG 
[regionserver/c2504-node4:16020-longCompactions-1585218367783] 
regionserver.HStore: dcf89acf08c55f494fd93ceedd3f3445 - cf: Initiating minor 
compaction (all files)
{code}
And it only finishes at around the same time snapshot is finishing:
{code:java}
  2020-03-26 14:32:32,088 INFO  
[regionserver/c2504-node4:16020-longCompactions-1585218367783] 
regionserver.CompactSplit: Completed compaction 
region=tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445., storeName=cf, 
priority=99, startTime=1585233151918; duration=0sec
2020-03-26 14:32:32,091 DEBUG 
[regionserver/c2504-node4:16020-longCompactions-1585218367783] 
regionserver.CompactSplit: Status 
compactionQueue=(longCompactions=0:shortCompactions=0), 
splitQueue=0233150936.bf84f2e23131d9488d9c56117d374187.
2020-03-26 14:32:32,101 DEBUG 
[rs(c2504-node4.coelab.cloudera.com,16020,1585218362034)-snapshot-pool6-thread-1]
 snapshot.FlushSnapshotSubprocedure: ... Flush Snapshotting region 
tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445. completed.
2020-03-26 14:32:32,101 DEBUG 
[rs(c2504-node4.coelab.cloudera.com,16020,1585218362034)-snapshot-pool6-thread-1]
 snapshot.FlushSnapshotSubprocedure: Closing snapshot operation on 
tab2,,1585233150936.dcf89acf08c55f494fd93ceedd3f3445.
2020-03-26 14:32:32,102 DEBUG [member: 
'c2504-node4.coelab.cloudera.com,16020,1585218362034' 
subprocedure-pool2-thread-1] snapshot.RegionServerSnapshotManager: Completed 
1/2 local region snapshots.
2020-03-26 14:32:32,102 DEBUG [member: 
'c2504-node4.coelab.cloudera.com,16020,1585218362034' 
subprocedure-pool2-thread-1] snapshot.RegionServerSnapshotManager: Completed 
2/2 local region snapshots.
{code}
 

 

 

> Snapshoting a splitting region results in corrupted snapshot
> 
>
> Key: HBASE-23995
> URL: https://issues.apache.org/jira/browse/HBASE-23995
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.0.2
>Reporter: Szabolcs Bukros
>Priority: Major
>
> The problem seems to originate from the fact that while the region split 
> itself runs in a lock, the compactions following it run in separate threads. 
> Alternatively the use of space quota policies can prevent compaction after a 
> split and leads to the same issue.
> In both cases the resulting snapshot will keep the split status of the parent 
> region, but do not keep t

[jira] [Comment Edited] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot

2020-03-17 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060757#comment-17060757
 ] 

Szabolcs Bukros edited comment on HBASE-23995 at 3/17/20, 9:19 AM:
---

After the split the daughter regions only have hfile links to the storefile in 
the parent. Even the CatalogJanitor leaves these parents alone, only deleting 
them after the compaction is done and they are no longer referenced from the 
daughters.

The title might not be precise or well chosen. What I wanted to say is that 
snapshotting a state where the split was done but the compaction was not (this 
is what I clumsily called "splitting") results in a structure where the daughter 
regions have no data, just links to the parent; that structure is saved and can 
be exported. However, not all the necessary info is exported with it (the 
daughter references from the parent are missing), which leads to an issue where, 
in the cloned table, the parent region that actually contains the data is 
archived and then deleted within minutes after the cloning is done, resulting in 
losing the exported data.


was (Author: bszabolcs):
After the split the daughter regions only have hfile links to the storefile in 
the parent. Even the CatalogJanitor leaves these parents alone, only deleting 
it after the compaction is done and are no longer referenced from daughters.

The title might not be precise or well choosen. What I wanted to say is that 
snapshoting a state where the split was done, but compaction was not (this is 
what I clumsily called "splitting") results in a structure where the daughter 
regions has no data just links to a parent is saved and can be exported. 
However not every necessary info is exported with it (daughter references from 
parent are missing) and this leads to an issue where in the cloned table the 
parent, that actually contains the data is archived then deleted in minutes 
after the cloning is done, resulting in loosing the exported data.

> Snapshoting a splitting region results in corrupted snapshot
> 
>
> Key: HBASE-23995
> URL: https://issues.apache.org/jira/browse/HBASE-23995
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.0.2
>Reporter: Szabolcs Bukros
>Priority: Major
>
> The problem seems to originate from the fact that while the region split 
> itself runs in a lock, the compactions following it run in separate threads. 
> Alternatively the use of space quota policies can prevent compaction after a 
> split and leads to the same issue.
> In both cases the resulting snapshot will keep the split status of the parent 
> region, but do not keep the references to the daughter regions, because they 
> (splitA, splitB qualifiers) are stored separately in the meta table and do 
> not propagate with the snapshot.
> This is important because in the freshly cloned table CatalogJanitor will 
> find the parent region, realize it is in split state, but because it can not 
> find the daughter region references (they haven't propagated) it assumes the 
> parent can be cleaned up and deletes it. The archived region used in the 
> snapshot only has a back reference to the now also archived parent region, 
> and if the snapshot is deleted they both get cleaned up. Unfortunately the 
> daughter regions only contain hfile links, so at this point the data is lost.
> How to reproduce:
> {code:java}
> hbase shell <<EOF
> create 'test', 'cf'
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> flush 'test'
> split 'test'
> snapshot 'test', 'testshot'
> EOF
> {code}
> This should make sure the snapshot is made before the compaction could be 
> finished even with small amount of data.
> {code:java}
> sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 
> testshot -copy-to hdfs://target:8020/apps/hbase/data/
> {code}
> I export the snapshot to make the usecase cleaner but deleting both the 
> snapshot and the original table after the cloning should have the same effect.
> {code:java}
> clone_snapshot 'testshot', 'test2'
> delete_snapshot "testshot"
> {code}
> I'm not sure what would be the best way to fix this. Preventing snapshots 
> when a region is in split state, would make snapshot creation problematic. 
> Forcing to run compaction as part of the split thread would make it rather 
> slow. Propagating the daughter region references could prevent the deletion 
> of the cloned parent region and the data would not be broken anymore but I'm 
> not sure we have a logic in place that could pick up the pieces and finish 
> the split process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot

2020-03-17 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060757#comment-17060757
 ] 

Szabolcs Bukros commented on HBASE-23995:
-

After the split the daughter regions only have hfile links to the storefile in 
the parent. Even the CatalogJanitor leaves these parents alone, only deleting 
them after the compaction is done and they are no longer referenced from the 
daughters.

The title might not be precise or well chosen. What I wanted to say is that 
snapshotting a state where the split was done but the compaction was not (this 
is what I clumsily called "splitting") results in a structure where the daughter 
regions have no data, just links to the parent; that structure is saved and can 
be exported. However, not all the necessary info is exported with it (the 
daughter references from the parent are missing), which leads to an issue where, 
in the cloned table, the parent that actually contains the data is archived and 
then deleted within minutes after the cloning is done, resulting in losing the 
exported data.

> Snapshoting a splitting region results in corrupted snapshot
> 
>
> Key: HBASE-23995
> URL: https://issues.apache.org/jira/browse/HBASE-23995
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.0.2
>Reporter: Szabolcs Bukros
>Priority: Major
>
> The problem seems to originate from the fact that while the region split 
> itself runs in a lock, the compactions following it run in separate threads. 
> Alternatively the use of space quota policies can prevent compaction after a 
> split and leads to the same issue.
> In both cases the resulting snapshot will keep the split status of the parent 
> region, but do not keep the references to the daughter regions, because they 
> (splitA, splitB qualifiers) are stored separately in the meta table and do 
> not propagate with the snapshot.
> This is important because in the freshly cloned table CatalogJanitor will 
> find the parent region, realize it is in split state, but because it can not 
> find the daughter region references (they haven't propagated) it assumes the 
> parent can be cleaned up and deletes it. The archived region used in the 
> snapshot only has a back reference to the now also archived parent region, 
> and if the snapshot is deleted they both get cleaned up. Unfortunately the 
> daughter regions only contain hfile links, so at this point the data is lost.
> How to reproduce:
> {code:java}
> hbase shell <<EOF
> create 'test', 'cf'
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> flush 'test'
> split 'test'
> snapshot 'test', 'testshot'
> EOF
> {code}
> This should make sure the snapshot is made before the compaction could be 
> finished even with small amount of data.
> {code:java}
> sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 
> testshot -copy-to hdfs://target:8020/apps/hbase/data/
> {code}
> I export the snapshot to make the usecase cleaner but deleting both the 
> snapshot and the original table after the cloning should have the same effect.
> {code:java}
> clone_snapshot 'testshot', 'test2'
> delete_snapshot "testshot"
> {code}
> I'm not sure what would be the best way to fix this. Preventing snapshots 
> when a region is in split state, would make snapshot creation problematic. 
> Forcing to run compaction as part of the split thread would make it rather 
> slow. Propagating the daughter region references could prevent the deletion 
> of the cloned parent region and the data would not be broken anymore but I'm 
> not sure we have a logic in place that could pick up the pieces and finish 
> the split process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot

2020-03-17 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060717#comment-17060717
 ] 

Szabolcs Bukros commented on HBASE-23995:
-

Hi [~zhangduo], thanks for your reply!

I tested and reproduced the issue on 2.0.2, but based on a quick comparison with 
master I would say not much has changed and the issue should be present there 
too.

If I understand correctly Procedure locks do not help because the compaction 
runs in separate threads. SplitTableRegionProcedure does the splitting, creates 
a ThreadPoolExecutor for the compactions and releases the locks while the 
compactions run in the background, making the snapshot possible.

> Snapshoting a splitting region results in corrupted snapshot
> 
>
> Key: HBASE-23995
> URL: https://issues.apache.org/jira/browse/HBASE-23995
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.0.2
>Reporter: Szabolcs Bukros
>Priority: Major
>
> The problem seems to originate from the fact that while the region split 
> itself runs in a lock, the compactions following it run in separate threads. 
> Alternatively the use of space quota policies can prevent compaction after a 
> split and leads to the same issue.
> In both cases the resulting snapshot will keep the split status of the parent 
> region, but do not keep the references to the daughter regions, because they 
> (splitA, splitB qualifiers) are stored separately in the meta table and do 
> not propagate with the snapshot.
> This is important because in the freshly cloned table CatalogJanitor will 
> find the parent region, realize it is in split state, but because it can not 
> find the daughter region references (they haven't propagated) it assumes the 
> parent can be cleaned up and deletes it. The archived region used in the 
> snapshot only has a back reference to the now also archived parent region, 
> and if the snapshot is deleted they both get cleaned up. Unfortunately the 
> daughter regions only contain hfile links, so at this point the data is lost.
> How to reproduce:
> {code:java}
> hbase shell <<EOF
> create 'test', 'cf'
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> flush 'test'
> split 'test'
> snapshot 'test', 'testshot'
> EOF
> {code}
> This should make sure the snapshot is made before the compaction could be 
> finished even with small amount of data.
> {code:java}
> sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 
> testshot -copy-to hdfs://target:8020/apps/hbase/data/
> {code}
> I export the snapshot to make the usecase cleaner but deleting both the 
> snapshot and the original table after the cloning should have the same effect.
> {code:java}
> clone_snapshot 'testshot', 'test2'
> delete_snapshot "testshot"
> {code}
> I'm not sure what would be the best way to fix this. Preventing snapshots 
> when a region is in split state, would make snapshot creation problematic. 
> Forcing to run compaction as part of the split thread would make it rather 
> slow. Propagating the daughter region references could prevent the deletion 
> of the cloned parent region and the data would not be broken anymore but I'm 
> not sure we have a logic in place that could pick up the pieces and finish 
> the split process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot

2020-03-16 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros updated HBASE-23995:

Description: 
The problem seems to originate from the fact that while the region split itself 
runs in a lock, the compactions following it run in separate threads. 
Alternatively the use of space quota policies can prevent compaction after a 
split and leads to the same issue.

In both cases the resulting snapshot will keep the split status of the parent 
region, but do not keep the references to the daughter regions, because they 
(splitA, splitB qualifiers) are stored separately in the meta table and do not 
propagate with the snapshot.

This is important because in the freshly cloned table CatalogJanitor will find 
the parent region, realize it is in split state, but because it can not find 
the daughter region references (they haven't propagated) it assumes the parent 
can be cleaned up and deletes it. The archived region used in the snapshot only 
has a back reference to the now also archived parent region, and if the 
snapshot is deleted they both get cleaned up. Unfortunately the daughter 
regions only contain hfile links, so at this point the data is lost.

How to reproduce:
{code:java}
hbase shell <<EOF
create 'test', 'cf'
(0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
flush 'test'
split 'test'
snapshot 'test', 'testshot'
EOF
{code}

> Snapshoting a splitting region results in corrupted snapshot
> 
>
> Key: HBASE-23995
> URL: https://issues.apache.org/jira/browse/HBASE-23995
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.0.2
>Reporter: Szabolcs Bukros
>Priority: Major
>
> The problem seems to originate from the fact that while the region split 
> itself runs in a lock, the compactions following it run in separate threads. 
> Alternatively the use of space quota policies can prevent compaction after a 
> split and leads to the same issue.
> In both cases the resulting snapshot will keep the split status of the parent 
> region, but do not keep the references to the daughter regions, because they 
> (splitA, splitB qualifiers) are stored separately in the meta table and do 
> not propagate with the snapshot.
> This is important because in the freshly cloned table CatalogJanitor will 
> find the parent region, realize it is in split state, but because it can not 
> find the daughter region references (they haven't propagated) it assumes the 
> parent can be cleaned up and deletes it. The archived region used in the 
> snapshot only has a back reference to the now also archived parent region, 
> and if the snapshot is deleted they both get cleaned up. Unfortunately the 
> daughter regions only contain hfile links, so at this point the data is lost.
> How to reproduce:
> {code:java}
> hbase shell <<EOF
> create 'test', 'cf'
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> flush 'test'
> split 'test'
> snapshot 'test', 'testshot'
> EOF
> {code}
> This should make sure the snapshot is made before the compaction could be 
> finished even with small amount of data.
> {code:java}
> sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 
> testshot -copy-to hdfs://target:8020/apps/hbase/data/
> {code}
> I export the snapshot to make the usecase cleaner but deleting both the 
> snapshot and the original table after the cloning should have the same effect.
> {code:java}
> clone_snapshot 'testshot', 'test2'
> delete_snapshot "testshot"
> {code}
> I'm not sure what would be the best way to fix this. Preventing snapshots 
> when a region is in split state, would make snapshot creation problematic. 
> Forcing to run compaction as part of the split thread would make it rather 
> slow. Propagating the daughter region references could prevent the deletion 
> of the cloned parent region and the data would not be broken anymore but I'm 
> not sure we have a logic in place that could pick up the pieces and finish 
> the split process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23995) Snapshoting a splitting region results in corrupted snapshot

2020-03-16 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-23995:
---

 Summary: Snapshoting a splitting region results in corrupted 
snapshot
 Key: HBASE-23995
 URL: https://issues.apache.org/jira/browse/HBASE-23995
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.0.2
Reporter: Szabolcs Bukros


The problem seems to originate from the fact that while the region split itself 
runs in a lock, the compactions following it run in separate threads. 
Alternatively the use of space quota policies can prevent compaction after a 
split and leads to the same issue.

In both cases the resulting snapshot will keep the split status of the parent 
region, but do not keep the references to the daughter regions, because they 
(splitA, splitB qualifiers) are stored separately in the meta table and do not 
propagate with the snapshot.

This is important because in the freshly cloned table CatalogJanitor will find 
the parent region, realize it is in split state, but because it can not find 
the daughter region references (they haven't propagated) it assumes the parent 
can be cleaned up and deletes it. The archived region used in the snapshot only 
has a back reference to the now also archived parent region, and if the 
snapshot is deleted they both get cleaned up. Unfortunately the daughter 
regions only contain hfile links, so at this point the data is lost.

How to reproduce:
{code:java}
hbase shell <<EOF
create 'test', 'cf'
(0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
flush 'test'
split 'test'
snapshot 'test', 'testshot'
EOF
{code}

[jira] [Assigned] (HBASE-23891) Add an option to Actions to filter out meta RS

2020-02-27 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros reassigned HBASE-23891:
---

Assignee: Szabolcs Bukros

> Add an option to Actions to filter out meta RS
> --
>
> Key: HBASE-23891
> URL: https://issues.apache.org/jira/browse/HBASE-23891
> Project: HBase
>  Issue Type: Sub-task
>  Components: integration tests
>Affects Versions: 3.0.0
>Reporter: Tamas Adami
>Assignee: Szabolcs Bukros
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> Add an option to Actions to be able to filter meta server out. 
> Some ITs rely on meta RS and have timeout errors if this RS is killed. (e.g. 
> IntegrationTestTimeBoundedRequestsWithRegionReplicas)
> For the time being there is no option for removing meta server from server 
> list to kill or configuring these actions properly.
> The following chaos monkey actions are affected: 
> GracefulRollingRestartRsAction, RollingBatchSuspendResumeRsAction 
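
A minimal sketch of such a filter, using only the plain client API (the actual 
hook added to the Actions may look different):
{code:java}
// Sketch: drop the region server currently hosting hbase:meta from the kill candidates.
private List<ServerName> excludeMetaServer(Connection conn, List<ServerName> candidates)
    throws IOException {
  ServerName metaServer;
  try (RegionLocator locator = conn.getRegionLocator(TableName.META_TABLE_NAME)) {
    metaServer = locator.getRegionLocation(HConstants.EMPTY_START_ROW).getServerName();
  }
  List<ServerName> filtered = new ArrayList<>(candidates);
  filtered.removeIf(sn -> sn.equals(metaServer));
  return filtered;
}
{code}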



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-18326) Fix and reenable TestMasterProcedureWalLease

2020-01-16 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016955#comment-17016955
 ] 

Szabolcs Bukros commented on HBASE-18326:
-

Test got deleted in HBASE-23326. Can we close this ticket or should we try to 
re-introduce the test?

> Fix and reenable TestMasterProcedureWalLease
> 
>
> Key: HBASE-18326
> URL: https://issues.apache.org/jira/browse/HBASE-18326
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Michael Stack
>Priority: Blocker
> Fix For: 3.0.0, 2.3.0
>
>
> Fix and reenable flakey important test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23601) OutputSink.WriterThread exception gets stuck and repeated indefinietly

2020-01-13 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014307#comment-17014307
 ] 

Szabolcs Bukros commented on HBASE-23601:
-

A new PR has been created for branch-2: #1028

> OutputSink.WriterThread exception gets stuck and repeated indefinietly
> --
>
> Key: HBASE-23601
> URL: https://issues.apache.org/jira/browse/HBASE-23601
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: 2.2.2
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.9, 2.2.4
>
>
> When a WriterThread runs into an exception (ie: NotServingRegionException), 
> the exception is stored in the controller. It is never removed and can not be 
> overwritten either.
>  
> {code:java}
> public void run()  {
>   try {
> doRun();
>   } catch (Throwable t) {
> LOG.error("Exiting thread", t);
> controller.writerThreadError(t);
>   }
> }{code}
> Thanks to this every time PipelineController.checkForErrors() is called the 
> same old exception is rethrown.
>  
> For example in RegionReplicaReplicationEndpoint.replicate there is a while 
> loop that does the actual replicating. Every time it loops, it calls 
> checkForErrors(), catches the rethrown exception, logs it but does nothing 
> about it. This results in ~2GB log files in ~5min in my experience.
>  
> My proposal would be to clean up the stored exception when it reaches 
> RegionReplicaReplicationEndpoint.replicate and make sure we restart the 
> WriterThread that died throwing it.
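
The sticky-error pattern and the proposed change can be sketched like this 
(consumeError() is a hypothetical addition; the real PipelineController and 
replication loop differ in detail):
{code:java}
// Sketch of the controller side: today the first error wins and is rethrown on
// every checkForErrors() call, forever.
private final AtomicReference<Throwable> thrown = new AtomicReference<>();

void writerThreadError(Throwable t) {
  thrown.compareAndSet(null, t);
}

void checkForErrors() throws IOException {
  Throwable t = thrown.get();
  if (t != null) {
    throw new IOException(t);
  }
}

// Hypothetical addition: let the caller consume the error once, restart the
// WriterThread that died, and continue instead of re-logging the same
// exception on every iteration of the replication loop.
Throwable consumeError() {
  return thrown.getAndSet(null);
}
{code}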



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23591) Negative memStoreSizing

2020-01-10 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012853#comment-17012853
 ] 

Szabolcs Bukros commented on HBASE-23591:
-

[~anoop.hbase] You are right, we would lose the file ref... Thanks for 
pointing it out!

What do you think about a solution where we send the last x (maybe 10?) 
HStoreFile paths in the FlushDescriptor instead of just the latest one? In the 
StoreFileManager the files are ordered by seqID so grabbing the last few is 
easy. We would also have to make sure to filter out store files already listed 
in the store of the replica region in replayFlush. This way the first 
successful flush would add all the potentially missing refs.

> Negative memStoreSizing
> ---
>
> Key: HBASE-23591
> URL: https://issues.apache.org/jira/browse/HBASE-23591
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Reporter: Szabolcs Bukros
>Priority: Major
> Fix For: 2.2.2
>
>
> After a flush on the replica region the memStoreSizing becomes negative:
> {code:java}
> 2019-12-17 08:31:59,983 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: 
> COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplicati
> on" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" 
> flush_sequence_number: 41392 store_flushes { family_name: "f1" 
> store_home_dir: "f1" flush_output: "3c48a23eac784a348a18e10e337d80a2" } 
> store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: 
> "9a5283ec95694667b4ead2398af5f01e" } store_flushes { family_name: "f3" 
> store_home_dir: "f3" flush_output: "e6f25e6b0eca4d22af15d0626d0f8759" } 
> region_name: 
> "IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee."
> 2019-12-17 08:31:59,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> 0beaae111b0f6e98bfde31ba35be5408 : Received a flush commit marker with 
> seqId:41392 and a previous prepared snapshot was found
> 2019-12-17 08:31:59,993 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f1/3c48a23eac784a348a18e10e337d80a2,
>  entries=32445, sequenceid=41392, filesize=27.6 M
> 2019-12-17 08:32:00,016 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f2/9a5283ec95694667b4ead2398af5f01e,
>  entries=12264, sequenceid=41392, filesize=10.9 M
> 2019-12-17 08:32:00,121 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f3/e6f25e6b0eca4d22af15d0626d0f8759,
>  entries=32379, sequenceid=41392, filesize=27.5 M
> 2019-12-17 08:32:00,122 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> CustomLog decrMemStoreSize. Current: dataSize=135810071, 
> getHeapSize=148400960, getOffHeapSize=0, getCellsCount=167243 delta: 
> dataSizeDelta=155923644, heapSizeDelta=170112320, offHeapSizeDelta=0, 
> cellsCountDelta=188399
> 2019-12-17 08:32:00,122 ERROR org.apache.hadoop.hbase.regionserver.HRegion: 
> Asked to modify this region's 
> (IntegrationTestRegionReplicaReplication,,1576599911697_0001.0beaae111b0f6e98bfde31ba35be54
> 08.) memStoreSizing to a negative value which is incorrect. Current 
> memStoreSizing=135810071, delta=-155923644
> java.lang.Exception
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1323)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1316)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1303)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5194)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5025)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2232)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at 
> org.apache.hadoop.hb

[jira] [Commented] (HBASE-23601) OutputSink.WriterThread exception gets stuck and repeated indefinietly

2020-01-10 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012557#comment-17012557
 ] 

Szabolcs Bukros commented on HBASE-23601:
-

[~stack] Thanks for the merge! This doesn't need a master patch. The code there 
was heavily rewritten and as far as I can tell it's not affected by this issue. 
I'll check why it fails on branch-2.

> OutputSink.WriterThread exception gets stuck and repeated indefinietly
> --
>
> Key: HBASE-23601
> URL: https://issues.apache.org/jira/browse/HBASE-23601
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: 2.2.2
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.9, 2.2.4
>
>
> When a WriterThread runs into an exception (ie: NotServingRegionException), 
> the exception is stored in the controller. It is never removed and can not be 
> overwritten either.
>  
> {code:java}
> public void run()  {
>   try {
> doRun();
>   } catch (Throwable t) {
> LOG.error("Exiting thread", t);
> controller.writerThreadError(t);
>   }
> }{code}
> Thanks to this every time PipelineController.checkForErrors() is called the 
> same old exception is rethrown.
>  
> For example in RegionReplicaReplicationEndpoint.replicate there is a while 
> loop that does the actual replicating. Every time it loops, it calls 
> checkForErrors(), catches the rethrown exception, logs it but does nothing 
> about it. This results in ~2GB log files in ~5min in my experience.
>  
> My proposal would be to clean up the stored exception when it reaches 
> RegionReplicaReplicationEndpoint.replicate and make sure we restart the 
> WriterThread that died throwing it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23591) Negative memStoreSizing

2020-01-09 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011763#comment-17011763
 ] 

Szabolcs Bukros commented on HBASE-23591:
-

[~anoop.hbase] Basically yes. HBASE-23589 was the root cause and I haven't seen 
this problem after the fix. But cleaner code would be nice and would prevent 
potential future issues.

" Then when the subsequent flushes happen on the same CFs and replay WAL marker 
reaches the replica regions how that will get handled? "

It won't get handled. We use the old snapshot for the subsequent flush. As far 
as I can tell this would cause no problems, it would only mean the Memstore 
won't be empty after the flush and would require another flush sooner.

> Negative memStoreSizing
> ---
>
> Key: HBASE-23591
> URL: https://issues.apache.org/jira/browse/HBASE-23591
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Reporter: Szabolcs Bukros
>Priority: Major
> Fix For: 2.2.2
>
>
> After a flush on the replica region the memStoreSizing becomes negative:
> {code:java}
> 2019-12-17 08:31:59,983 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: 
> COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplicati
> on" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" 
> flush_sequence_number: 41392 store_flushes { family_name: "f1" 
> store_home_dir: "f1" flush_output: "3c48a23eac784a348a18e10e337d80a2" } 
> store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: 
> "9a5283ec95694667b4ead2398af5f01e" } store_flushes { family_name: "f3" 
> store_home_dir: "f3" flush_output: "e6f25e6b0eca4d22af15d0626d0f8759" } 
> region_name: 
> "IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee."
> 2019-12-17 08:31:59,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> 0beaae111b0f6e98bfde31ba35be5408 : Received a flush commit marker with 
> seqId:41392 and a previous prepared snapshot was found
> 2019-12-17 08:31:59,993 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f1/3c48a23eac784a348a18e10e337d80a2,
>  entries=32445, sequenceid=41392, filesize=27.6 M
> 2019-12-17 08:32:00,016 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f2/9a5283ec95694667b4ead2398af5f01e,
>  entries=12264, sequenceid=41392, filesize=10.9 M
> 2019-12-17 08:32:00,121 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f3/e6f25e6b0eca4d22af15d0626d0f8759,
>  entries=32379, sequenceid=41392, filesize=27.5 M
> 2019-12-17 08:32:00,122 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> CustomLog decrMemStoreSize. Current: dataSize=135810071, 
> getHeapSize=148400960, getOffHeapSize=0, getCellsCount=167243 delta: 
> dataSizeDelta=155923644, heapSizeDelta=170112320, offHeapSizeDelta=0, 
> cellsCountDelta=188399
> 2019-12-17 08:32:00,122 ERROR org.apache.hadoop.hbase.regionserver.HRegion: 
> Asked to modify this region's 
> (IntegrationTestRegionReplicaReplication,,1576599911697_0001.0beaae111b0f6e98bfde31ba35be54
> 08.) memStoreSizing to a negative value which is incorrect. Current 
> memStoreSizing=135810071, delta=-155923644
> java.lang.Exception
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1323)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1316)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1303)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5194)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5025)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2232)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at 
> or

[jira] [Commented] (HBASE-23591) Negative memStoreSizing

2020-01-08 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010807#comment-17010807
 ] 

Szabolcs Bukros commented on HBASE-23591:
-

[~anoop.hbase], [~zhangduo] ,[~stack] 

I did some further investigation and would like to hear your thoughts. I used 
the otherwise fixed issue (HBASE-23589) to break replication and make maxing 
out the memstore easier. But if the commit marker were missing, or some other 
issue prevented the flush from finishing, the result would be the same.

 

The issue:

We try to replay a flush for column families f1 and f3. The FlushDescriptor is 
incorrect and both families contain the wrong committed file list. This means 
we would get an exception in replayFlush when we try to call getStoreFileInfo. 
(This is fixed in HBASE-23589.) This exception is only caught at the very end 
of replayWALFlushCommitMarker and there are no steps to handle it or clean up, 
so writestate.flushing remains true and prepareFlushResult still has a value.

Next time we try to replay a flush for f2. replayWALFlushStartMarker does 
nothing because writestate.flushing is true and prepareFlushResult is still 
set. replayFlushInStores also does nothing because the prepareFlushResult 
exists but contains no context for f2. Unfortunately replayWALFlushCommitMarker 
isn't aware that no flush was made and at the end still calls decrMemStoreSize 
with data that has nothing to do with the current column family.

 

The current solution doesn't handle the exception well and ignores the fact 
that we would need different prepared data for different cfs.

My first proposal would be to drastically simplify replayWALFlushCommitMarker.  
Something like this:
{code:java}
if (prepareFlushResult != null
    && flush.getFlushSequenceNumber() == prepareFlushResult.flushOpSeqId) {
  try {
    replayFlushInStores(flush, prepareFlushResult, true);
    this.decrMemStoreSize(prepareFlushResult.totalFlushableSize.getMemStoreSize());
  } catch (Exception ex) {
    // log exception
    throw ex; // maybe only re-throw if not FileNotFoundException
  } finally {
    this.prepareFlushResult = null;
    writestate.flushing = false;
  }
} else {
  // ... log ...
  this.prepareFlushResult = null;
  writestate.flushing = false;
}
{code}
So unless we find the correct prepared data we clean up prepareFlushResult and 
skip the flush; the same happens if we see an exception. On the upside this 
would result in a more stable replica. On the downside it would also mean fewer 
successful flushes, so more memory usage and more flush attempts. I do not see 
any negative consequence besides the performance loss.

It could be improved by checking whether the prepareFlushResult covers the same 
CFs as the FlushDescriptor and still doing the flush if the seqId is newer than 
expected. But I'm not sure that's necessary. What do you think?

> Negative memStoreSizing
> ---
>
> Key: HBASE-23591
> URL: https://issues.apache.org/jira/browse/HBASE-23591
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Reporter: Szabolcs Bukros
>Priority: Major
> Fix For: 2.2.2
>
>
> After a flush on the replica region the memStoreSizing becomes negative:
> {code:java}
> 2019-12-17 08:31:59,983 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: 
> COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplicati
> on" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" 
> flush_sequence_number: 41392 store_flushes { family_name: "f1" 
> store_home_dir: "f1" flush_output: "3c48a23eac784a348a18e10e337d80a2" } 
> store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: 
> "9a5283ec95694667b4ead2398af5f01e" } store_flushes { family_name: "f3" 
> store_home_dir: "f3" flush_output: "e6f25e6b0eca4d22af15d0626d0f8759" } 
> region_name: 
> "IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee."
> 2019-12-17 08:31:59,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> 0beaae111b0f6e98bfde31ba35be5408 : Received a flush commit marker with 
> seqId:41392 and a previous prepared snapshot was found
> 2019-12-17 08:31:59,993 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f1/3c48a23eac784a348a18e10e337d80a2,
>  entries=32445, sequenceid=41392, filesize=27.6 M
> 2019-12-17 08:32:00,016 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f2/9a5283ec95694667b4ead2398af5f01e,
>  entries=12264, sequ

[jira] [Commented] (HBASE-23589) FlushDescriptor contains non-matching family/output combinations

2020-01-02 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006678#comment-17006678
 ] 

Szabolcs Bukros commented on HBASE-23589:
-

[~binlijin] Thanks for the merge!

> FlushDescriptor contains non-matching family/output combinations
> 
>
> Key: HBASE-23589
> URL: https://issues.apache.org/jira/browse/HBASE-23589
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: 2.2.2, 2.1.8
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Critical
> Fix For: 3.0.0, 2.3.0, 2.2.3, 2.1.9
>
>
> Flushing the active region creates the following files:
> {code:java}
> 2019-12-13 08:00:20,866 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/dab4d1cc01e44773bad7bdb5d2e33b6c,
>  entries=49128, sequenceid
> =70688, filesize=41.4 M
> 2019-12-13 08:00:20,897 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f3/ecc50f33085042f7bd2397253b896a3a,
>  entries=5, sequenceid
> =70688, filesize=42.3 M
> {code}
> On the read replica region when we try to replay the flush we see the 
> following:
> {code:java}
> 2019-12-13 08:00:21,279 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
> bfa9cdb0ab13d60b389df6621ab316d1 : At least one of the store files in flush: 
> action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplication" 
> encoded_region_name: "20af2eb8929408f26d0b3b81e6b86d47" 
> flush_sequence_number: 70688 store_flushes { family_name: "f2" 
> store_home_dir: "f2" flush_output: "ecc50f33085042f7bd2397253b896a3a" } 
> store_flushes { family_name: "f3" store_home_dir: "f3" flush_output: 
> "dab4d1cc01e44773bad7bdb5d2e33b6c" } region_name: 
> "IntegrationTestRegionReplicaReplication,,1576252065847.20af2eb8929408f26d0b3b81e6b86d47."
>  doesn't exist any more. Skip loading the file(s)
> java.io.FileNotFoundException: HFileLink 
> locations=[hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
>  
> hdfs://replica-1:8020/hbase/.tmp/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
>  
> hdfs://replica-1:8020/hbase/mobdir/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
>  
> hdfs://replica-1:8020/hbase/archive/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a]
> at 
> org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:415)
> at 
> org.apache.hadoop.hbase.util.ServerRegionReplicaUtil.getStoreFileInfo(ServerRegionReplicaUtil.java:135)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFileInfo(HRegionFileSystem.java:311)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.replayFlush(HStore.java:2414)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayFlushInStores(HRegion.java:5310)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5184)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5018)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2229)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> As we can see, the flush_outputs got mixed up.
>  
> The issue is caused by HRegion.internalFlushCacheAndCommit. The code assumes 
> "{color:#808080}stores.values() and storeFlushCtxs have same order{color}" 
> which no longer seems to be true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-23601) OutputSink.WriterThread exception gets stuck and repeated indefinitely

2019-12-20 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros updated HBASE-23601:

Affects Version/s: 2.2.2

> OutputSink.WriterThread exception gets stuck and repeated indefinitely
> --
>
> Key: HBASE-23601
> URL: https://issues.apache.org/jira/browse/HBASE-23601
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: 2.2.2
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.2.3
>
>
> When a WriterThread runs into an exception (e.g. NotServingRegionException), 
> the exception is stored in the controller. It is never removed and cannot be 
> overwritten either.
>  
> {code:java}
> public void run()  {
>   try {
> doRun();
>   } catch (Throwable t) {
> LOG.error("Exiting thread", t);
> controller.writerThreadError(t);
>   }
> }{code}
> Because of this, every time PipelineController.checkForErrors() is called, the 
> same old exception is rethrown.
>  
> For example, in RegionReplicaReplicationEndpoint.replicate there is a while 
> loop that does the actual replicating. Every time it loops, it calls 
> checkForErrors(), catches the rethrown exception, logs it, but does nothing 
> about it. In my experience this results in ~2 GB of log files in ~5 minutes.
>  
> My proposal would be to clean up the stored exception when it reaches 
> RegionReplicaReplicationEndpoint.replicate and make sure we restart the 
> WriterThread that died throwing it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23601) OutputSink.WriterThread exception gets stuck and repeated indefinitely

2019-12-20 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-23601:
---

 Summary: OutputSink.WriterThread exception gets stuck and repeated 
indefinitely
 Key: HBASE-23601
 URL: https://issues.apache.org/jira/browse/HBASE-23601
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros
 Fix For: 2.2.2


When a WriterThread runs into an exception (e.g. NotServingRegionException), the 
exception is stored in the controller. It is never removed and cannot be 
overwritten either.

 
{code:java}
public void run()  {
  try {
doRun();
  } catch (Throwable t) {
LOG.error("Exiting thread", t);
controller.writerThreadError(t);
  }
}{code}
Because of this, every time PipelineController.checkForErrors() is called, the 
same old exception is rethrown.

 

For example, in RegionReplicaReplicationEndpoint.replicate there is a while loop 
that does the actual replicating. Every time it loops, it calls 
checkForErrors(), catches the rethrown exception, logs it, but does nothing 
about it. In my experience this results in ~2 GB of log files in ~5 minutes.

 

My proposal would be to clean up the stored exception when it reaches 
RegionReplicaReplicationEndpoint.replicate and make sure we restart the 
WriterThread that died throwing it.
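
A rough, self-contained sketch of the proposed handling (simplified stand-ins, 
not the actual OutputSink/PipelineController classes): once the retry loop has 
observed a writer thread error, clear the stored error and restart the dead 
writer instead of rethrowing the same Throwable forever.
{code:java}
import java.util.concurrent.atomic.AtomicReference;

public class WriterErrorHandlingSketch {

  // Stand-in for PipelineController: remembers the first error a writer thread hit.
  static class Controller {
    private final AtomicReference<Throwable> error = new AtomicReference<>();

    void writerThreadError(Throwable t) {
      error.compareAndSet(null, t);
    }

    void checkForErrors() throws Exception {
      Throwable t = error.get();
      if (t != null) {
        throw new Exception("Writer thread failed", t);
      }
    }

    // Proposed addition: let the caller acknowledge and clear the stored error.
    void clearError() {
      error.set(null);
    }
  }

  // Simplified replicate() retry loop.
  static void replicate(Controller controller, Runnable restartDeadWriters) {
    for (int attempt = 0; attempt < 3; attempt++) {
      try {
        controller.checkForErrors();
        // ... append entries / flush output here ...
        return;
      } catch (Exception e) {
        // Log once, then recover instead of hitting the same stored error forever.
        controller.clearError();
        restartDeadWriters.run();
      }
    }
  }

  public static void main(String[] args) {
    Controller c = new Controller();
    c.writerThreadError(new IllegalStateException("simulated NotServingRegionException"));
    replicate(c, () -> System.out.println("restarting dead writer thread"));
  }
}
{code}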



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-23591) Negative memStoreSizing

2019-12-18 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros updated HBASE-23591:

Description: 
After a flush on the replica region the memStoreSizing becomes negative:
{code:java}
2019-12-17 08:31:59,983 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: COMMIT_FLUSH 
table_name: "IntegrationTestRegionReplicaReplicati
on" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" 
flush_sequence_number: 41392 store_flushes { family_name: "f1" store_home_dir: 
"f1" flush_output: "3c48a23eac784a348a18e10e337d80a2" } store_flushes { 
family_name: "f2" store_home_dir: "f2" flush_output: 
"9a5283ec95694667b4ead2398af5f01e" } store_flushes { family_name: "f3" 
store_home_dir: "f3" flush_output: "e6f25e6b0eca4d22af15d0626d0f8759" } 
region_name: 
"IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee."
2019-12-17 08:31:59,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
0beaae111b0f6e98bfde31ba35be5408 : Received a flush commit marker with 
seqId:41392 and a previous prepared snapshot was found
2019-12-17 08:31:59,993 INFO org.apache.hadoop.hbase.regionserver.HStore: 
Region: 0beaae111b0f6e98bfde31ba35be5408 added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f1/3c48a23eac784a348a18e10e337d80a2,
 entries=32445, sequenceid=41392, filesize=27.6 M
2019-12-17 08:32:00,016 INFO org.apache.hadoop.hbase.regionserver.HStore: 
Region: 0beaae111b0f6e98bfde31ba35be5408 added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f2/9a5283ec95694667b4ead2398af5f01e,
 entries=12264, sequenceid=41392, filesize=10.9 M
2019-12-17 08:32:00,121 INFO org.apache.hadoop.hbase.regionserver.HStore: 
Region: 0beaae111b0f6e98bfde31ba35be5408 added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f3/e6f25e6b0eca4d22af15d0626d0f8759,
 entries=32379, sequenceid=41392, filesize=27.5 M
2019-12-17 08:32:00,122 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
CustomLog decrMemStoreSize. Current: dataSize=135810071, getHeapSize=148400960, 
getOffHeapSize=0, getCellsCount=167243 delta: dataSizeDelta=155923644, 
heapSizeDelta=170112320, offHeapSizeDelta=0, cellsCountDelta=188399
2019-12-17 08:32:00,122 ERROR org.apache.hadoop.hbase.regionserver.HRegion: 
Asked to modify this region's 
(IntegrationTestRegionReplicaReplication,,1576599911697_0001.0beaae111b0f6e98bfde31ba35be54
08.) memStoreSizing to a negative value which is incorrect. Current 
memStoreSizing=135810071, delta=-155923644
java.lang.Exception
at 
org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1323)
at 
org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1316)
at 
org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1303)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5194)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5025)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2232)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

{code}
I added some custom logging to the snapshot logic to be able to see snapshot 
sizes: 
{code:java}
2019-12-17 08:31:56,900 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: START_FLUSH 
table_name: "IntegrationTestRegionReplicaReplication" encoded_region_name: 
"544affde3e027454f67c8ea46c8f69ee" flush_sequence_number: 41392 store_flushes { 
family_name: "f1" store_home_dir: "f1" } store_flushes { family_name: "f2" 
store_home_dir: "f2" } store_flushes { family_name: "f3" store_home_dir: "f3" } 
region_name: 
"IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee."
2019-12-17 08:31:56,900 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Flushing 0beaae111b0f6e98bfde31ba35be5408 3/3 column families, dataSize=126.49 
MB heapSize=138.24 MB
2019-12-17 08:31:56,900 WARN 
org.apache.hadoop.hbase.regionserver.DefaultMemStore: Snapshot called again 
without clearing previous.

[jira] [Created] (HBASE-23591) Negative memStoreSizing

2019-12-18 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-23591:
---

 Summary: Negative memStoreSizing
 Key: HBASE-23591
 URL: https://issues.apache.org/jira/browse/HBASE-23591
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Reporter: Szabolcs Bukros
 Fix For: 2.2.2


After a flush on the replica region the memStoreSizing becomes negative:

 
{code:java}
2019-12-17 08:31:59,983 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: COMMIT_FLUSH 
table_name: "IntegrationTestRegionReplicaReplicati
on" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" 
flush_sequence_number: 41392 store_flushes { family_name: "f1" store_home_dir: 
"f1" flush_output: "3c48a23eac784a348a18e10e337d80a2" } store_flushes { 
family_name: "f2" store_home_dir: "f2" flush_output: 
"9a5283ec95694667b4ead2398af5f01e" } store_flushes { family_name: "f3" 
store_home_dir: "f3" flush_output: "e6f25e6b0eca4d22af15d0626d0f8759" } 
region_name: 
"IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee."
2019-12-17 08:31:59,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
0beaae111b0f6e98bfde31ba35be5408 : Received a flush commit marker with 
seqId:41392 and a previous prepared snapshot was found
2019-12-17 08:31:59,993 INFO org.apache.hadoop.hbase.regionserver.HStore: 
Region: 0beaae111b0f6e98bfde31ba35be5408 added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f1/3c48a23eac784a348a18e10e337d80a2,
 entries=32445, sequenceid=41392, filesize=27.6 M
2019-12-17 08:32:00,016 INFO org.apache.hadoop.hbase.regionserver.HStore: 
Region: 0beaae111b0f6e98bfde31ba35be5408 added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f2/9a5283ec95694667b4ead2398af5f01e,
 entries=12264, sequenceid=41392, filesize=10.9 M
2019-12-17 08:32:00,121 INFO org.apache.hadoop.hbase.regionserver.HStore: 
Region: 0beaae111b0f6e98bfde31ba35be5408 added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f3/e6f25e6b0eca4d22af15d0626d0f8759,
 entries=32379, sequenceid=41392, filesize=27.5 M
2019-12-17 08:32:00,122 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
CustomLog decrMemStoreSize. Current: dataSize=135810071, getHeapSize=148400960, 
getOffHeapSize=0, getCellsCount=167243 delta: dataSizeDelta=155923644, 
heapSizeDelta=170112320, offHeapSizeDelta=0, cellsCountDelta=188399
2019-12-17 08:32:00,122 ERROR org.apache.hadoop.hbase.regionserver.HRegion: 
Asked to modify this region's 
(IntegrationTestRegionReplicaReplication,,1576599911697_0001.0beaae111b0f6e98bfde31ba35be54
08.) memStoreSizing to a negative value which is incorrect. Current 
memStoreSizing=135810071, delta=-155923644
java.lang.Exception
at 
org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1323)
at 
org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1316)
at 
org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1303)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5194)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5025)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2232)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

{code}
 

 

I added some custom logging to the snapshot logic to be able to see snapshot 
sizes:

 
{code:java}
2019-12-17 08:31:56,900 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: START_FLUSH 
table_name: "IntegrationTestRegionReplicaReplication" encoded_region_name: 
"544affde3e027454f67c8ea46c8f69ee" flush_sequence_number: 41392 store_flushes { 
family_name: "f1" store_home_dir: "f1" } store_flushes { family_name: "f2" 
store_home_dir: "f2" } store_flushes { family_name: "f3" store_home_dir: "f3" } 
region_name: 
"IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee."
2019-12-17 08:31:56,900 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Flushing 0beaae111b0f6e98bfde31ba35be5408 3/3 column famil

[jira] [Updated] (HBASE-23589) FlushDescriptor contains non-matching family/output combinations

2019-12-18 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros updated HBASE-23589:

Description: 
Flushing the active region creates the following files:
{code:java}
2019-12-13 08:00:20,866 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/dab4d1cc01e44773bad7bdb5d2e33b6c,
 entries=49128, sequenceid
=70688, filesize=41.4 M
2019-12-13 08:00:20,897 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f3/ecc50f33085042f7bd2397253b896a3a,
 entries=5, sequenceid
=70688, filesize=42.3 M
{code}
On the read replica region when we try to replay the flush we see the following:
{code:java}
2019-12-13 08:00:21,279 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
bfa9cdb0ab13d60b389df6621ab316d1 : At least one of the store files in flush: 
action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplication" 
encoded_region_name: "20af2eb8929408f26d0b3b81e6b86d47" flush_sequence_number: 
70688 store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: 
"ecc50f33085042f7bd2397253b896a3a" } store_flushes { family_name: "f3" 
store_home_dir: "f3" flush_output: "dab4d1cc01e44773bad7bdb5d2e33b6c" } 
region_name: 
"IntegrationTestRegionReplicaReplication,,1576252065847.20af2eb8929408f26d0b3b81e6b86d47."
 doesn't exist any more. Skip loading the file(s)
java.io.FileNotFoundException: HFileLink 
locations=[hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
 
hdfs://replica-1:8020/hbase/.tmp/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
 
hdfs://replica-1:8020/hbase/mobdir/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
 
hdfs://replica-1:8020/hbase/archive/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a]
at org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:415)
at 
org.apache.hadoop.hbase.util.ServerRegionReplicaUtil.getStoreFileInfo(ServerRegionReplicaUtil.java:135)
at 
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFileInfo(HRegionFileSystem.java:311)
at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.replayFlush(HStore.java:2414)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayFlushInStores(HRegion.java:5310)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5184)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5018)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2229)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
{code}
As we can see, the flush_outputs got mixed up.

 

The issue is caused by HRegion.internalFlushCacheAndCommit. The code assumes 
"{color:#808080}stores.values() and storeFlushCtxs have same order{color}" 
which no longer seems to be true.

  was:
Flushing the active region creates the following files:
{code:java}
2019-12-13 08:00:20,866 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/dab4d1cc01e44773bad7bdb5d2e33b6c,
 entries=49128, sequenceid
=70688, filesize=41.4 M
2019-12-13 08:00:20,897 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f3/ecc50f33085042f7bd2397253b896a3a,
 entries=5, sequenceid
=70688, filesize=42.3 M
{code}
On the read replica region when we try to replay the flush we see the following:
{code:java}
2019-12-13 08:00:21,279 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
bfa9cdb0ab13d60b389df6621ab316d1 : At least one of the store files in flush: 
action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplication" 
encoded_region_name: "20af2eb8929408f26d0b3b81e6b86d47" flush_sequence_number: 
70688 sto

[jira] [Created] (HBASE-23589) FlushDescriptor contains non-matching family/output combinations

2019-12-18 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-23589:
---

 Summary: FlushDescriptor contains non-matching family/output 
combinations
 Key: HBASE-23589
 URL: https://issues.apache.org/jira/browse/HBASE-23589
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Affects Versions: 2.2.2
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


Flushing the active region creates the following files:
{code:java}
2019-12-13 08:00:20,866 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/dab4d1cc01e44773bad7bdb5d2e33b6c,
 entries=49128, sequenceid
=70688, filesize=41.4 M
2019-12-13 08:00:20,897 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f3/ecc50f33085042f7bd2397253b896a3a,
 entries=5, sequenceid
=70688, filesize=42.3 M
{code}
On the read replica region when we try to replay the flush we see the following:
{code:java}
2019-12-13 08:00:21,279 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
bfa9cdb0ab13d60b389df6621ab316d1 : At least one of the store files in flush: 
action: COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplication" 
encoded_region_name: "20af2eb8929408f26d0b3b81e6b86d47" flush_sequence_number: 
70688 store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: 
"ecc50f33085042f7bd2397253b896a3a" } store_flushes { family_name: "f3" 
store_home_dir: "f3" flush_output: "dab4d1cc01e44773bad7bdb5d2e33b6c" } 
region_name: 
"IntegrationTestRegionReplicaReplication,,1576252065847.20af2eb8929408f26d0b3b81e6b86d47."
 doesn't exist any more. Skip loading the file(s)
java.io.FileNotFoundException: HFileLink 
locations=[hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
 
hdfs://replica-1:8020/hbase/.tmp/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
 
hdfs://replica-1:8020/hbase/mobdir/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a,
 
hdfs://replica-1:8020/hbase/archive/data/default/IntegrationTestRegionReplicaReplication/20af2eb8929408f26d0b3b81e6b86d47/f2/ecc50f33085042f7bd2397253b896a3a]
at org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:415)
at 
org.apache.hadoop.hbase.util.ServerRegionReplicaUtil.getStoreFileInfo(ServerRegionReplicaUtil.java:135)
at 
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFileInfo(HRegionFileSystem.java:311)
at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.replayFlush(HStore.java:2414)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayFlushInStores(HRegion.java:5310)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5184)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5018)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2229)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
{code}
As you can see, the flush_outputs are mixed up.

 

The issue is caused by HRegion.internalFlushCacheAndCommit. The code assumes 
"{color:#808080}stores.values() and storeFlushCtxs have same order{color}" 
which no longer seems to be true.
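
A small self-contained illustration of the underlying problem (illustrative 
names only, not the actual HRegion/FlushDescriptor code): pairing families and 
flush outputs by position breaks as soon as the two collections are iterated in 
different orders, while keying the outputs by the family that produced them 
stays correct regardless of iteration order.
{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlushDescriptorOrderingSketch {

  // Fragile: assumes families and flushOutputs were produced in the same order.
  static Map<String, String> zipByPosition(List<String> families, List<String> flushOutputs) {
    Map<String, String> commit = new LinkedHashMap<>();
    for (int i = 0; i < families.size(); i++) {
      commit.put(families.get(i), flushOutputs.get(i)); // wrong pairing if the orders diverge
    }
    return commit;
  }

  // Order-independent: every flush output is recorded under the family that produced it.
  static Map<String, String> keyByFamily(Map<String, String> outputsByFamily) {
    return new LinkedHashMap<>(outputsByFamily);
  }

  public static void main(String[] args) {
    // The active region flushed f2 -> dab4d1cc... and f3 -> ecc50f33..., but the
    // descriptor ended up pairing f2 with ecc50f33... because the two collections
    // were not iterated in the same order.
    Map<String, String> wrong =
      zipByPosition(List.of("f2", "f3"), List.of("ecc50f33...", "dab4d1cc..."));
    Map<String, String> right = keyByFamily(
      Map.of("f2", "dab4d1cc...", "f3", "ecc50f33..."));
    System.out.println("mismatched descriptor: " + wrong);
    System.out.println("correct descriptor:    " + right);
  }
}
{code}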



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23566) Fix package/packet terminology problem in chaos monkeys

2019-12-12 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-23566:
---

 Summary: Fix package/packet terminology problem in chaos monkeys
 Key: HBASE-23566
 URL: https://issues.apache.org/jira/browse/HBASE-23566
 Project: HBase
  Issue Type: Improvement
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


There is a terminology problem in some of the network-issue-related chaos 
monkey actions. The universally understood technical term for a network packet 
is "packet", not "package".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23085) Network and Data related Actions

2019-12-10 Thread Szabolcs Bukros (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992721#comment-16992721
 ] 

Szabolcs Bukros commented on HBASE-23085:
-

[~apurtell] you are absolutely right, thanks for noticing. I should be able to 
create a PR later this week if that's fine for you.

> Network and Data related Actions
> 
>
> Key: HBASE-23085
> URL: https://issues.apache.org/jira/browse/HBASE-23085
> Project: HBase
>  Issue Type: Sub-task
>  Components: integration tests
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> Add additional actions to:
>  * manipulate network packets with tc (reorder, lose, ...)
>  * add CPU load
>  * fill the disk
>  * corrupt or delete regionserver data files
> Create new monkey factories for the new actions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23352) Allow chaos monkeys to access cmd line params, and improve FillDiskCommandAction

2019-11-29 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-23352:
---

 Summary: Allow chaos monkeys to access cmd line params, and 
improve FillDiskCommandAction
 Key: HBASE-23352
 URL: https://issues.apache.org/jira/browse/HBASE-23352
 Project: HBase
  Issue Type: Improvement
  Components: integration tests
Affects Versions: 2.2.2
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


When integration tests are run through the hbase CLI, the properties passed as 
command-line params do not reach the chaos monkeys. It is possible to define a 
property file, but it would be more flexible if we could also pick up 
properties from the command line.
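
A minimal sketch of the idea (illustrative names, independent of the actual 
monkey runner classes): load the optional property file first, then overlay the 
JVM -D system properties so command-line settings reach the monkeys too.
{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class MonkeyPropsSketch {

  // Properties from the optional file, overridden by -Dkey=value JVM arguments.
  static Properties loadMonkeyProps(String propsFile) throws IOException {
    Properties props = new Properties();
    if (propsFile != null) {
      try (FileInputStream in = new FileInputStream(propsFile)) {
        props.load(in);
      }
    }
    props.putAll(System.getProperties()); // command line wins over the file
    return props;
  }

  public static void main(String[] args) throws IOException {
    Properties props = loadMonkeyProps(args.length > 0 ? args[0] : null);
    // "example.monkey.ratio" is just a placeholder key for the demonstration.
    System.out.println("example.monkey.ratio = "
      + props.getProperty("example.monkey.ratio", "<unset>"));
  }
}
{code}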

Also, I would like to improve FillDiskCommandAction to stop the remote process 
if the call times out before it can finish, or if the action was run without a 
size parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-23085) Network and Data related Actions

2019-11-13 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros reopened HBASE-23085:
-

Backport commit to branch-2 and branch-2.2

> Network and Data related Actions
> 
>
> Key: HBASE-23085
> URL: https://issues.apache.org/jira/browse/HBASE-23085
> Project: HBase
>  Issue Type: Sub-task
>  Components: integration tests
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Minor
> Fix For: 3.0.0
>
>
> Add additional actions to:
>  * manipulate network packets with tc (reorder, lose, ...)
>  * add CPU load
>  * fill the disk
>  * corrupt or delete regionserver data files
> Create new monkey factories for the new actions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23085) Network and Data related Actions

2019-09-27 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-23085:
---

 Summary: Network and Data related Actions
 Key: HBASE-23085
 URL: https://issues.apache.org/jira/browse/HBASE-23085
 Project: HBase
  Issue Type: Sub-task
  Components: integration tests
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


Add additional actions to:
 * manipulate network packets with tc (reorder, lose, ...); see the sketch below
 * add CPU load
 * fill the disk
 * corrupt or delete regionserver data files

Create new monkey factories for the new actions.
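
As a rough sketch of the network manipulation part (run locally for brevity; the 
real actions would execute the commands on remote region server hosts, and the 
Action/monkey plumbing is not shown), the tc/netem invocation could look like 
this:
{code:java}
import java.io.IOException;

public class NetemSketch {

  static void run(String... cmd) throws IOException, InterruptedException {
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    if (p.waitFor() != 0) {
      throw new IOException("command failed: " + String.join(" ", cmd));
    }
  }

  public static void main(String[] args) throws Exception {
    String iface = args.length > 0 ? args[0] : "eth0";
    // Delay packets, reorder 25% of them and drop 2% on the chosen interface.
    run("sudo", "tc", "qdisc", "add", "dev", iface, "root", "netem",
        "delay", "100ms", "reorder", "25%", "loss", "2%");
    Thread.sleep(60_000); // keep the fault active for a minute
    // Remove the qdisc to restore normal traffic.
    run("sudo", "tc", "qdisc", "del", "dev", iface, "root", "netem");
  }
}
{code}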



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-22982) Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart

2019-09-26 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros reassigned HBASE-22982:
---

Assignee: Szabolcs Bukros

> Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart
> -
>
> Key: HBASE-22982
> URL: https://issues.apache.org/jira/browse/HBASE-22982
> Project: HBase
>  Issue Type: Sub-task
>  Components: integration tests
>Affects Versions: 3.0.0
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Minor
>
> * Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume 
> a ratio of region servers.
>  * Add a Chaos Monkey action to simulate a rolling restart including 
> graceful_stop like functionality that unloads the regions from the server 
> before a restart and then places it under load again afterwards.
>  * Add these actions to the relevant monkeys



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-22982) Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart

2019-09-06 Thread Szabolcs Bukros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Bukros updated HBASE-22982:

Description: 
* Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume a 
ratio of region servers.
 * Add a Chaos Monkey action to simulate a rolling restart including 
graceful_stop like functionality that unloads the regions from the server 
before a restart and then places it under load again afterwards.
 * Add these actions to the relevant monkeys

  was:
* Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume a 
ratio of region servers.
 * Add a Chaos Monkey action to simulate a rolling restart including 
graceful_stop like functionality that unloads the regions from the server 
before a restart and then places it under load again afterwards.


> Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart
> -
>
> Key: HBASE-22982
> URL: https://issues.apache.org/jira/browse/HBASE-22982
> Project: HBase
>  Issue Type: Sub-task
>  Components: integration tests
>Affects Versions: 3.0.0
>Reporter: Szabolcs Bukros
>Priority: Minor
>
> * Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume 
> a ratio of region servers.
>  * Add a Chaos Monkey action to simulate a rolling restart including 
> graceful_stop like functionality that unloads the regions from the server 
> before a restart and then places it under load again afterwards.
>  * Add these actions to the relevant monkeys



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HBASE-22982) Send SIGSTOP to hang or SIGCONT to resume rs and add graceful rolling restart

2019-09-06 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-22982:
---

 Summary: Send SIGSTOP to hang or SIGCONT to resume rs and add 
graceful rolling restart
 Key: HBASE-22982
 URL: https://issues.apache.org/jira/browse/HBASE-22982
 Project: HBase
  Issue Type: Sub-task
  Components: integration tests
Affects Versions: 3.0.0
Reporter: Szabolcs Bukros


* Add a Chaos Monkey action that uses SIGSTOP and SIGCONT to hang and resume a 
ratio of region servers (see the sketch below).
 * Add a Chaos Monkey action to simulate a rolling restart including 
graceful_stop like functionality that unloads the regions from the server 
before a restart and then places it under load again afterwards.
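
A rough, self-contained sketch of the hang/resume part (illustrative only; the 
real action would run this remotely for a configurable ratio of region servers 
and fit into the existing monkey framework):
{code:java}
import java.io.IOException;

public class SuspendResumeSketch {

  static void signal(String signal, long pid) throws IOException, InterruptedException {
    Process p = new ProcessBuilder("kill", "-" + signal, Long.toString(pid))
      .inheritIO().start();
    if (p.waitFor() != 0) {
      throw new IOException("kill -" + signal + " " + pid + " failed");
    }
  }

  public static void main(String[] args) throws Exception {
    long regionServerPid = Long.parseLong(args[0]); // PID of the region server to hang
    signal("STOP", regionServerPid);                // the process stops getting CPU time
    Thread.sleep(30_000);                           // keep it hung for 30 seconds
    signal("CONT", regionServerPid);                // resume it
  }
}
{code}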



--
This message was sent by Atlassian Jira
(v8.3.2#803003)