[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
[ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112045#comment-17112045 ]

Anoop Sam John commented on HBASE-24189:

Planning to create a patch based on the above solution (do not create the region dir if it does not exist).

> Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
>
> Key: HBASE-24189
> URL: https://issues.apache.org/jira/browse/HBASE-24189
> Project: HBase
> Issue Type: Bug
> Components: regionserver, wal
> Affects Versions: 2.2.4
> Environment: * HDFS 3.1.3
> * HBase 2.1.4
> * OpenJDK 8
> Reporter: Andrey Elenskiy
> Assignee: Anoop Sam John
> Priority: Major
>
> Under the following scenario, region directories in HDFS can be recreated with only recovered.edits in them:
> # Create table "test"
> # Put into "test"
> # Delete table "test"
> # Create table "test" again
> # Crash the regionserver to which the put went, to force the WAL replay
> # Region directory in old table is recreated in new table
> # hbase hbck returns inconsistency
> This appears to happen because WALs are not cleaned up once a table is deleted, and they still contain the edits from the old table. I've tried the wal_roll command on the regionserver before crashing it, but it doesn't seem to help, as under some circumstances there are still WAL files around. The only solution that works consistently is to restart the regionserver before creating the table at step 4, because that triggers log cleanup on startup:
> [https://github.com/apache/hbase/blob/f3ee9b8aa37dd30d34ff54cd39fb9b4b6d22e683/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java#L508]
>
> Truncating a table would also be a workaround, but in our case it's a no-go, as we create and delete tables in our tests, which run back to back (create the table at the beginning of the test and delete it at the end).
> A nice option in our case would be to provide an hbase shell utility to force clean-up of log files manually, as I realize that it's not really viable to clean all of those up every time some table is removed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
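The symptom in the issue description, region directories that contain nothing but recovered.edits, can be looked for with a short scan. This is a minimal sketch over a local directory tree, assuming a `<table dir>/<encoded region name>/...` layout in which entries starting with "." (such as .tabledesc or .tmp) are not regions; it is not an HBase or HDFS API, and `find_orphan_like_region_dirs` is a hypothetical helper name. On a real cluster the table directory would have to be listed through HDFS instead.

```python
import tempfile
from pathlib import Path
from typing import List


def find_orphan_like_region_dirs(table_dir: Path) -> List[str]:
    """Return names of region dirs under table_dir whose only entry is recovered.edits.

    Assumes a layout of <table_dir>/<encoded region name>/...; entries whose
    names start with "." are treated as non-region dirs and skipped.
    """
    suspects = []
    for region_dir in sorted(table_dir.iterdir()):
        if not region_dir.is_dir() or region_dir.name.startswith("."):
            continue
        children = [child.name for child in region_dir.iterdir()]
        # A region dir created normally typically also holds .regioninfo and
        # column family dirs; one recreated by WAL splitting holds only recovered.edits.
        if children == ["recovered.edits"]:
            suspects.append(region_dir.name)
    return suspects


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "test"
        (table / ".tabledesc").mkdir(parents=True)
        (table / "newregion" / "cf").mkdir(parents=True)
        (table / "newregion" / ".regioninfo").touch()
        (table / "oldregion" / "recovered.edits").mkdir(parents=True)
        print(find_orphan_like_region_dirs(table))  # ['oldregion']
```

Such a check could, for example, run at the end of each test in a create/delete-per-test setup like the one described, to flag leftovers before hbck does.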
[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
[ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111896#comment-17111896 ]

Anoop Sam John commented on HBASE-24189:

It is created at region open only, not at WAL split time. What I was saying is that the WAL split code will try to create this dir if it is not there, but at least it should not try to create a region dir. Right now we call mkdirs, which creates the whole dir tree even if the region dir, or even the table dir, is not there.
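The difference between "create the whole tree" and "create only the last level" can be illustrated with Python's `pathlib` as a local stand-in for a mkdirs-style call: `Path.mkdir(parents=True)` creates every missing ancestor, so it recreates the deleted table and region dirs, while `parents=False` fails when the region dir is gone. This is an analogy on the local filesystem only, not HBase code, and `ensure_recovered_edits_dir` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def ensure_recovered_edits_dir(region_dir: Path, create_parents: bool) -> bool:
    """Try to create <region_dir>/recovered.edits; return True if it now exists.

    create_parents=True mimics a mkdirs-style call that builds every missing
    ancestor. create_parents=False creates only the last path component and
    reports failure when the region dir itself is missing.
    """
    edits_dir = region_dir / "recovered.edits"
    try:
        edits_dir.mkdir(parents=create_parents, exist_ok=True)
    except FileNotFoundError:
        return False  # the parent (region) dir does not exist
    return True


with tempfile.TemporaryDirectory() as root:
    # The table was deleted, so neither the table dir nor the region dir exists.
    deleted_region = Path(root) / "data" / "default" / "test" / "oldregion"

    print(ensure_recovered_edits_dir(deleted_region, create_parents=False))  # False
    print(deleted_region.exists())  # False: nothing was recreated

    print(ensure_recovered_edits_dir(deleted_region, create_parents=True))  # True
    print(deleted_region.exists())  # True: the whole tree was recreated
```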
[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
[ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111819#comment-17111819 ]

ramkrishna.s.vasudevan commented on HBASE-24189:

bq. When we open a region, we will be creating the recovered.edits directory under that.

Do we do this on region open or on WAL split start?
[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
[ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102856#comment-17102856 ]

Anoop Sam John commented on HBASE-24189:

What do you think about the above approach, [~timoha]?
[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
[ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102230#comment-17102230 ]

Anoop Sam John commented on HBASE-24189:

Another possible solution would be this: when we open a region, we create the recovered.edits directory under it. So for the WALSplitter to write the edits file under the region, there is ideally no need to create the dirs; at the very least it does not need to create the region dir. But what the code does now is: if the region/recovered.edits dir is not there, it creates it using mkdirs. So even if the region dir is not there, we end up creating it. We can avoid this mkdirs, and instead just log at INFO and skip all edits for that region. Sounds like a less risky and simpler change (?)
[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
[ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094202#comment-17094202 ]

ramkrishna.s.vasudevan commented on HBASE-24189:

bq. I haven't actually tested this, it could be the same case or maybe regionserver actually checks that the table doesn't exist anymore so it doesn't create directories.

The deleted table came back due to another bug, but as you said, it is more or less the same case. A split or merge will ensure the region is closed (daughter region), so all the memstore data has to be flushed by that time. If that fails, the split/merge won't proceed.
[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
[ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094198#comment-17094198 ]

Anoop Sam John commented on HBASE-24189:

bq. For example, if region is split or merged and WAL isn't flushed prior to opening child region and closing parent regions

I don't think so. For a split or merge, the region has to be closed first, which flushes the data. Splitting or merging regions without flushing the memstore is not possible. I did not check the code flow, though.
[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
[ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093922#comment-17093922 ]

Andrey Elenskiy commented on HBASE-24189:

Haven't planned on a patch; I'm not exactly certain what the right solution here would be without leading to data loss.

> But what if the table is deleted and not recreated.

I haven't actually tested this; it could be the same case, or maybe the regionserver actually checks that the table doesn't exist anymore, so it doesn't create directories.

> We might have to check whether the region exists or not also as part of the last flushed seqId look up and if the region does not exist at all, we might have to just ignore those entries from WAL.

Would this be a safe thing to do? I'm not familiar with the edge cases, but what would happen if the WAL isn't flushed before the region is removed; might it cause data loss? For example, if a region is split or merged and the WAL isn't flushed prior to opening the child region and closing the parent regions (I don't know if it always gets flushed in those cases), then GCRegionProcedure will remove the parent regions, and if there are still edits in the WAL for the parent regions, they should be replayed into the child region instead of getting discarded.
[jira] [Commented] (HBASE-24189) Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
[ https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093324#comment-17093324 ]

Anoop Sam John commented on HBASE-24189:

bq. Region directory in old table is recreated in new table...hbase hbck returns inconsistency

While doing the WAL split, we write the recovered edits file under the region dir, and this recreates the dir from the old, deleted table. During the WAL split, we decide whether to include each WAL entry by comparing its seqId against the latest flushed seqId for that region (we get this info from the HM). If the old table, and so the old region, has been deleted, the HM gives back -1 as the latest seqId, and all entries are considered for adding to the recovered edits file. We might have to also check whether the region exists as part of the last flushed seqId lookup, and if the region does not exist at all, just ignore those entries from the WAL. The recovered edits will be under a path /datarecovered.edits/

In fact, this points to another issue as well. In your steps, you create the table with the same name. But what if the table is deleted and not recreated, and an RS goes down immediately afterwards? The replay of the WAL file could then bring some of the deleted table's data back into the filesystem. Also, this data won't get replayed, since those regions are gone anyway, so it won't get deleted until someone manually removes it.

Are you planning a patch for this issue?
> Regionserver recreates region folders in HDFS after replaying WAL with removed table entries
>
> Key: HBASE-24189
> URL: https://issues.apache.org/jira/browse/HBASE-24189
> Project: HBase
> Issue Type: Bug
> Components: regionserver, wal
> Affects Versions: 2.2.4
> Environment: * HDFS 3.1.3
> * HBase 2.1.4
> * OpenJDK 8
> Reporter: Andrey Elenskiy
> Priority: Minor
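The seqId filtering described in comment-17093324 can be sketched as follows. A WAL entry is kept only if its seqId is greater than the last flushed seqId reported for its region; a deleted region comes back as -1, so every one of its entries passes. The sketch adds the suggested extra check that the region still exists. The `WalEntry` type, the `last_flushed` and `existing_regions` inputs, and the `select_entries` name are hypothetical stand-ins for the HMaster lookup, not HBase APIs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

NO_SEQ_ID = -1  # value used when the lookup knows nothing about the region


@dataclass(frozen=True)
class WalEntry:
    region: str
    seq_id: int


def select_entries(entries: Iterable[WalEntry],
                   last_flushed: Dict[str, int],
                   existing_regions: Set[str],
                   check_region_exists: bool) -> List[WalEntry]:
    """Return the WAL entries that would go into recovered edits files."""
    selected = []
    for entry in entries:
        if check_region_exists and entry.region not in existing_regions:
            continue  # proposed: the region is gone, so ignore its entries
        flushed = last_flushed.get(entry.region, NO_SEQ_ID)
        if entry.seq_id <= flushed:
            continue  # already persisted by a memstore flush
        selected.append(entry)
    return selected


wal = [WalEntry("oldregion", 5), WalEntry("newregion", 3), WalEntry("newregion", 9)]
flushed = {"newregion": 4}  # oldregion was deleted, so its lookup falls back to -1
live = {"newregion"}

print([e.seq_id for e in select_entries(wal, flushed, live, check_region_exists=False)])
# [5, 9]: the deleted region's edit still passes the seqId check
print([e.seq_id for e in select_entries(wal, flushed, live, check_region_exists=True)])
# [9]
```

Whether dropping entries for a missing region is always safe (for example around split/merge, as raised in comment-17093922) is exactly what the thread debates; the sketch only shows the mechanics of the check.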