[ 
https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093922#comment-17093922
 ] 

Andrey Elenskiy commented on HBASE-24189:
-----------------------------------------

I hadn't planned on a patch; I'm not sure what the right solution would be 
here without risking data loss.

> But what if the table is deleted and not recreated. 
I haven't actually tested this. It could be the same case, or maybe the 
regionserver checks that the table no longer exists and so doesn't create the 
directories.

> We might have to check whether the region exists or not also as part of the 
> last flushed seqId look up and if the regions does not exists at all, we 
> might have to just ignore those entries from WAL.
Would this be safe? I'm not familiar with the edge cases, but if the WAL isn't 
flushed before the region is removed, couldn't that cause data loss? For 
example, suppose a region is split or merged and the WAL isn't flushed before 
the child region is opened and the parent regions are closed (I don't know 
whether it always gets flushed in those cases). GCRegionProcedure will then 
remove the parent regions, and any edits still in the WAL for those parents 
should be replayed into the child region rather than discarded.
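
To make the concern concrete, here is a toy model of the proposed check in 
plain Java. It is only a sketch and not HBase code: WalReplaySketch, WalEntry, 
the region names and the "live regions" set are all hypothetical, and real WAL 
splitting works differently. It shows that a filter based purely on "does the 
region still exist" treats a garbage-collected split parent exactly like a 
region of a deleted table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are HBase classes.
final class WalReplaySketch {

    // One WAL edit, tagged with the (hypothetical) name of the region it was written to.
    static final class WalEntry {
        final String region;
        final long seqId;

        WalEntry(String region, long seqId) {
            this.region = region;
            this.seqId = seqId;
        }

        @Override
        public String toString() {
            return region + "@" + seqId;
        }
    }

    // The proposed filter: keep an entry only if its region still exists.
    static List<WalEntry> dropEntriesForMissingRegions(List<WalEntry> wal,
                                                       Set<String> liveRegions) {
        List<WalEntry> kept = new ArrayList<>();
        for (WalEntry e : wal) {
            if (liveRegions.contains(e.region)) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<WalEntry> wal = Arrays.asList(
            new WalEntry("deleted-table-region", 5L), // table was dropped: safe to skip
            new WalEntry("split-parent", 9L),         // parent was split and GC'd: NOT safe to skip
            new WalEntry("live-region", 12L));
        Set<String> live = new HashSet<>(Arrays.asList("live-region", "split-daughter"));

        // Only the live-region entry survives; the split-parent edit is dropped too.
        for (WalEntry e : dropEntriesForMissingRegions(wal, live)) {
            System.out.println("replay " + e);
        }
    }
}
```

In this model the edit for "split-parent" is discarded together with the one 
for "deleted-table-region", which is the data-loss case I'm worried about if 
unflushed parent edits can exist. A safe check would need some way to tell the 
two cases apart.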

> Regionserver recreates region folders in HDFS after replaying WAL with 
> removed table entries
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24189
>                 URL: https://issues.apache.org/jira/browse/HBASE-24189
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, wal
>    Affects Versions: 2.2.4
>         Environment: * HDFS 3.1.3
>  * HBase 2.1.4
>  * OpenJDK 8
>            Reporter: Andrey Elenskiy
>            Assignee: Anoop Sam John
>            Priority: Major
>
> Under the following scenario region directories in HDFS can be recreated with 
> only recovered.edits in them:
>  # Create table "test"
>  # Put into "test"
>  # Delete table "test"
>  # Create table "test" again
>  # Crash the regionserver that the put went to, to force a WAL replay
>  # The old table's region directory is recreated under the new table
>  # hbase hbck returns inconsistency
> This appears to happen because WALs are not cleaned up once a table is 
> deleted, so they still contain the edits from the old table. I've tried the 
> wal_roll command on the regionserver before crashing it, but it doesn't seem 
> to help, as under some circumstances there are still WAL files around. The 
> only solution that works consistently is to restart the regionserver before 
> creating the table at step 4, because that triggers log cleanup on startup: 
> [https://github.com/apache/hbase/blob/f3ee9b8aa37dd30d34ff54cd39fb9b4b6d22e683/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java#L508|https://github.com/apache/hbase/blob/f3ee9b8aa37dd30d34ff54cd39fb9b4b6d22e683/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java#L508]
>  
> Truncating the table would also be a workaround, but in our case it's a 
> no-go, as we create and delete tables in our tests, which run back to back 
> (create the table at the beginning of the test and delete it at the end).
> A nice option in our case would be an hbase shell utility to force cleanup 
> of log files manually, as I realize that it's not really viable to clean all 
> of those up every time a table is removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
