[ 
https://issues.apache.org/jira/browse/HBASE-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261906#comment-14261906
 ] 

Rajeshbabu Chintaguntla commented on HBASE-12791:
-------------------------------------------------

To solve this, we should not skip the rollback even if the regionserver is 
stopping, so that the dirty daughter regions get deleted (if the RS is stopping, 
we need not reopen the parent region during rollback). If the regionserver 
terminates abnormally, rollback may never be called, so we also need to delete 
the directories in SSH (ServerShutdownHandler) for any regions in SPLITTING_NEW 
state.
Also, while fixing meta holes in hbck, we should check for overlaps beforehand 
and not add overlapping regions to meta. In this scenario, the start and end 
keys of both daughter regions fall within the range of the existing parent 
region in meta, so we need not add them to meta. We can treat whatever 
directories are left as leftovers of aborted splits and delete them.
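
A minimal sketch of the overlap check described above, in plain Java rather than 
against the real hbck code. The class and method names (RegionRangeCheck, 
isWithin) are hypothetical, and the sketch assumes the usual HBase convention 
that an empty start key means "beginning of table" and an empty end key means 
"end of table". The parent end key used in main is also an assumption, since 
the logs do not show it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase: decides whether an orphan region
// found on HDFS lies entirely inside a region already present in meta.
class RegionRangeCheck {

    // Lexicographic comparison of row keys as unsigned bytes.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // True if [childStart, childEnd) is contained in [parentStart, parentEnd).
    // Assumption: an empty end key stands for "end of table".
    static boolean isWithin(byte[] childStart, byte[] childEnd,
                            byte[] parentStart, byte[] parentEnd) {
        if (compare(childStart, parentStart) < 0) {
            return false; // child begins before the parent
        }
        if (parentEnd.length == 0) {
            return true;  // parent runs to the end of the table
        }
        if (childEnd.length == 0) {
            return false; // child runs to the end of the table, parent does not
        }
        return compare(childEnd, parentEnd) <= 0;
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        byte[] row2 = "row2".getBytes(StandardCharsets.UTF_8);

        // Assumed parent range: the whole table, i.e. ["", "").
        // Daughters from the logs start at "" and at "row2".
        System.out.println(isWithin(empty, row2, empty, empty));
        System.out.println(isWithin(row2, empty, empty, empty));
    }
}
```

If both daughters are contained in a region that meta already lists, the repair 
step would skip adding them to meta and treat their directories as candidates 
for deletion instead.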


> HBase does not attempt to clean up an aborted split when the regionserver 
> shutting down
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-12791
>                 URL: https://issues.apache.org/jira/browse/HBASE-12791
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.0
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.10, 1.0.1
>
>
> HBase does not clean up the daughter region directories from HDFS if the 
> region server shuts down after creating the daughter region directories 
> during a split.
> Here are the logs.
> -> RS shutdown after creating the daughter regions.
> {code}
> 2014-12-31 09:05:41,406 DEBUG [regionserver60020-splits-1419996941385] 
> zookeeper.ZKAssign: regionserver:60020-0x14a9701e53100d1, 
> quorum=localhost:2181, baseZNode=/hbase Transitioned node 
> 80c665138d4fa32da4d792d8ed13206f from RS_ZK_REQUEST_REGION_SPLIT to 
> RS_ZK_REQUEST_REGION_SPLIT
> 2014-12-31 09:05:41,514 DEBUG [regionserver60020-splits-1419996941385] 
> regionserver.HRegion: Closing 
> t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.: disabling compactions & 
> flushes
> 2014-12-31 09:05:41,514 DEBUG [regionserver60020-splits-1419996941385] 
> regionserver.HRegion: Updates disabled for region 
> t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
> 2014-12-31 09:05:41,516 INFO  
> [StoreCloserThread-t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.-1] 
> regionserver.HStore: Closed f
> 2014-12-31 09:05:41,518 INFO  [regionserver60020-splits-1419996941385] 
> regionserver.HRegion: Closed 
> t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
> 2014-12-31 09:05:49,922 DEBUG [regionserver60020-splits-1419996941385] 
> regionserver.MetricsRegionSourceImpl: Creating new MetricsRegionSourceImpl 
> for table t dd9731ee43b104da565257ca1539aa8c
> 2014-12-31 09:05:49,922 DEBUG [regionserver60020-splits-1419996941385] 
> regionserver.HRegion: Instantiated 
> t,,1419996941401.dd9731ee43b104da565257ca1539aa8c.
> 2014-12-31 09:05:49,929 DEBUG [regionserver60020-splits-1419996941385] 
> regionserver.MetricsRegionSourceImpl: Creating new MetricsRegionSourceImpl 
> for table t 2e40a44511c0e187d357d651f13a1dab
> 2014-12-31 09:05:49,929 DEBUG [regionserver60020-splits-1419996941385] 
> regionserver.HRegion: Instantiated 
> t,row2,1419996941401.2e40a44511c0e187d357d651f13a1dab.
> Wed Dec 31 09:06:30 IST 2014 Terminating regionserver
> 2014-12-31 09:06:30,465 INFO  [Thread-8] regionserver.ShutdownHook: Shutdown 
> hook starting; hbase.shutdown.hook=true; 
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@42d2282e
> {code}
> -> Rollback is skipped if the RS is stopped or stopping, so we end up with 
> dirty daughter regions in HDFS.
> {code}
> 2014-12-31 09:07:49,547 INFO  [regionserver60020-splits-1419996941385] 
> regionserver.SplitRequest: Skip rollback/cleanup of failed split of 
> t,,1419996880699.80c665138d4fa32da4d792d8ed13206f. because server is stopped
> java.io.InterruptedIOException: Interrupted after 0 tries  on 350
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:156)
> {code}
> Because of this, hbck always reports inconsistencies.
> {code}
> ERROR: Region { meta => null, hdfs => 
> hdfs://localhost:9000/hbase/data/default/t/2e40a44511c0e187d357d651f13a1dab, 
> deployed =>  } on HDFS, but not listed in hbase:meta or deployed on any 
> region server
> ERROR: Region { meta => null, hdfs => 
> hdfs://localhost:9000/hbase/data/default/t/dd9731ee43b104da565257ca1539aa8c, 
> deployed =>  } on HDFS, but not listed in hbase:meta or deployed on any 
> region server
> {code}
> If we try to repair, we end up with overlapping regions in hbase:meta, and 
> both daughter regions and the parent are online.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
