[ 
https://issues.apache.org/jira/browse/HBASE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009219#comment-15009219
 ] 

Clara Xiong commented on HBASE-14820:
-------------------------------------

2015-11-16 07:26:43,434 DEBUG [MemStoreFlusher.1] 
regionserver.CompactSplitThread: Split requested for 
[TableName],\x85\xDF`\x7F\xFF\xFE\xCE 
,\xB6{\xD0\xA5\x15'\xD7\xD6g\x8E_k\xB5\x13\x16g%\x9D\x06\xC5q
\xCB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a..  
compaction_queue=(0:0), split_queue=0, merge_queue=0
2015-11-16 07:26:43,788 DEBUG [regionserver60020-splits-1447687603434] 
lock.ZKInterProcessLockBase: Acquired a lock for 
/hbase/table-lock/[TableName]/read-regionserver:600200000009873
2015-11-16 07:26:43,789 INFO  [regionserver60020-splits-1447687603434] 
regionserver.SplitTransaction: Starting split of region 
[TableName],\x85\xDF`\x7F\xFF\xFE\xCE ,\xB6{\xD0\xA5\x15'\xD7\xD6g\x8E_k\xB
5\x13\x16g%\x9D\x06\xC5q\xCB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a.
2015-11-16 07:26:43,789 DEBUG [regionserver60020-splits-1447687603434] 
regionserver.SplitTransaction: regionserver:60020-0x1503f523d9f032a, 
quorum=hm4.[Host]:2181,hm3.[Host]:
2181,hm2.[Host]:2181,hm1.[Host]:2181,hm5.[Host]:2181, baseZNode=/hbase Creating 
ephemeral node for 4c467284476f9627dd8e7a4f34dfd31a in PENDING_SPLIT state
2015-11-16 07:26:43,790 DEBUG [regionserver60020-splits-1447687603434] 
regionserver.SplitTransaction: Still waiting for master to process the 
pending_split for 4c467284476f9627dd8e7a4f34dfd31a
2015-11-16 07:26:43,790 DEBUG [regionserver60020-splits-1447687603434] 
zookeeper.ZKAssign: regionserver:60020-0x1503f523d9f032a, 
quorum=hm4.[Host]:2181,hm3.[Host]:2181,hm2.[Host][Host]:2181,hm1.[Host]:2181,hm5.[Host]:2181,
 baseZNode=/hbase Transitioning 4c467284476f9627dd8e7a4f34dfd31a from 
RS_ZK_REQUEST_REGION_SPLIT to RS_ZK_REQUEST_REGI
ON_SPLIT
2015-11-16 07:26:43,791 DEBUG [regionserver60020-splits-1447687603434] 
zookeeper.ZKAssign: regionserver:60020-0x1503f523d9f032a, 
quorum=hm4.[Host]:2181,hm3.[Host]:2181,hm2.[Host]:2181,hm1.[Host]:2181,hm5.[Host]:2181,
 baseZNode=/hbase Transitioned node 4c467284476f9627dd8e7a4f34dfd31a from 
RS_ZK_REQUEST_REGION_SPLIT to RS_ZK_REQUEST_
REGION_SPLIT
2015-11-16 07:26:43,986 DEBUG [regionserver60020-splits-1447687603434] 
regionserver.HRegion: Closing [TableName],\x85\xDF`\x7F\xFF\xFE\xCE 
,\xB6{\xD0\xA5\x15'\xD7\xD6g\x8E_k\xB5\x13\x16g%\x9D\x06\xC5q\x
CB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a.: disabling 
compactions & flushes
2015-11-16 07:26:43,986 DEBUG [regionserver60020-splits-1447687603434] 
regionserver.HRegion: Updates disabled for region 
[TableName],\x85\xDF`\x7F\xFF\xFE\xCE 
,\xB6{\xD0\xA5\x15'\xD7\xD6g\x8E_k\xB5\x13\
x16g%\x9D\x06\xC5q\xCB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a.
2015-11-16 07:26:43,986 INFO  [regionserver60020-splits-1447687603434] 
regionserver.HRegion: Started memstore flush for 
[TableName],\x85\xDF`\x7F\xFF\xFE\xCE 
,\xB6{\xD0\xA5\x15'\xD7\xD6g\x8E_k\xB5\x13\x
16g%\x9D\x06\xC5q\xCB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a.,
 current region memstore size 363.0 K
2015-11-16 07:26:44,527 INFO  [regionserver60020-splits-1447687603434] 
regionserver.DefaultStoreFlusher: Flushed, sequenceid=12742653, memsize=363.0 
K, hasBloomFilter=false, into tmp file 
hdfs://hm1.[Host]]:9000/hbase/data/default/[TableName]/4c467284476f9627dd8e7a4f34dfd31a/.tmp/fe0ca3b83b794cef8c1cb357ccc08db6
2015-11-16 07:26:44,532 DEBUG [regionserver60020-splits-1447687603434] 
regionserver.HRegionFileSystem: Committing store file 
hdfs://hm1.[Host]:9000/hbase/data/default/[TableName]/4c4672844
76f9627dd8e7a4f34dfd31a/.tmp/fe0ca3b83b794cef8c1cb357ccc08db6 as 
hdfs://hm1.[Host]:9000/hbase/data/default/[TableName]/4c467284476f9627dd8e7a4f34dfd31a/e/fe0ca3b83b794cef8c1cb357ccc08db6
2015-11-16 07:26:44,536 INFO  [regionserver60020-splits-1447687603434] 
regionserver.HStore: Added 
hdfs://hm1.[Host]:9000/hbase/data/default/[TableName]/4c467284476f9627dd8e7a4f34dfd31a/e/f
e0ca3b83b794cef8c1cb357ccc08db6, entries=368, sequenceid=12742653, 
filesize=82.7 K
2015-11-16 07:26:44,536 INFO  [regionserver60020-splits-1447687603434] 
regionserver.HRegion: Finished memstore flush of ~363.0 K/371744, 
currentsize=0/0 for region [TableName],\x85\xDF`\x7F\xFF\xFE\xCE 
,\xB6{\xD0\xA5\x15'\xD7\xD6g\x8E_k\xB5\x13\x16g%\x9D\x06\xC5q\xCB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a.
 in 550ms, sequenceid=12742653, compaction requested=false
2015-11-16 07:26:44,672 INFO  
[StoreCloserThread-[TableName],\x85\xDF`\x7F\xFF\xFE\xCE 
,\xB6{\xD0\xA5\x15'\xD7\xD6g\x8E_k\xB5\x13\x16g%\x9D\x06\xC5q\xCB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f3
4dfd31a.-1] regionserver.HStore: Closed e
2015-11-16 07:26:44,673 INFO  [regionserver60020-splits-1447687603434] 
regionserver.HRegion: Closed [TableName],\x85\xDF`\x7F\xFF\xFE\xCE 
,\xB6{\xD0\xA5\x15'\xD7\xD6g\x8E_k\xB5\x13\x16g%\x9D\x06\xC5q\xC
B!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a.
2015-11-16 07:26:44,673 INFO  [regionserver60020-splits-1447687603434] 
regionserver.SplitTransaction: Preparing to split 1 storefiles for region 
[TableName],\x85\xDF`\x7F\xFF\xFE\xCE ,\xB6{\xD0\xA5\x15'
\xD7\xD6g\x8E_k\xB5\x13\x16g%\x9D\x06\xC5q\xCB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a.
2015-11-16 07:27:14,725 INFO  [regionserver60020-splits-1447687603434] 
regionserver.SplitRequest: Running rollback/cleanup of failed split of 
[TableName],\x85\xDF`\x7F\xFF\xFE\xCE ,\xB6{\xD0\xA5\x15'\xD
7\xD6g\x8E_k\xB5\x13\x16g%\x9D\x06\xC5q\xCB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a.;
 Took too long to split the files and create the references, aborting split
java.io.IOException: Took too long to split the files and create the 
references, aborting split
        at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.splitStoreFiles(SplitTransaction.java:855)
        at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.stepsBeforePONR(SplitTransaction.java:433)
        at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:307)
        at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:676)
        at 
org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:85)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-11-16 07:27:14,823 DEBUG [regionserver60020-splits-1447687603434] 
regionserver.HRegion: Found 0 recovered edits file(s) under 
hdfs://hm1.[Host]:9000/hbase/data/default/[TableName]/4c467284476f9627dd8e7a4f34dfd31a
2015-11-16 07:27:14,824 WARN  [regionserver60020-splits-1447687603434] 
zookeeper.ZKAssign: regionserver:60020-0x1503f523d9f032a, 
quorum=hm4.[Host]:2181,hm3.[Host]:2181,hm2.[Host]:2181,hm1.[Host]:2181,hm5.[Host]:2181,
 baseZNode=/hbase Attempting to delete unassigned node 
4c467284476f9627dd8e7a4f34dfd31a in RS_ZK_REQUEST_REGION_SPLIT state but node 
is in RS_ZK_REGION_SPLITTING state
2015-11-16 07:27:14,825 DEBUG [regionserver60020-splits-1447687603434] 
zookeeper.ZKAssign: regionserver:60020-0x1503f523d9f032a, 
quorum=hm4.[Host]:2181,hm3.[Host]:2181,hm2.[Host]:2181,hm1.[Host]:2181,hm5.[Host]:2181,
 baseZNode=/hbase Deleted unassigned node 4c467284476f9627dd8e7a4f34dfd31a in 
expected state RS_ZK_REGION_SPLITTING
2015-11-16 07:27:14,825 INFO  [regionserver60020-splits-1447687603434] 
regionserver.SplitRequest: Successful rollback of failed split of 
[TableName],\x85\xDF`\x7F\xFF\xFE\xCE 
,\xB6{\xD0\xA5\x15'\xD7\xD6g\x8E_k\xB5\x13\x16g%\x9D\x06\xC5q\xCB!\xBFI\x01,1447477545041.4c467284476f9627dd8e7a4f34dfd31a.
2015-11-16 07:27:14,827 DEBUG [regionserver60020-splits-1447687603434] 
lock.ZKInterProcessLockBase: Released 
/hbase/table-lock/[TableName]/read-regionserver:600200000009873
2015-11-16 07:27:14,827 INFO  [regionserver60020-splits-1447687603434] 
regionserver.SplitRequest: Split transaction journal:
        STARTED at 1447687603661
        PREPARED at 1447687603788
        BEFORE_PRE_SPLIT_HOOK at 1447687603789
        AFTER_PRE_SPLIT_HOOK at 1447687603789
        SET_SPLITTING_IN_ZK at 1447687603790
        CREATE_SPLIT_DIR at 1447687603986
        CLOSED_PARENT_REGION at 1447687604673
        OFFLINED_PARENT at 1447687604673
2015-11-16 07:28:05,959 INFO  [Thread-20670827] hdfs.DFSClient: Exception in 
createBlockOutputStream
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/10.249.198.246:45824 remote=/10.249.198.246:50010]
        at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at 
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2201)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1439)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
2015-11-16 07:28:05,959 INFO  [Thread-20670827] hdfs.DFSClient: Abandoning 
BP-17464394-10.249.192.41-1437430130467:blk_1203771348_130031565
2015-11-16 07:28:06,517 WARN  [Thread-20670827] hdfs.DFSClient: DataStreamer 
Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/hbase/data/default/[TableName]/4c467284476f9627dd8e7a4f34dfd31a/.splits/f0f72e76c9079476df6cf16a57086beb/e/b866441ce0284a6b9e794270ae861359.4c467284476f9627dd8e7a4f34dfd31a
 (inode 153365501): File does not exist. [Lease.  Holder: 
DFSClient_hb_rs_hs1526.[Host],60020,1445544584894_1032217520_45, 
pendingcreates: 1]
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3549)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.abandonBlock(FSNamesystem.java:3519)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.abandonBlock(NameNodeRpcServer.java:709)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.abandonBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:467)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090)

        at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy16.abandonBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.abandonBlock(ClientNamenodeProtocolTranslatorPB.java:376)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy17.abandonBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:294)
        at com.sun.proxy.$Proxy18.abandonBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1365)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
2015-11-16 07:28:06,517 WARN  [Thread-20670827] hdfs.DFSClient: Could not get 
block locations. Source file 
"/hbase/data/default/[TableName]/4c467284476f9627dd8e7a4f34dfd31a/.splits/f0f72e76c9079476df6cf16a57086beb/e/b866441ce0284a6b9e794270ae861359.4c467284476f9627dd8e7a4f34dfd31a"
 - Aborting...

2015-11-16 09:58:02,031 WARN  
[regionserver60020-smallCompactions-1447694140073] hdfs.BlockReaderFactory: I/O 
error constructing remote block reader.
java.io.IOException: Got error for OP_READ_BLOCK, self=/10.249.198.246:18432, 
remote=/10.249.194.239:50010, for file 
/hbase/data/default/[TableName]/da91c5a2ddf7ac8350a0555961fa793c/e/c245ebf9c399492ea6
0eef9fba98a52b, for pool BP-17464394-10.249.192.41-1437430130467 block 
1203905906_130166129
        at 
org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:445)
        at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:410)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:796)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:674)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
        at 
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
        at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1312)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1581)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1413)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:394)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:810)
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:244)
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:329)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:252)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:214)
        at 
org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:299)
        at 
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:87)
        at 
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1113)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1526)
        at 
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:498)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


> Region becomes unavailable after a region split is rolled back
> --------------------------------------------------------------
>
>                 Key: HBASE-14820
>                 URL: https://issues.apache.org/jira/browse/HBASE-14820
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.98.15
>            Reporter: Clara Xiong
>
> After the region server rolls back a timed out attempt of  region split, the 
> region becomes unavailable. 
> Symptoms:
> The RS displays the region open in the web UI.
> The meta table still points to the RS
> Requests for the regions receive a NotServingRegionException. 
> hbck reports 0 inconsistencies. 
> Moving the region fails. 
> Restarting the region server fixes the problem.
> We have see multiple occurrences which require operation intervention.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to