[ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerry He updated HBASE-12949: ----------------------------- Attachment: HBASE-12949-master.patch Attached a patch to see if you folks are ok with the approach. Here is what would show up after the patch with the bad hfile. In the region server log, which a aborted compaction: {noformat} 2015-02-04 13:57:39,077 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed Request = regionName=CUMMINS_INSITE_V1,20110207-105743558-21939316-1406327439200524000,1422756983729.bc8e4b3996d2424f21dc0cfdcd422a6b., storeName=attributes, fileCount=1, fileSize=6.4 G (6.4 G), priority=9, time=1423087053717124000 java.io.IOException: Could not iterate StoreFileScanner[org.apache.hadoop.hbase.io.HalfStoreFileReader$1@714e714e, cur=20110208-080219433-21950204-1397112048924811000/attributes:1015319_1010319/1397120694918/Put/vlen=15/mvcc=0] at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:142) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:77) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:110) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1099) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1483) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:506) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:906) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:929) at java.lang.Thread.run(Thread.java:738) Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Invalid type 0 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.getKeyValue(HFileReaderV2.java:695) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.getKeyValue(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:137) ... 11 more {noformat} Doing a get from the shell: {noformat} hbase(main):002:0> get 'CUMMINS_INSITE_V1', '20110208-080219433-21950204-1397112048924811000' COLUMN CELL ERROR: java.io.IOException: Could not iterate StoreFileScanner[org.apache.hadoop.hbase.io.HalfStoreFileReader$1@70837083, cur=20110208-080219433-21950204-1397112048924811000/attributes:1015319_1010319/1397120694918/Put/vlen=15/mvcc=0] at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:142) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3992) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4072) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3950) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3919) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3906) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4882) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4856) at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2951) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29937) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:110) at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:90) at java.lang.Thread.run(Thread.java:738) Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Invalid type 0 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.getKeyValue(HFileReaderV2.java:695) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.getKeyValue(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:137) ... 17 more {noformat} > Scanner can be stuck in infinite loop if the HFile is corrupted > --------------------------------------------------------------- > > Key: HBASE-12949 > URL: https://issues.apache.org/jira/browse/HBASE-12949 > Project: HBase > Issue Type: Bug > Affects Versions: 0.94.3, 0.98.10 > Reporter: Jerry He > Attachments: HBASE-12949-master.patch > > > We've encountered problem where compaction hangs and never completes. > After looking into it further, we found that the compaction scanner was stuck > in a infinite loop. See stack below. > {noformat} > org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296) > org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257) > org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697) > org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672) > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529) > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223) > {noformat} > We identified the hfile that seems to be corrupted. Using HFile tool shows > the following: > {noformat} > [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k > -m -f > /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 > 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is > deprecated. Instead, use io.native.lib.available > 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using > org.apache.hadoop.util.PureJavaCrc32 > 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use > org.apache.hadoop.util.PureJavaCrc32C > 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is > deprecated. Instead, use fs.defaultFS > Scanning -> > /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 > WARNING, previous row is greater then current row > filename -> > /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 > previous -> > \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00 > current -> > Exception in thread "main" java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:489) > at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539) > at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802) > {noformat} > Turning on Java Assert shows the following: > {noformat} > Exception in thread "main" java.lang.AssertionError: Key > 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 > followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes > at > org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672) > {noformat} > It shows that the hfile seems to be corrupted -- the keys don't seem to be > right. > But Scanner is not able to give a meaningful error, but stuck in an infinite > loop in here: > {code} > KeyValueHeap.generalizedSeek() > while ((scanner = heap.poll()) != null) { > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)