Re: scan triggers an uncaught runtime exception and brings down the RS

2023-01-27 Thread Duo Zhang
That check is just a last-resort safety net; if it ever fires, it means there is a bug in the code.
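
For readers following along, the check being discussed is essentially a fail-fast guard at cache-insert time. Below is a deliberately simplified, standalone sketch of that pattern (toy types and a plain byte[] cache, not the real Cacheable/HFileBlock/BucketCache classes): when a block arrives for a key that is already cached and the bytes do not match, the only safe reaction is to fail loudly, because silently keeping either copy could serve corrupt data to readers.

import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, standalone illustration of the fail-fast guard that
// BlockCacheUtil.validateBlockAddition implements -- not the real HBase types.
public class FailFastCacheSketch {

  private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();

  // Insert a block; if the key is already cached, the contents must match byte-for-byte.
  public void cacheBlock(String cacheKey, byte[] newBlock) {
    byte[] existing = cache.putIfAbsent(cacheKey, newBlock);
    if (existing != null && !Arrays.equals(existing, newBlock)) {
      // Same key, different bytes: the cache (or the file it was read from) is inconsistent.
      // Returning either copy silently could hand wrong data to a scanner, so fail loudly.
      throw new RuntimeException(
        "Cached block contents differ, which should not have happened. cacheKey:" + cacheKey);
    }
  }

  public static void main(String[] args) {
    FailFastCacheSketch sketch = new FailFastCacheSketch();
    sketch.cacheBlock("demo_block_0", new byte[] { 1, 2, 3 });
    sketch.cacheBlock("demo_block_0", new byte[] { 1, 2, 3 }); // fine: identical contents
    sketch.cacheBlock("demo_block_0", new byte[] { 9, 9, 9 }); // throws: the safety net fires
  }
}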

leojie wrote on Sat, Jan 28, 2023 at 11:14:
>
> Thanks for your reply, Mr. Zhang. One more question: inside the validateBlockAddition method of the BlockCacheUtil class,
>
> public static int validateBlockAddition(Cacheable existing, Cacheable 
> newBlock,
>   BlockCacheKey cacheKey) {
>   int comparison = compareCacheBlock(existing, newBlock, false);
>   if (comparison != 0) {
> throw new RuntimeException(
>   "Cached block contents differ, which should not have happened."
> + "cacheKey:" + cacheKey);
>   }
>   if ((existing instanceof HFileBlock) && (newBlock instanceof HFileBlock)) {
> comparison = ((HFileBlock) existing).getNextBlockOnDiskSize()
>   - ((HFileBlock) newBlock).getNextBlockOnDiskSize();
>   }
>   return comparison;
> }
>
> is the comparison != 0 check here really necessary?
>
>
> 张铎(Duo Zhang) wrote on Sat, Jan 28, 2023 at 11:09:
>
> > I'd suggest first upgrading to the latest 2.4 or 2.5 release and trying again. 2.2.6 is already a fairly old version, and quite a few small BucketCache bugs have been fixed since then.
> >
> > https://issues.apache.org/jira/browse/HBASE-26281
> >
> > For example, this one can scramble the contents of the BucketCache and lead to all kinds of strange errors.
> >
> > leojie wrote on Sat, Jan 28, 2023 at 10:22:
> >
> > >
> > > Hi all,
> > > I'm asking the community for help with an HBase scan exception that brings down the RegionServer. The exception log is as follows:
> > >
> > > 2023-01-19 03:19:06,986 ERROR
> > > [RpcServer.default.RWQ.Fifo.scan.handler=226,queue=19,port=60020]
> > > ipc.RpcServer: Unexpected throwable object
> > > java.lang.RuntimeException: Cached block contents differ, which should not
> > > have happened.cacheKey:bbec4ed53b6d475cbb8711f183556eb0_14145152
> > > at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:205)
> > > at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:237)
> > > at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:432)
> > > at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:417)
> > > at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:403)
> > > at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:68)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1528)
> > > at java.util.Optional.ifPresent(Optional.java:159)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1526)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:928)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1061)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1055)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1073)
> > > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1094)
> > > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:351)
> > > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:244)
> > > at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> > > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:324)
> > > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:267)
> > > at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:1099)
> > > at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:1088)
> > > at org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:823)
> > > at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:730)
> > > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:157)
> > > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6681)
> > > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6845)
> > > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6615)
> > > at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3238)
> > > at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3483)
> > > at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42278)
> > > at

Re: scan triggers an uncaught runtime exception and brings down the RS

2023-01-27 Thread leojie
Thanks for your reply, Mr. Zhang. One more question: inside the validateBlockAddition method of the BlockCacheUtil class,

public static int validateBlockAddition(Cacheable existing, Cacheable newBlock,
  BlockCacheKey cacheKey) {
  int comparison = compareCacheBlock(existing, newBlock, false);
  if (comparison != 0) {
throw new RuntimeException(
  "Cached block contents differ, which should not have happened."
+ "cacheKey:" + cacheKey);
  }
  if ((existing instanceof HFileBlock) && (newBlock instanceof HFileBlock)) {
comparison = ((HFileBlock) existing).getNextBlockOnDiskSize()
  - ((HFileBlock) newBlock).getNextBlockOnDiskSize();
  }
  return comparison;
}

is the comparison != 0 check here really necessary?


张铎(Duo Zhang) wrote on Sat, Jan 28, 2023 at 11:09:

> I'd suggest first upgrading to the latest 2.4 or 2.5 release and trying again. 2.2.6 is already a fairly old version, and quite a few small BucketCache bugs have been fixed since then.
>
> https://issues.apache.org/jira/browse/HBASE-26281
>
> For example, this one can scramble the contents of the BucketCache and lead to all kinds of strange errors.
>
> leojie wrote on Sat, Jan 28, 2023 at 10:22:
>
> >
> > Hi all,
> > I'm asking the community for help with an HBase scan exception that brings down the RegionServer. The exception log is as follows:
> >
> > 2023-01-19 03:19:06,986 ERROR
> > [RpcServer.default.RWQ.Fifo.scan.handler=226,queue=19,port=60020]
> > ipc.RpcServer: Unexpected throwable object
> > java.lang.RuntimeException: Cached block contents differ, which should not
> > have happened.cacheKey:bbec4ed53b6d475cbb8711f183556eb0_14145152
> > at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:205)
> > at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:237)
> > at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:432)
> > at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:417)
> > at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:403)
> > at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:68)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1528)
> > at java.util.Optional.ifPresent(Optional.java:159)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1526)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:928)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1061)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1055)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1073)
> > at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1094)
> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:351)
> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:244)
> > at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:324)
> > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:267)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:1099)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:1088)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:823)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:730)
> > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:157)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6681)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6845)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6615)
> > at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3238)
> > at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3483)
> > at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42278)
> > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:379)
> > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> > at

Re: scan triggers an uncaught runtime exception and brings down the RS

2023-01-27 Thread Duo Zhang
I'd suggest first upgrading to the latest 2.4 or 2.5 release and trying again. 2.2.6 is already a fairly old version, and quite a few small BucketCache bugs have been fixed since then.

https://issues.apache.org/jira/browse/HBASE-26281

For example, this one can scramble the contents of the BucketCache and lead to all kinds of strange errors.

leojie wrote on Sat, Jan 28, 2023 at 10:22:

>
> Hi all,
> I'm asking the community for help with an HBase scan exception that brings down the RegionServer. The exception log is as follows:
>
> 2023-01-19 03:19:06,986 ERROR
> [RpcServer.default.RWQ.Fifo.scan.handler=226,queue=19,port=60020]
> ipc.RpcServer: Unexpected throwable object
> java.lang.RuntimeException: Cached block contents differ, which should not
> have happened.cacheKey:bbec4ed53b6d475cbb8711f183556eb0_14145152
> at
> org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:205)
> at
> org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:237)
> at
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:432)
> at
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:417)
> at
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:403)
> at
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:68)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1528)
> at java.util.Optional.ifPresent(Optional.java:159)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1526)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:928)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1061)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1055)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1073)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1094)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:351)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:244)
> at
> org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:324)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:267)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:1099)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:1088)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:823)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:730)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:157)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6681)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6845)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6615)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3238)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3483)
> at
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42278)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:379)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
>
> The corresponding source code (2.2.6):
>
> public static int validateBlockAddition(Cacheable existing, Cacheable 
> newBlock,
>   BlockCacheKey cacheKey) {
>   int comparison = compareCacheBlock(existing, newBlock, false);
>   if (comparison != 0) {
> throw new RuntimeException(
>   "Cached block contents differ, which should not have happened."
> + "cacheKey:" + cacheKey);
>   }
>   if ((existing instanceof HFileBlock) && (newBlock instanceof HFileBlock)) {
> comparison = ((HFileBlock) existing).getNextBlockOnDiskSize()
>   - ((HFileBlock) newBlock).getNextBlockOnDiskSize();
>   }
>   return comparison;
> }
>
>
> During block cache population for the scan, an uncaught RuntimeException was thrown and the RegionServer crashed abruptly. Could you advise under what circumstances this exception is triggered, and how we can avoid it?


ipc.RpcServer: Unexpected throwable object java.lang.IllegalArgumentException: In CellChunkMap, cell must be associated with chunk

2023-01-27 Thread leojie
Hi all,
I'd like to ask the community about another potential RegionServer crash. The HBase version is 2.2.6; from reading the source code, this could in theory also occur in the latest HBase releases. The exception stack is as follows:

2023-01-24 19:14:45,414 ERROR
[RpcServer.default.RWQ.Fifo.read.handler=92,queue=11,port=60020] ipc.RpcServer:
Unexpected throwable object
java.lang.IllegalArgumentException: In CellChunkMap, cell must be
associated with chunk.. We were looking for a cell at index 5
at
org.apache.hadoop.hbase.regionserver.CellChunkMap.getCell(CellChunkMap.java:109)
at
org.apache.hadoop.hbase.regionserver.CellFlatMap.find(CellFlatMap.java:87)
at
org.apache.hadoop.hbase.regionserver.CellFlatMap.getValidIndex(CellFlatMap.java:114)
at
org.apache.hadoop.hbase.regionserver.CellFlatMap.tailMap(CellFlatMap.java:184)
at
org.apache.hadoop.hbase.regionserver.CellFlatMap.tailMap(CellFlatMap.java:45)
at
org.apache.hadoop.hbase.regionserver.CellSet.tailSet(CellSet.java:150)
at
org.apache.hadoop.hbase.regionserver.CellSet.tailSet(CellSet.java:145)
at
org.apache.hadoop.hbase.regionserver.Segment.tailSet(Segment.java:414)
at
org.apache.hadoop.hbase.regionserver.SegmentScanner.getIterator(SegmentScanner.java:131)
at
org.apache.hadoop.hbase.regionserver.SegmentScanner.reseek(SegmentScanner.java:156)
at
org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:324)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:267)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:1099)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:1088)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:823)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:730)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:157)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6681)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6845)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6615)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:6592)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:6579)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2645)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2571)
at
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42274)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:379)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
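
For context on what this IllegalArgumentException means: CellChunkMap keeps cell references as small fixed-size index entries (a chunk id plus an offset) and resolves the chunk id back to a memstore chunk at read time; the error above is thrown when that resolution finds no chunk for the cell at the given index. The following is a heavily simplified, standalone sketch of that lookup shape, using toy types rather than HBase's real CellChunkMap/ChunkCreator classes, only to show where such a failure comes from.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a chunk-indexed cell lookup; not HBase's actual implementation.
public class ChunkIndexedLookupSketch {

  // Stand-in for a memstore chunk holding serialized cell data.
  static class Chunk {
    final byte[] data;
    Chunk(byte[] data) { this.data = data; }
  }

  // chunk id -> chunk mapping (in HBase this resolution goes through the chunk pool).
  private final Map<Integer, Chunk> chunksById = new HashMap<>();
  // Flat index: two ints per cell, (chunkId, offset).
  private final int[] cellIndex;

  ChunkIndexedLookupSketch(int[] cellIndex) {
    this.cellIndex = cellIndex;
  }

  void registerChunk(int chunkId, Chunk chunk) {
    chunksById.put(chunkId, chunk);
  }

  // Resolve the cell at index i; fails if its chunk id no longer maps to a live chunk.
  byte[] getCell(int i) {
    int chunkId = cellIndex[2 * i];
    int offset = cellIndex[2 * i + 1];
    Chunk chunk = chunksById.get(chunkId);
    if (chunk == null) {
      // The index entry itself is readable, but the chunk it points at cannot be
      // resolved any more -- the situation the stack trace above reports.
      throw new IllegalArgumentException(
        "In this sketch, a cell must be associated with a chunk. Cell index: " + i);
    }
    return Arrays.copyOfRange(chunk.data, offset, chunk.data.length);
  }

  public static void main(String[] args) {
    ChunkIndexedLookupSketch sketch = new ChunkIndexedLookupSketch(new int[] { 7, 0 });
    // No chunk was registered under id 7, so resolving cell 0 fails the association check.
    sketch.getCell(0);
  }
}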



2023-01-24 19:14:45,419 WARN  [main-BucketCacheWriter-1] bucket.BucketCache:
Failed allocation for 5740f58a86a14107afaab310bf2444cb_0;
org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
Allocation too big size=4493298; adjust BucketCache sizes
hbase.bucketcache.bucket.sizes to accomodate if size seems reasonable and
you want it cached.
2023-01-24 19:14:45,420 WARN  [main-BucketCacheWriter-1] bucket.BucketCache:
Failed allocation for b1aa2b31cc8a4a23bb5d6bc3d43df51b_0;
org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
Allocation too big size=3260552; adjust BucketCache sizes
hbase.bucketcache.bucket.sizes to accomodate if size seems reasonable and
you want it cached.
2023-01-24 19:14:45,424 WARN  [main-BucketCacheWriter-1] bucket.BucketCache:
Failed allocation for e20825c52020440c9eaa1abc29e3516b_0;
org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
Allocation too big size=3266144; adjust BucketCache sizes
hbase.bucketcache.bucket.sizes to accomodate if size seems reasonable and
you want it cached.
2023-01-24 19:14:45,426 WARN  [main-BucketCacheWriter-1] bucket.BucketCache:
Failed allocation for 591bb35d28d64eaebc30079f8f83b529_0;
org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
Allocation too big size=3261225; adjust BucketCache sizes
hbase.bucketcache.bucket.sizes to accomodate if size seems reasonable and
you want it cached.
2023-01-24 19:14:45,430 WARN  [main-BucketCacheWriter-2] bucket.BucketCache:
Failed allocation for a94eeb8e9e9f4cc590d3c6cd2118c48d_0;
org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException:
Allocation too big size=3260054; adjust BucketCache sizes
hbase.bucketcache.bucket.sizes to accomodate if size seems reasonable and
you want it cached.
2023-01-24 19:14:45,436 WARN  [main-BucketCacheWriter-1] 
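
On the "Allocation too big" warnings above: they mean the BucketCache was asked to cache blocks (roughly 3 to 4.5 MB here) larger than its largest configured bucket size, so those blocks were simply not cached; by themselves these warnings do not bring down the RegionServer. If blocks of that size are expected (for example, large cells or a large BLOCKSIZE on some column family), the log's own suggestion is to extend hbase.bucketcache.bucket.sizes. A hypothetical hbase-site.xml fragment along those lines is shown below; the values are bucket sizes in bytes, the first fourteen are the usual defaults, and the appended 5242880 (5 MB) is only an illustration sized to cover the ~4.5 MB allocation in the log. Please verify the defaults and constraints against your own HBase version before changing this.

<property>
  <name>hbase.bucketcache.bucket.sizes</name>
  <!-- Hypothetical example: default bucket sizes plus one extra 5 MB bucket. -->
  <value>5120,9216,17408,33792,41984,50176,58368,66560,99328,132096,197632,263168,394240,525312,5242880</value>
</property>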

scan triggers an uncaught runtime exception and brings down the RS

2023-01-27 Thread leojie
Hi all,
I'm asking the community for help with an HBase scan exception that brings down the RegionServer. The exception log is as follows:

2023-01-19 03:19:06,986 ERROR
[RpcServer.default.RWQ.Fifo.scan.handler=226,queue=19,port=60020]
ipc.RpcServer: Unexpected throwable object
java.lang.RuntimeException: Cached block contents differ, which should not
have happened.cacheKey:bbec4ed53b6d475cbb8711f183556eb0_14145152
at
org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:205)
at
org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:237)
at
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:432)
at
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:417)
at
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:403)
at
org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:68)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1528)
at java.util.Optional.ifPresent(Optional.java:159)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1526)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:928)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1061)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1055)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1073)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1094)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:351)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:244)
at
org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:324)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:267)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:1099)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:1088)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:823)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:730)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:157)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6681)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6845)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6615)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3238)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3483)
at
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42278)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:379)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

The corresponding source code (2.2.6):

public static int validateBlockAddition(Cacheable existing, Cacheable newBlock,
  BlockCacheKey cacheKey) {
  int comparison = compareCacheBlock(existing, newBlock, false);
  if (comparison != 0) {
throw new RuntimeException(
  "Cached block contents differ, which should not have happened."
+ "cacheKey:" + cacheKey);
  }
  if ((existing instanceof HFileBlock) && (newBlock instanceof HFileBlock)) {
comparison = ((HFileBlock) existing).getNextBlockOnDiskSize()
  - ((HFileBlock) newBlock).getNextBlockOnDiskSize();
  }
  return comparison;
}


During block cache population for the scan, an uncaught RuntimeException was thrown and the RegionServer crashed abruptly. Could you advise under what circumstances this exception is triggered, and how we can avoid it?