[
https://issues.apache.org/jira/browse/PHOENIX-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tanuj Khurana updated PHOENIX-7611:
-----------------------------------
Summary: Memory corruption issue in Phoenix coprocessors in HBase 2 (was:
Memory corruption issue in Phoenix coprocessors af HBase 2)
> Memory corruption issue in Phoenix coprocessors in HBase 2
> ----------------------------------------------------------
>
> Key: PHOENIX-7611
> URL: https://issues.apache.org/jira/browse/PHOENIX-7611
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.0.0, 5.1.0, 5.1.1, 5.2.0, 5.1.2, 5.1.3, 5.2.1
> Reporter: Tanuj Khurana
> Priority: Major
>
> The memory corruption has surfaced as segmentation faults that crash the
> RegionServer. We have observed this in production in our environment as
> well as in ITs. PHOENIX-7419 is already open for it. I was also hitting this
> issue while working on PHOENIX-7591. There, the test would sometimes fail
> with a FATAL error message of SIGSEGV, but more often it would fail with
> silent corruption. After adding more logging, I found that some of the Cell
> references we were storing in IndexRegionObserver were getting corrupted.
> I started looking around in HBase for similar corruptions and found that,
> from HBase 2 onwards, the contract with the coprocessor for the
> preBatchMutate hook says:
> *Do not retain references to any Cells in Mutations* beyond the life of this
> invocation. If need a Cell reference for later use, copy the cell and use
> that.
> IndexRegionObserver maintains the row state in memory as a Put mutation,
> which holds references to the Cells in the Mutation, in order to handle
> concurrent updates. The lifetime of these references exceeds the invocation
> of the hook. It seems that in some cases these Cells can be backed by
> off-heap memory, which can be reclaimed or reused, causing corruption.
> This also lines up with the stack trace attached to PHOENIX-7419
> ([^hs_err_pid783375.log]):
> {code:java}
> v ~StubRoutines::jbyte_disjoint_arraycopy
> J 23481 C2
> org.apache.hadoop.hbase.unsafe.HBasePlatformDependent.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V
> (22 bytes) @ 0x00007fb765360c32 [0x00007fb765360be0+0x52]
> j
> org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+56
> j
> org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+105
> j
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+65
> j
> org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+56
> J 24630 C2
> org.apache.phoenix.coprocessor.GlobalIndexRegionScanner.apply(Lorg/apache/hadoop/hbase/client/Put;Lorg/apache/hadoop/hbase/client/Put;)V
> (167 bytes) @ 0x00007fb7656262e0 [0x00007fb765625ca0+0x640]
> J 24258 C1
> org.apache.phoenix.hbase.index.IndexRegionObserver.applyPendingPutMutations(Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;Lorg/apache/phoenix/hbase/index/IndexRegionObserver$BatchMutateContext;J)V
> (430 bytes) @ 0x00007fb7654ac234 [0x00007fb7654aa880+0x19b4]
> j
> org.apache.phoenix.hbase.index.IndexRegionObserver.prepareDataRowStates(Lorg/apache/hadoop/hbase/coprocessor/ObserverContext;Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;Lorg/apache/phoenix/hbase/index/IndexRegionObserver$BatchMutateContext;J)V+30
> J 25543 C1
> org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutateWithExceptions(Lorg/apache/hadoop/hbase/coprocessor/ObserverContext;Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;)V
> (1004 bytes) @ 0x00007fb764ef1ffc [0x00007fb764eef7c0+0x283c]
> J 25542 C1
> org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutate(Lorg/apache/hadoop/hbase/coprocessor/ObserverContext;Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;)V
> (76 bytes) @ 0x00007fb762d1dc24 [0x00007fb762d1db00+0x124]
> J 22752 C1
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$28.call(Ljava/lang/Object;)V
> (17 bytes) @ 0x00007fb7629b21d4 [0x00007fb7629b1f00+0x2d4]
> J 14450 C2
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver()V
> (70 bytes) @ 0x00007fb762483240 [0x00007fb7624830c0+0x180]
> J 18110 C2
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(Lorg/apache/hadoop/hbase/coprocessor/CoprocessorHost$ObserverOperation;)Z
> (274 bytes) @ 0x00007fb76463c74c [0x00007fb76463c320+0x42c]
> J 23033 C1
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;)V
> (42 bytes) @ 0x00007fb762b39dcc [0x00007fb762b39640+0x78c]
> J 14181 C1
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.prepareMiniBatchOperations(Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;JLjava/util/List;)V
> (105 bytes) @ 0x00007fb763a21b3c [0x00007fb763a21380+0x7bc]
> J 14199 C1
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(Lorg/apache/hadoop/hbase/regionserver/HRegion$BatchOperation;)V
> (970 bytes) @ 0x00007fb763a37a94 [0x00007fb763a36f20+0xb74]
> J 13124 C1
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(Lorg/apache/hadoop/hbase/regionserver/HRegion$BatchOperation;)[Lorg/apache/hadoop/hbase/regionserver/OperationStatus;
> (354 bytes) @ 0x00007fb7636a5a64 [0x00007fb7636a5320+0x744] {code}
> This contract actually applies to all the methods in the RegionObserver
> interface and was updated in HBASE-15735, which was introduced in HBase 2.
> Phoenix has several coprocessors which implement the RegionObserver
> interface. We need to investigate all such implementations and fix them if
> they hold on to Cell references after the invocation of the hook API.
> Two patterns I have seen are:
> 1. We store the reference to the Cell directly, or in a collection like
> List<Cell>.
> 2. We store it indirectly, for example inside a Mutation object.
> It seems this is only a problem if we store references to Cells which are
> instances of ByteBufferKeyValue (which extends ByteBufferExtendedCell), since
> they can be backed by off-heap memory.
> KeyValue instances seem fine (the ones returned by
> [GenericKeyValueBuilder.java|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/hbase/index/util/GenericKeyValueBuilder.java])
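The hazard described in the issue (a retained reference into a reusable buffer versus a copy taken inside the hook) can be illustrated with a minimal, JDK-only sketch. It does not use HBase or Phoenix classes: `BufferBackedCell`, `copyToHeap`, and `reuseBuffer` are hypothetical names invented for this illustration, and a direct `ByteBuffer` stands in for the off-heap memory that may back a ByteBufferKeyValue. In HBase itself a helper such as `KeyValueUtil.copyToNewKeyValue` might play the role of the copy step, but that is an assumption here, not something the issue states.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RetainedCellSketch {

    /** Hypothetical stand-in for a cell whose value lives in a shared buffer. */
    static final class BufferBackedCell {
        final ByteBuffer buf;
        final int offset;
        final int length;

        BufferBackedCell(ByteBuffer buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }

        /** Reads the value from whatever the backing buffer holds right now. */
        byte[] readValue() {
            byte[] out = new byte[length];
            ByteBuffer view = buf.duplicate(); // independent position, same memory
            view.position(offset);
            view.get(out, 0, length);
            return out;
        }

        /** Deep copy onto the heap; the result no longer depends on the shared buffer. */
        BufferBackedCell copyToHeap() {
            return new BufferBackedCell(ByteBuffer.wrap(readValue()), 0, length);
        }
    }

    /** Simulates the server overwriting a pooled buffer with the next request's bytes. */
    static void reuseBuffer(ByteBuffer pooled, String nextPayload) {
        ByteBuffer view = pooled.duplicate();
        view.position(0);
        view.put(nextPayload.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Assumption: a direct buffer models the off-heap memory behind the incoming cell.
        ByteBuffer pooled = ByteBuffer.allocateDirect(16);
        reuseBuffer(pooled, "value-1");
        BufferBackedCell incoming = new BufferBackedCell(pooled, 0, 7);

        // Pattern to avoid: keep the reference beyond the "hook invocation".
        BufferBackedCell retainedReference = incoming;
        // Pattern the contract asks for: copy the cell and keep the copy.
        BufferBackedCell retainedCopy = incoming.copyToHeap();

        // The hook has returned; the buffer is reused for unrelated data.
        reuseBuffer(pooled, "XXXXXXX");

        System.out.println(new String(retainedReference.readValue(), StandardCharsets.UTF_8)); // prints "XXXXXXX"
        System.out.println(new String(retainedCopy.readValue(), StandardCharsets.UTF_8));      // prints "value-1"
    }
}
```

In this sketch the retained reference silently observes the new bytes, which mirrors the silent corruption case; in a real RegionServer the memory may instead have been released, which would be consistent with the SIGSEGV in the trace above. The same reasoning would apply to Cells held indirectly inside a stored Mutation.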
--
This message was sent by Atlassian Jira
(v8.20.10#820010)