[ 
https://issues.apache.org/jira/browse/KUDU-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359480#comment-17359480
 ] 

ASF subversion and git services commented on KUDU-3291:
-------------------------------------------------------

Commit 3a1ce304b3762166ba069767b842a42ea9af1009 in kudu's branch 
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3a1ce30 ]

KUDU-3291: properly disambiguate between deltas of a row with the same timestamp

In performing a diff scan, Kudu iterates in small batches of rows,
selecting deltas associated with each row that are relevant to the
scan's timestamp bounds. Once all selected deltas are collected for a
given row, the oldest and newest deltas are found using sorting criteria
meant to order deltas by their application order:

1. Deltas of lower timestamps are less than deltas of higher timestamps.
2. UNDO deltas are less than REDO deltas.
3. Within each delta store's iterator, a counter of selected deltas is
   used to disambiguate between deltas that have the same timestamp. A
   critical assumption here is that this disambiguator is only used for
   deltas of the same delta store. For REDOs, a lower counter implies
   a lower application order -- the opposite is true for UNDOs.
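
The three rules above can be sketched as a single comparison function.
This is a simplified illustration, not Kudu's actual code: the
DeltaKeySketch struct, its field names, and AppliedBefore() are
hypothetical stand-ins, and rule 3 is only meaningful here when both
deltas come from the same delta store.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins; these are not Kudu's actual types.
enum class DeltaType { kUndo, kRedo };

struct DeltaKeySketch {
  uint64_t timestamp;     // timestamp of the op that produced the delta
  DeltaType type;         // UNDO or REDO
  int64_t disambiguator;  // per-iterator counter of selected deltas
};

// Returns true if 'a' comes before 'b' in application order.
// Assumes (as the original criteria did) that when timestamp and type
// are equal, both deltas were produced by the same delta store.
bool AppliedBefore(const DeltaKeySketch& a, const DeltaKeySketch& b) {
  // Rule 1: lower timestamps sort first.
  if (a.timestamp != b.timestamp) return a.timestamp < b.timestamp;
  // Rule 2: UNDOs sort before REDOs.
  if (a.type != b.type) return a.type == DeltaType::kUndo;
  // Rule 3: for REDOs a lower counter means earlier application;
  // for UNDOs the direction is reversed.
  if (a.type == DeltaType::kRedo) return a.disambiguator < b.disambiguator;
  return a.disambiguator > b.disambiguator;
}
```

If two stores each start their counter independently, two REDOs with the
same timestamp can carry the same disambiguator, and a comparison like
this one cannot order them.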

What the above criteria don't account for is the fact that certain
iterators can iterate over separate delta stores that have deltas of the
same timestamp and type. If Kudu performs a delta flush while applying a large
batch of updates to the same row, the result is that some of the batch's
updates can land in the newly flushed REDO delta file, while the rest
land in the new DMS. In iterating over these stores with a
DeltaIteratorMerger, which combines the deltas of several delta stores,
this breaks the assumption described above, resulting in the crash
reported in KUDU-3291.

To remediate this, in iterators that merge multiple delta stores,
namely, the DeltaIteratorMerger, a single top-level counter is used to
guide the disambiguators generated by each sub-iterator. Before
iterating over a new batch of deltas in a given sub-iterator, this
counter is propagated to that sub-iterator as the new starting point of
its counter. The result is that the disambiguators generated by the
DeltaIteratorMerger can be used to define a total ordering of the deltas
selected.
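
A minimal sketch of the counter propagation described above, assuming
hypothetical SubIteratorSketch and MergerSketch classes that only model
the counters. Kudu's real DeltaIteratorMerger and its sub-iterators have
different interfaces, and the reversed direction for UNDOs is left out.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sub-iterator over one delta store. Each selected delta
// is tagged with the current counter value, which is then incremented.
class SubIteratorSketch {
 public:
  explicit SubIteratorSketch(int deltas_per_batch)
      : deltas_per_batch_(deltas_per_batch) {}

  void set_counter(int64_t c) { counter_ = c; }
  int64_t counter() const { return counter_; }

  // Appends one disambiguator per delta selected in this batch.
  void SelectDeltas(std::vector<int64_t>* out) {
    for (int i = 0; i < deltas_per_batch_; ++i) {
      out->push_back(counter_++);
    }
  }

 private:
  int deltas_per_batch_;
  int64_t counter_ = 0;
};

// Hypothetical merger that owns a single top-level counter.
class MergerSketch {
 public:
  explicit MergerSketch(std::vector<SubIteratorSketch> subs)
      : subs_(std::move(subs)) {}

  // Selects one batch of deltas across all sub-iterators.
  std::vector<int64_t> SelectDeltas() {
    std::vector<int64_t> out;
    for (auto& sub : subs_) {
      sub.set_counter(next_);  // propagate the top-level counter
      sub.SelectDeltas(&out);
      next_ = sub.counter();   // resume from where this store stopped
    }
    return out;
  }

 private:
  std::vector<SubIteratorSketch> subs_;
  int64_t next_ = 0;
};
```

Because every sub-iterator starts from the merger's counter instead of
its own, no two deltas selected through the merger share a
disambiguator, even when they come from different stores.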

Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55
Reviewed-on: http://gerrit.cloudera.org:8080/17547
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Tested-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Grant Henke <granthe...@apache.org>


> Crash when performing a diff scan after delta flush races with a batch of ops 
> that update the same row
> ------------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3291
>                 URL: https://issues.apache.org/jira/browse/KUDU-3291
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
>            Reporter: Andrew Wong
>            Assignee: Andrew Wong
>            Priority: Critical
>
> It's possible to run into the following crash:
> {code:java}
> F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896)
> *** Check failure stack trace: ***
> *** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are using GNU date ***
> PC: @     0x7fff724b033a __pthread_kill
> *** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack trace: ***
>     @     0x7fff725615fd _sigtramp
>     @     0x7ffeef948568 (unknown)
>     @     0x7fff72437808 abort
>     @        0x107920599 google::logging_fail()
>     @        0x10791f4cf google::LogMessage::SendToLog()
>     @        0x10791fb95 google::LogMessage::Flush()
>     @        0x107923c9f google::LogMessageFatal::~LogMessageFatal()
>     @        0x107920b29 google::LogMessageFatal::~LogMessageFatal()
>     @        0x1009ae07e kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()()
>     @        0x1009aa561 std::__1::max<>()
>     @        0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta()
>     @        0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom()
>     @        0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas()
>     @        0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas()
>     @        0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas()
>     @        0x10097133f kudu::tablet::DeltaApplier::InitializeSelectionVector()
>     @        0x1056df4fb kudu::MaterializingIterator::MaterializeBlock()
>     @        0x1056df2d8 kudu::MaterializingIterator::NextBlock()
>     @        0x1056d1c5b kudu::MergeIterState::PullNextBlock()
>     @        0x1056d5e62 kudu::MergeIterator::RefillHotHeap()
>     @        0x1056d4f0b kudu::MergeIterator::Init()
>     @        0x1006a413d kudu::tablet::Tablet::Iterator::Init()
>     @        0x1002cb3b9 kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody()
>     @        0x1005f1b88 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @        0x1005f1add testing::Test::Run()
>     @        0x1005f2dd0 testing::TestInfo::Run()
>     @        0x1005f3807 testing::TestSuite::Run()
>     @        0x100601b57 testing::internal::UnitTestImpl::RunAllTests()
>     @        0x100601418 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @        0x10060139c testing::UnitTest::Run()
>     @        0x100476201 RUN_ALL_TESTS()
>     @        0x100475fa8 main
> {code}
> The [crash line|https://github.com/apache/kudu/blob/e574903ace741a531c49aba15f97e856ea80ca4b/src/kudu/tablet/delta_store.h#L149]
> assumes that all deltas for a given row that have the same timestamp belong 
> in the same delta store, and it uses this assumption to order the deltas in a 
> diff scan.
> However, this is not true because, unlike the case for MRS flushes, we don't 
> wait for all ops to finish applying before flushing the DMS. This means that 
> a batch containing multiple updates to the same row may be spread across 
> multiple DMSs if we delta flush while the batch of updates is being applied.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
