[
https://issues.apache.org/jira/browse/KUDU-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330753#comment-15330753
]
Dan Burkert commented on KUDU-1486:
-----------------------------------
I dug into the core file and couldn't see how this crash is possible. The
tserver is processing a transaction with 16 operations, all updates. The last
operation is completely corrupted. Below is a printout of the first and last
ops. Looking at the op decoding code, I can't figure out how this could happen
unless something is scribbling over memory. The 16th op's pointer address looks
plausible, but the type and every other field it points to are garbage.
{code}
(gdb) up
#7 0x00000000008a8a3e in kudu::tablet::Tablet::AcquireRowLocks
(this=this@entry=0x118ecdc0, tx_state=0x19820360) at
../../src/kudu/tablet/tablet.cc:334
334 in ../../src/kudu/tablet/tablet.cc
(gdb) print tx_state->row_ops_
$15 = std::vector of length 16, capacity 16 = {0x261d04d0, 0x261d0850,
0x261d1f80, 0x261d0380, 0x261d1b90, 0x261d0460, 0x261d01c0, 0x261d10a0,
0x261d1420, 0x261d0770, 0x261d1730,
0x261d0fc0, 0x261d0e00, 0x261d05b0, 0x261d1570, 0x261d25c0}
(gdb) p *((kudu::tablet::RowOp *) 0x261d04d0)
$16 = {decoded_op = {type = kudu::RowOperationsPB_Type_UPDATE, row_data =
0x3f75fc00
"\004\367\301\373@\020Gڧ\270H\235=Y\227\363\001\002\t/\034;\v\037\020\222Y\003\t\020",
isset_bitmap = 0x257aae0 <strings::internal::SubstituteArg::NoArg> "",
changelist = {encoded_data_ = {data_ = 0x3f75fc10
"\001\002\t/\034;\v\037\020\222Y\003\t\020",
size_ = 21}}, split_row = std::shared_ptr (empty) 0x0}, key_probe =
{impl_ = {data_ = {<kudu::DefaultDeleter<kudu::tablet::RowSetKeyProbe>> = {<No
data fields>},
ptr = 0x127bf0b0}}}, row_lock = {manager_ = 0x118ecfa8, acquired_ =
true, entry_ = 0x67cc43f0, ls_ = kudu::tablet::LockManager::LOCK_ACQUIRED},
result = {impl_ = {
data_ = {<kudu::DefaultDeleter<kudu::tablet::OperationResultPB>> = {<No
data fields>}, ptr = 0x0}}}, orig_result_from_log_ = 0x0}
(gdb) p *((kudu::tablet::RowOp *) 0x261d25c0)
$17 = {decoded_op = {type = 2265776995, row_data = 0xf27878537e280a81 <error:
Cannot access memory at address 0xf27878537e280a81>,
isset_bitmap = 0x538a060f459a0d03 <error: Cannot access memory at address
0x538a060f459a0d03>, changelist = {encoded_data_ = {
data_ = 0x390e6d72ef0e4eb <error: Cannot access memory at address
0x390e6d72ef0e4eb>, Python Exception <type 'exceptions.TypeError'> %d format: a
number is required, not gdb.Value:
size_ = 6446945457235008269}}, split_row = }, key_probe = {impl_ = {
data_ = {<kudu::DefaultDeleter<kudu::tablet::RowSetKeyProbe>> = {<No data
fields>}, ptr = 0xe20d035f747b410e}}}, row_lock = {manager_ =
0x9e9f8278cb28c438, acquired_ = 230,
entry_ = 0x900f257aaf8e1812, ls_ = (unknown: 3900457436)}, result = {impl_
= {data_ = {<kudu::DefaultDeleter<kudu::tablet::OperationResultPB>> = {<No data
fields>},
ptr = 0x1fc98894fccb5764}}}, orig_result_from_log_ = 0x130e02050ab881c7}
{code}
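As a rough cross-check of the "memory scribbling" theory, the pointer fields from the two printouts can be tested for canonical form. This is only a sketch: it assumes the tserver ran on x86-64 with 48-bit virtual addresses, and the helper name is_canonical is made up for illustration. The values are copied from the gdb output above.

```python
# Pointer-like fields copied from the gdb printout of the 16th RowOp (0x261d25c0).
CORRUPT_FIELDS = {
    "row_data": 0xF27878537E280A81,
    "isset_bitmap": 0x538A060F459A0D03,
    "changelist.data_": 0x0390E6D72EF0E4EB,
    "key_probe.ptr": 0xE20D035F747B410E,
    "row_lock.manager_": 0x9E9F8278CB28C438,
    "row_lock.entry_": 0x900F257AAF8E1812,
    "result.ptr": 0x1FC98894FCCB5764,
    "orig_result_from_log_": 0x130E02050AB881C7,
}

# A few of the same fields from the first RowOp (0x261d04d0), for comparison.
HEALTHY_FIELDS = {
    "row_data": 0x3F75FC00,
    "key_probe.ptr": 0x127BF0B0,
    "row_lock.manager_": 0x118ECFA8,
    "row_lock.entry_": 0x67CC43F0,
}


def is_canonical(addr: int) -> bool:
    """Return True if bits 63..47 are all equal.

    Assumption: x86-64 with 48-bit virtual addresses. A pointer that fails
    this check can never be dereferenced, whatever is mapped in the process.
    """
    top = addr >> 47  # the 17 highest bits of a 64-bit value
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    for label, fields in (("first op", HEALTHY_FIELDS), ("16th op", CORRUPT_FIELDS)):
        for name, addr in fields.items():
            verdict = "canonical" if is_canonical(addr) else "NON-canonical"
            print(f"{label:8} {name:22} {addr:#018x} {verdict}")
```

Under that assumption, every pointer field in the 16th op is non-canonical while the first op's fields all pass, which fits the struct having been overwritten with arbitrary bytes rather than a single field being mis-decoded.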
> Segfault in KeyEncoder
> ----------------------
>
> Key: KUDU-1486
> URL: https://issues.apache.org/jira/browse/KUDU-1486
> Project: Kudu
> Issue Type: Bug
> Components: tablet
> Affects Versions: 0.9.0
> Reporter: Mike Percy
>
> Found this crasher while running ITBLL
> {code}
> Program terminated with signal 11, Segmentation fault.
> #0 Encode (key=0xf27878537e280a81, is_last=false, dst=0x7f2d55a15598) at
> ../../src/kudu/common/key_encoder.h:84
> 84 ../../src/kudu/common/key_encoder.h: No such file or directory.
> in ../../src/kudu/common/key_encoder.h
> (gdb) bt
> #0 Encode (key=0xf27878537e280a81, is_last=false, dst=0x7f2d55a15598) at
> ../../src/kudu/common/key_encoder.h:84
> #1 kudu::KeyEncoderTraits<(kudu::DataType)7, kudu::faststring,
> void>::EncodeWithSeparators (key=0xf27878537e280a81, is_last=false,
> dst=0x7f2d55a15598) at ../../src/kudu/common/key_encoder.h:91
> #2 0x00000000017ed916 in Encode (row=...) at
> ../../src/kudu/common/key_encoder.h:311
> #3 AddColumnKey (row=...) at ../../src/kudu/common/encoded_key.cc:155
> #4 kudu::EncodedKey::FromContiguousRow (row=...) at
> ../../src/kudu/common/encoded_key.cc:46
> #5 0x00000000008a870e in RowSetKeyProbe (this=0x118ecdc0,
> tx_state=0x19820360, op=0x261d25c0) at ../../src/kudu/tablet/rowset.h:179
> #6 kudu::tablet::Tablet::AcquireLockForOp (this=0x118ecdc0,
> tx_state=0x19820360, op=0x261d25c0) at ../../src/kudu/tablet/tablet.cc:358
> #7 0x00000000008a8a3e in kudu::tablet::Tablet::AcquireRowLocks
> (this=0x118ecdc0, tx_state=0x19820360) at ../../src/kudu/tablet/tablet.cc:334
> #8 0x00000000008e234f in kudu::tablet::WriteTransaction::Prepare
> (this=0x13ad1710) at
> ../../src/kudu/tablet/transactions/write_transaction.cc:95
> #9 0x00000000008d95c8 in kudu::tablet::TransactionDriver::PrepareAndStart
> (this=0x1f8bcc60) at
> ../../src/kudu/tablet/transactions/transaction_driver.cc:170
> #10 0x00000000008d9ead in
> kudu::tablet::TransactionDriver::PrepareAndStartTask (this=0x1f8bcc60) at
> ../../src/kudu/tablet/transactions/transaction_driver.cc:159
> #11 0x000000000191a74e in operator() (this=0x5157080, permanent=false) at
> /usr/include/boost/function/function_template.hpp:1013
> #12 Run (this=0x5157080, permanent=false) at
> ../../src/kudu/util/threadpool.cc:48
> #13 kudu::ThreadPool::DispatchThread (this=0x5157080, permanent=false) at
> ../../src/kudu/util/threadpool.cc:343
> #14 0x00000000019154fa in operator() (arg=0x5367ad0) at
> /usr/include/boost/function/function_template.hpp:1013
> #15 kudu::Thread::SuperviseThread (arg=0x5367ad0) at
> ../../src/kudu/util/thread.cc:586
> #16 0x000000305f4079d1 in start_thread () from /lib64/libpthread.so.0
> #17 0x000000305f0e88fd in clone () from /lib64/libc.so.6
> {code}