[ https://issues.apache.org/jira/browse/KUDU-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297107#comment-17297107 ]
ASF subversion and git services commented on KUDU-3254:
-------------------------------------------------------

Commit 7c8dca60d15b560017ef7e726a379788727502ba in kudu's branch refs/heads/branch-1.13.x from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=7c8dca6 ]

KUDU-3254 fix bug in meta-cache exposed by KUDU-1802

This patch fixes an issue resulting in a SIGABRT crash in the Kudu client
when working with stale scan tokens which contain information about
tablet locations for a table (see KUDU-1802) whose range partition
was dropped. The patch also adds a test scenario reproducing the crash;
the scenario now passes and can catch future regressions.

This patch is a follow-up to d23ee5d38ddc4317f431dd65df0c825c00cc968a.

Before the change in src/kudu/client/meta_cache.cc was back-ported from
Kudu 1.14 as part of this fix, the scenario crashed with SIGABRT, with
a stack trace similar to the following (captured on macOS):

  * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
    frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3
    frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3
    frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
    frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
    frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5
    frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37
    frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
    frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23
    frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35
    frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10
    frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10

Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19
Reviewed-on: http://gerrit.cloudera.org:8080/17152
Tested-by: Alexey Serbin <aser...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>


> Crash in Kudu C++ client when working with stale scan tokens containing
> tablet location info
> --------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3254
>                 URL: https://issues.apache.org/jira/browse/KUDU-3254
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>   Affects Versions: 1.13.0
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>             Fix For: 1.14.0
>
>
> With KUDU-1802 implemented, the meta-cache in the Kudu C++ client might
> crash when using a scan token with information on tablet location in
> scenarios like the one below:
> # Scan tokens were generated for a table with multiple ranges (e.g., with
> two ranges: [-100, 0), [0, 100)).
> # The first range was dropped (e.g., range [-100, 0) is dropped).
> # A client was fed the set of tokens generated at step 1 to read from the
> table (now with one stale token corresponding to the dropped range).
> # The same client instance was used to write into the table.
> # The same client instance was fed the original set of tokens once more to
> read from the table again.
> The client would crash at step 5 of the sequence above.
> The stack trace on crash might look like this (captured on macOS):
> {noformat}
>   * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10
>     frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430
>     frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120
>     frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail()
> at logging.cc:1474:3
>     frame #4: 0x000000010ca19121
> libglog.0.dylib`google::LogMessage::SendToLog() [inlined]
> google::LogMessage::Fail() at logging.cc:1488:3
>     frame #5: 0x000000010ca1911b
> libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442
>     frame #6: 0x000000010ca19815
> libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5
>     frame #7: 0x000000010ca1d76f
> libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at
> logging.cc:2023:5
>     frame #8: 0x000000010ca1a5f9
> libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at
> logging.cc:2022:37
>     frame #9: 0x0000000103e365e3
> libkudu_client.dylib`std::__1::map<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> >,
> kudu::client::internal::MetaCacheEntry,
> std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> > >,
> std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const,
> kudu::client::internal::MetaCacheEntry> > >::mapped_type&
> FindOrDie<std::__1::map<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> >,
> kudu::client::internal::MetaCacheEntry,
> std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> > >,
> std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > const,
> kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3
>     frame #10: 0x0000000103e34cbb
> libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse()
> at meta_cache.cc:943:23
>     frame #11: 0x0000000103e86166
> libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at
> scan_token-internal.cc:192:35
>     frame #12: 0x0000000103e88051
> libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner()
> at scan_token-internal.cc:111:10
>     frame #13: 0x0000000103d55d3c
> libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at
> client.cc:1879:10
> {noformat}
> The issue is fixed in Kudu 1.14 with [this
> changelist|https://github.com/apache/kudu/commit/2a558768f8aa00068e72ccd1327081f07ba46b03].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
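
For readers unfamiliar with the failure mode in the KUDU-3254 trace above
(a FindOrDie lookup on a map of cache entries, reached from
MetaCache::ProcessGetTableLocationsResponse), the following is a minimal,
self-contained C++ sketch of the general pattern. It is not Kudu code and
not the actual patch: CacheEntrySketch, CacheSketch, FindOrNullSketch and
TokenRangeIsCached are hypothetical names, and keying the cache by the
lower bound of a range is an assumption made for illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a cached tablet location entry.
struct CacheEntrySketch {
  std::string tablet_id;
};

// Hypothetical cache, assumed here to be keyed by the lower bound of a
// range partition.
using CacheSketch = std::map<std::string, CacheEntrySketch>;

// Non-fatal lookup: returns nullptr when the key is absent. A
// FindOrDie-style helper would instead abort the process on a missing key.
const CacheEntrySketch* FindOrNullSketch(const CacheSketch& cache,
                                         const std::string& key) {
  auto it = cache.find(key);
  return it == cache.end() ? nullptr : &it->second;
}

// Returns true if the range referenced by a token is still cached, so a
// stale token can be reported to the caller instead of crashing.
bool TokenRangeIsCached(const CacheSketch& cache,
                        const std::string& token_lower_bound) {
  return FindOrNullSketch(cache, token_lower_bound) != nullptr;
}
```

With a cache that only holds the surviving range [0, 100), a token for the
dropped range [-100, 0) makes TokenRangeIsCached return false, where an
unconditional fatal lookup would terminate the process.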