[jira] [Updated] (KUDU-2295) nullptr dereference while scanning on already shutdown tablet replica
[ https://issues.apache.org/jira/browse/KUDU-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2295:
    Code Review: http://gerrit.cloudera.org:8080/9350

> nullptr dereference while scanning on already shutdown tablet replica
>
> Key: KUDU-2295
> URL: https://issues.apache.org/jira/browse/KUDU-2295
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 1.7.0
> Reporter: Alexey Serbin
> Assignee: Alexey Serbin
> Priority: Major
>
> While running the {{raft_consensus_stress-itest}}, I found that one of the tablet
> servers had crashed with the following stack trace:
> {noformat}
> *** Aborted at 1518480865 (unix time) try "date -d @1518480865" if you are using GNU date ***
> PC: @ 0x7f1e02025790 scoped_refptr<>::operator->()
> *** SIGSEGV (@0x160) received by PID 8782 (TID 0x7f1de3c7e700) from PID 352; stack trace: ***
>     @ 0x7f1dfdcfc330 (unknown) at ??:0
>     @ 0x7f1e02025790 scoped_refptr<>::operator->() at ??:0
>     @ 0x7f1e00ae62e7 kudu::tablet::Tablet::GetTabletAncientHistoryMark() at ??:0
>     @ 0x7f1e00ae627d kudu::tablet::Tablet::GetHistoryGcOpts() at ??:0
>     @ 0x7f1e02012c53 kudu::tserver::(anonymous namespace)::VerifyNotAncientHistory() at ??:0
>     @ 0x7f1e0201223b kudu::tserver::TabletServiceImpl::HandleScanAtSnapshot() at ??:0
>     @ 0x7f1e0200c6dd kudu::tserver::TabletServiceImpl::HandleNewScanRequest() at ??:0
>     @ 0x7f1e02009d33 kudu::tserver::TabletServiceImpl::Scan() at ??:0
>     @ 0x7f1dfc90de4d kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_5::operator()() at ??:0
>     @ 0x7f1dfc90dc92 std::_Function_handler<>::_M_invoke() at ??:0
>     @ 0x7f1dfba728ab std::function<>::operator()() at ??:0
>     @ 0x7f1dfba7216d kudu::rpc::GeneratedServiceIf::Handle() at ??:0
>     @ 0x7f1dfba74526 kudu::rpc::ServicePool::RunThread() at ??:0
>     @ 0x7f1dfba76ad9 boost::_mfi::mf0<>::operator()() at ??:0
>     @ 0x7f1dfba76a40 boost::_bi::list1<>::operator()<>() at ??:0
>     @ 0x7f1dfba769ea boost::_bi::bind_t<>::operator()() at ??:0
>     @ 0x7f1dfba767cd boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
>     @ 0x7f1dfba190f8 boost::function0<>::operator()() at ??:0
>     @ 0x7f1df9d1788d kudu::Thread::SuperviseThread() at ??:0
>     @ 0x7f1dfdcf4184 start_thread at ??:0
>     @ 0x7f1df6023ffd clone at ??:0
>     @ 0x0 (unknown)
> {noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2303) Add KuduSchema::ToString implementation
Grant Henke created KUDU-2303:

Summary: Add KuduSchema::ToString implementation
Key: KUDU-2303
URL: https://issues.apache.org/jira/browse/KUDU-2303
Project: Kudu
Issue Type: Improvement
Components: client
Affects Versions: 1.6.0
Reporter: Grant Henke

Adding a ToString method to KuduSchema and likely KuduColumnSchema would be useful for users to print schema information while debugging or logging.
[jira] [Assigned] (KUDU-65) TS should remember master UUID and refuse to re-register to different cluster
[ https://issues.apache.org/jira/browse/KUDU-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned KUDU-65:
    Assignee: (was: Todd Lipcon)

> TS should remember master UUID and refuse to re-register to different cluster
>
> Key: KUDU-65
> URL: https://issues.apache.org/jira/browse/KUDU-65
> Project: Kudu
> Issue Type: Improvement
> Components: master, tserver
> Affects Versions: M5
> Reporter: Todd Lipcon
> Priority: Major
>
> Prevent accidental data loss if the tserver is pointed at the wrong cluster's
> master.
[jira] [Created] (KUDU-2302) Leader crashes if it can't resolve DNS address of a peer
Todd Lipcon created KUDU-2302:

Summary: Leader crashes if it can't resolve DNS address of a peer
Key: KUDU-2302
URL: https://issues.apache.org/jira/browse/KUDU-2302
Project: Kudu
Issue Type: Bug
Components: consensus
Affects Versions: 1.6.0
Reporter: Todd Lipcon

In BecomeLeader we call:
{code}
CHECK_OK(BecomeLeaderUnlocked());
{code}
This check fails, crashing the process, if the leader cannot resolve the address of one of its peers. Instead, it should probably remain leader and treat RPC attempts to that peer as failed due to network resolution, with periodic retries of the resolution.
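The behavior proposed in KUDU-2302 could look roughly like the sketch below. It is only an illustration of "keep going and retry resolution periodically": `PeerState`, `ResolveFn`, and `EnsureResolved` are hypothetical names invented for this example and are not part of Kudu's consensus code.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <string>

// Hypothetical types for illustration only; none of these names come from Kudu.
using Clock = std::chrono::steady_clock;

// Resolver callback: returns the resolved address, or std::nullopt on failure.
using ResolveFn =
    std::function<std::optional<std::string>(const std::string&)>;

struct PeerState {
  std::string hostname;
  std::optional<std::string> address;  // empty until resolution succeeds
  Clock::time_point next_retry{};      // earliest time to try resolving again
};

// Tries to resolve the peer if it is still unresolved and the retry time has
// passed. Returns true if the peer can be used for RPCs. A false result means
// the caller should treat the RPC attempt as failed instead of aborting.
bool EnsureResolved(PeerState& peer, const ResolveFn& resolve,
                    Clock::time_point now, Clock::duration retry_interval) {
  if (peer.address) return true;             // already resolved
  if (now < peer.next_retry) return false;   // too early to retry
  peer.address = resolve(peer.hostname);
  if (!peer.address) {
    peer.next_retry = now + retry_interval;  // schedule the next attempt
    return false;
  }
  return true;
}
```

With this shape, a resolution failure becomes an ordinary per-peer error that is retried later, rather than a fatal check at leader election time.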
[jira] [Created] (KUDU-2301) Add metrics per connection to the reactor metrics
Sailesh Mukil created KUDU-2301:

Summary: Add metrics per connection to the reactor metrics
Key: KUDU-2301
URL: https://issues.apache.org/jira/browse/KUDU-2301
Project: Kudu
Issue Type: Task
Components: rpc
Reporter: Sailesh Mukil
Assignee: Sailesh Mukil

We can expose metrics on a per connection level and store them in ReactorMetrics. As an initial step, we can expose the OutboundTransfer queue size and a rolling average of transfer speeds in Kbps.
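One way a rolling average of transfer speed in Kbps might be tracked per connection is sketched below. The class `RollingTransferSpeed`, its methods, and the choice of a fixed window of the most recent transfers are assumptions made for illustration; the issue does not specify how the average would be computed, and nothing here is Kudu's ReactorMetrics API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical per-connection metric; the class and method names are
// illustrative and not part of Kudu's ReactorMetrics.
class RollingTransferSpeed {
 public:
  // `window` is the number of most recent transfers to average over.
  explicit RollingTransferSpeed(std::size_t window) : window_(window) {}

  // Records one completed transfer of `bytes` bytes that took `micros`
  // microseconds, evicting the oldest sample once the window is full.
  void Record(std::uint64_t bytes, std::uint64_t micros) {
    samples_.push_back({bytes, micros});
    total_bytes_ += bytes;
    total_micros_ += micros;
    if (samples_.size() > window_) {
      total_bytes_ -= samples_.front().bytes;
      total_micros_ -= samples_.front().micros;
      samples_.pop_front();
    }
  }

  // Average speed over the window in kilobits per second (1 Kbps = 1000 bit/s).
  // bytes * 8 gives bits; dividing by (micros / 1e6) gives bit/s; dividing by
  // 1000 gives Kbps, which simplifies to bytes * 8 * 1000 / micros.
  double Kbps() const {
    if (total_micros_ == 0) return 0.0;
    return static_cast<double>(total_bytes_) * 8.0 * 1000.0 /
           static_cast<double>(total_micros_);
  }

 private:
  struct Sample {
    std::uint64_t bytes;
    std::uint64_t micros;
  };

  std::size_t window_;
  std::deque<Sample> samples_;
  std::uint64_t total_bytes_ = 0;
  std::uint64_t total_micros_ = 0;
};
```

Weighting by elapsed time, as done here, keeps a few tiny transfers from dominating the average; an exponentially weighted average would be another reasonable choice.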
[jira] [Comment Edited] (KUDU-2300) Partition schema doesn't show correct type of bounds for range partitions
[ https://issues.apache.org/jira/browse/KUDU-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366013#comment-16366013 ]

Grant Henke edited comment on KUDU-2300 at 2/15/18 6:26 PM:

Edit. It looks like Dan and I were typing at the same time.

I think that Kudu can internally optimize its ranges into slightly different but still correct ranges. For example, Kudu might increment by the smallest unit and convert <= to <. I agree this could be confusing because it doesn't exactly match the syntax used at creation, but it is in fact the same logic. I need to look deeper to validate that this is in fact what is happening.

In your example, what values/command did you use to create the range partition?

was (Author: granthenke):
I think that Kudu can internally optimize its ranges into slightly different but still correct ranges. For example, Kudu might increment by the smallest unit and convert <= to <. I agree this could be confusing because it doesn't exactly match the syntax used at creation, but it is in fact the same logic. I need to look deeper to validate that this is in fact what is happening.

In your example, what values/command did you use to create the range partition?

> Partition schema doesn't show correct type of bounds for range partitions
>
> Key: KUDU-2300
> URL: https://issues.apache.org/jira/browse/KUDU-2300
> Project: Kudu
> Issue Type: Bug
> Components: master
> Affects Versions: 1.5.0
> Reporter: Andre Araujo
> Priority: Major
>
> The Partition Schema section of the master Web UI always shows the range
> partition with an {{EXCLUSIVE}} upper bound and an {{INCLUSIVE}} lower
> bound, regardless of what the actual bounds' types are.
> For example, the partition below was created with two {{INCLUSIVE}} bounds,
> but the upper bound is shown incorrectly:
> {code:java}
> HASH (CALLING_NUMBER_INT, CALLED_NUMBER_INT) PARTITIONS 2,
> RANGE (PERIOD_START_TIME) (
>     PARTITION 2018-02-15T00:00:00.01Z <= VALUES < 2018-02-16T00:00:00.00Z
> ){code}
[jira] [Commented] (KUDU-2300) Partition schema doesn't show correct type of bounds for range partitions
[ https://issues.apache.org/jira/browse/KUDU-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366013#comment-16366013 ]

Grant Henke commented on KUDU-2300:

I think that Kudu can internally optimize its ranges into slightly different but still correct ranges. For example, Kudu might increment by the smallest unit and convert <= to <. I agree this could be confusing because it doesn't exactly match the syntax used at creation, but it is in fact the same logic. I need to look deeper to validate that this is in fact what is happening.

In your example, what values/command did you use to create the range partition?

> Partition schema doesn't show correct type of bounds for range partitions
>
> Key: KUDU-2300
> URL: https://issues.apache.org/jira/browse/KUDU-2300
> Project: Kudu
> Issue Type: Bug
> Components: master
> Affects Versions: 1.5.0
> Reporter: Andre Araujo
> Priority: Major
>
> The Partition Schema section of the master Web UI always shows the range
> partition with an {{EXCLUSIVE}} upper bound and an {{INCLUSIVE}} lower
> bound, regardless of what the actual bounds' types are.
> For example, the partition below was created with two {{INCLUSIVE}} bounds,
> but the upper bound is shown incorrectly:
> {code:java}
> HASH (CALLING_NUMBER_INT, CALLED_NUMBER_INT) PARTITIONS 2,
> RANGE (PERIOD_START_TIME) (
>     PARTITION 2018-02-15T00:00:00.01Z <= VALUES < 2018-02-16T00:00:00.00Z
> ){code}
[jira] [Commented] (KUDU-2300) Partition schema doesn't show correct type of bounds for range partitions
[ https://issues.apache.org/jira/browse/KUDU-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365997#comment-16365997 ]

Dan Burkert commented on KUDU-2300:

Hi [~asdaraujo], very early during table creation all range bounds are converted to [inclusive, exclusive). Kudu does that by incrementing the upper bound if it's inclusive. How was the table created, and how was the partition specified?

> Partition schema doesn't show correct type of bounds for range partitions
>
> Key: KUDU-2300
> URL: https://issues.apache.org/jira/browse/KUDU-2300
> Project: Kudu
> Issue Type: Bug
> Components: master
> Affects Versions: 1.5.0
> Reporter: Andre Araujo
> Priority: Major
>
> The Partition Schema section of the master Web UI always shows the range
> partition with an {{EXCLUSIVE}} upper bound and an {{INCLUSIVE}} lower
> bound, regardless of what the actual bounds' types are.
> For example, the partition below was created with two {{INCLUSIVE}} bounds,
> but the upper bound is shown incorrectly:
> {code:java}
> HASH (CALLING_NUMBER_INT, CALLED_NUMBER_INT) PARTITIONS 2,
> RANGE (PERIOD_START_TIME) (
>     PARTITION 2018-02-15T00:00:00.01Z <= VALUES < 2018-02-16T00:00:00.00Z
> ){code}
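The conversion Dan describes can be illustrated with a small sketch for an integer-encoded range column, such as a timestamp stored as a count of microseconds. `BoundType` and `ToExclusiveUpper` are hypothetical names, and treating an inclusive bound at the maximum value as "unbounded" is an assumption made here to avoid overflow; this is not Kudu's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical bound representation for an integer-encoded range column.
// These names are illustrative and do not come from Kudu.
enum class BoundType { kInclusive, kExclusive };

// Converts an upper bound to its exclusive form, so that every range can be
// stored as [lower, upper). An inclusive bound `VALUES <= x` becomes
// `VALUES < x + 1`. Returns std::nullopt (meaning "unbounded above") when the
// inclusive bound is already the largest representable value, because
// incrementing it would overflow. That overflow policy is an assumption.
std::optional<std::int64_t> ToExclusiveUpper(std::int64_t upper,
                                             BoundType type) {
  if (type == BoundType::kExclusive) return upper;
  if (upper == std::numeric_limits<std::int64_t>::max()) return std::nullopt;
  return upper + 1;
}
```

Under this scheme a partition created with an inclusive upper bound is displayed with `<` and a value one unit larger, which covers exactly the same set of rows as the bound the user wrote.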
[jira] [Assigned] (KUDU-2275) SIGSEGV due to bug in libunwind
[ https://issues.apache.org/jira/browse/KUDU-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned KUDU-2275:
    Assignee: Todd Lipcon

> SIGSEGV due to bug in libunwind
>
> Key: KUDU-2275
> URL: https://issues.apache.org/jira/browse/KUDU-2275
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Will Berkeley
> Assignee: Todd Lipcon
> Priority: Major
>
> Rarely, the kernel stack watchdog can cause a segfault due to a bug in
> libunwind.
> {noformat}
> *** Aborted at 1516180006 (unix time) try "date -d @1516180006" if you are using GNU date ***
> PC: @ 0x8c94b4 (unknown)
> *** SIGSEGV (@0x7f27173e0000) received by PID 22279 (TID 0x7f270f87f700) from PID 389939200; stack trace: ***{noformat}
> From a core file (produced from the minidump), the backtrace is
> {noformat}
> #0 access_mem (as=<optimized out>, addr=139805870391296, val=0x7f270f87bcc0, write=<optimized out>, arg=<optimized out>)
>     at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Ginit.c:173
> #1 0x008c8e02 in is_plt_entry (c=0x7f270f87c0e0)
>     at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Gstep.c:43
> #2 _ULx86_64_step (cursor=0x7f270f87c0e0)
>     at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Gstep.c:125
> #3 0x008c412d in google::GetStackTrace (result=result@entry=0x292c0c8, max_depth=max_depth@entry=16, skip_count=0, skip_count@entry=2)
>     at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/glog-0.3.5/src/stacktrace_libunwind-inl.h:78
> #4 0x01a9be8c in Collect (skip_frames=2, this=0x292c0c0)
>     at /usr/src/debug/kudu-1.5.0-cdh5.13.1/src/kudu/util/debug-util.cc:350
> #5 kudu::(anonymous namespace)::HandleStackTraceSignal (signum=<optimized out>)
>     at /usr/src/debug/kudu-1.5.0-cdh5.13.1/src/kudu/util/debug-util.cc:176
> #6 0x7f2716854670 in _quicksort () from ./lib64/libc.so.6
> #7 0x0000000000000000 in ?? (){noformat}
> Note that addr = 139805870391296 = 0x7f27173e0000, which matches the faulting address in the SIGSEGV line.
> The segfault happens because libunwind is accessing invalid memory it's
> supposed to have validated:
> {code:java}
> /* validate address */
> const struct cursor *c = (const struct cursor *)arg;
> if (likely (c != NULL) && unlikely (c->validate)
>     && unlikely (validate_mem (addr)))
>   return -1;
> *val = *(unw_word_t *) addr;{code}
> [Others|https://lists.nongnu.org/archive/html/libunwind-devel/2016-09/msg1.html]
> have seen this same problem before.
> There's also a fix for this issue in commit
> 836c91c43d7a996028aa7e8d1f53630a6b8e7cbe. It's not in any release of
> libunwind yet, so we could do one of the following:
> # upgrade libunwind to 1.2 (most recent release) and patch in the fix
> # upgrade to a snapshot containing the fix
> As a workaround, one can set --hung_task_check_interval_ms to a large value
> like 2^30, so the stack watchdog runs very rarely. The flag is a 32-bit
> signed integer, so the value cannot exceed 2^31 - 1. The tradeoff is the
> effective loss of the stack watchdog, which can make debugging certain
> performance problems more difficult.
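The numbers in the KUDU-2275 report can be checked with a few compile-time assertions. The sketch below only verifies arithmetic: the constant and function names are made up for this example, and it does not touch any Kudu or libunwind API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// 2^30 milliseconds, the example value suggested for
// --hung_task_check_interval_ms. The constant name is illustrative.
constexpr std::int64_t kIntervalMs = std::int64_t{1} << 30;

static_assert(kIntervalMs == 1073741824, "2^30 milliseconds");

// The flag is described as a 32-bit signed integer: 2^30 fits, 2^31 does not.
static_assert(kIntervalMs <= std::numeric_limits<std::int32_t>::max(),
              "2^30 fits in a 32-bit signed integer");
static_assert((std::int64_t{1} << 31) >
                  std::numeric_limits<std::int32_t>::max(),
              "2^31 would overflow a 32-bit signed integer");

// How rarely the watchdog would run with that setting, in days.
constexpr double kMsPerDay = 24.0 * 60.0 * 60.0 * 1000.0;
constexpr double IntervalDays() {
  return static_cast<double>(kIntervalMs) / kMsPerDay;
}

// The decimal addr from frame #0 of the backtrace, written in hexadecimal.
static_assert(139805870391296LL == 0x7f27173e0000LL,
              "addr from frame #0 in hexadecimal");
```

With an interval of 2^30 ms the watchdog would wake up roughly once every 12.4 days, which is why the report describes the workaround as an effective loss of the watchdog.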