[jira] [Updated] (KUDU-2295) nullptr dereference while scanning on already shutdown tablet replica

2018-02-15 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2295:

Code Review: http://gerrit.cloudera.org:8080/9350

> nullptr dereference while scanning on already shutdown tablet replica
> -
>
> Key: KUDU-2295
> URL: https://issues.apache.org/jira/browse/KUDU-2295
> Project: Kudu
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.7.0
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Major
>
> While running {{raft_consensus_stress-itest}}, I found that one of the tablet 
> servers had crashed with the following stack trace:
> {noformat}
> *** Aborted at 1518480865 (unix time) try "date -d @1518480865" if you are using GNU date ***
> PC: @ 0x7f1e02025790 scoped_refptr<>::operator->()
> *** SIGSEGV (@0x160) received by PID 8782 (TID 0x7f1de3c7e700) from PID 352; stack trace: ***
>     @ 0x7f1dfdcfc330 (unknown) at ??:0
>     @ 0x7f1e02025790 scoped_refptr<>::operator->() at ??:0
>     @ 0x7f1e00ae62e7 kudu::tablet::Tablet::GetTabletAncientHistoryMark() at ??:0
>     @ 0x7f1e00ae627d kudu::tablet::Tablet::GetHistoryGcOpts() at ??:0
>     @ 0x7f1e02012c53 kudu::tserver::(anonymous namespace)::VerifyNotAncientHistory() at ??:0
>     @ 0x7f1e0201223b kudu::tserver::TabletServiceImpl::HandleScanAtSnapshot() at ??:0
>     @ 0x7f1e0200c6dd kudu::tserver::TabletServiceImpl::HandleNewScanRequest() at ??:0
>     @ 0x7f1e02009d33 kudu::tserver::TabletServiceImpl::Scan() at ??:0
>     @ 0x7f1dfc90de4d kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_5::operator()() at ??:0
>     @ 0x7f1dfc90dc92 std::_Function_handler<>::_M_invoke() at ??:0
>     @ 0x7f1dfba728ab std::function<>::operator()() at ??:0
>     @ 0x7f1dfba7216d kudu::rpc::GeneratedServiceIf::Handle() at ??:0
>     @ 0x7f1dfba74526 kudu::rpc::ServicePool::RunThread() at ??:0
>     @ 0x7f1dfba76ad9 boost::_mfi::mf0<>::operator()() at ??:0
>     @ 0x7f1dfba76a40 boost::_bi::list1<>::operator()<>() at ??:0
>     @ 0x7f1dfba769ea boost::_bi::bind_t<>::operator()() at ??:0
>     @ 0x7f1dfba767cd boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
>     @ 0x7f1dfba190f8 boost::function0<>::operator()() at ??:0
>     @ 0x7f1df9d1788d kudu::Thread::SuperviseThread() at ??:0
>     @ 0x7f1dfdcf4184 start_thread at ??:0
>     @ 0x7f1df6023ffd clone at ??:0
>     @    0x0 (unknown){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2303) Add KuduSchema::ToString implementation

2018-02-15 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2303:
-

 Summary: Add KuduSchema::ToString implementation
 Key: KUDU-2303
 URL: https://issues.apache.org/jira/browse/KUDU-2303
 Project: Kudu
  Issue Type: Improvement
  Components: client
Affects Versions: 1.6.0
Reporter: Grant Henke


Adding a ToString method to KuduSchema and likely KuduColumnSchema would be 
useful for users to print schema information while debugging or logging. 
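
As a rough illustration of what such a method could print, here is a minimal sketch. ColumnSketch and SchemaSketch are hypothetical stand-ins invented for this example; they are not the real KuduColumnSchema or KuduSchema classes, and the output format is only an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a column schema; the real client class differs.
struct ColumnSketch {
  std::string name;
  std::string type;  // e.g. "INT32" or "STRING" (assumed spelling)
  bool nullable;

  // Renders one column as "<name> <type> NULLABLE|NOT NULL".
  std::string ToString() const {
    return name + " " + type + (nullable ? " NULLABLE" : " NOT NULL");
  }
};

// Hypothetical stand-in for a table schema.
struct SchemaSketch {
  std::vector<ColumnSketch> columns;

  // Renders the whole schema as "(col1, col2, ...)" for logging.
  std::string ToString() const {
    std::ostringstream out;
    out << "(";
    for (std::size_t i = 0; i < columns.size(); ++i) {
      if (i > 0) out << ", ";
      out << columns[i].ToString();
    }
    out << ")";
    return out.str();
  }
};
```

A caller could then log a schema by passing the result of ToString() to its logging statement instead of formatting each column by hand.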





[jira] [Assigned] (KUDU-65) TS should remember master UUID and refuse to re-register to different cluster

2018-02-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned KUDU-65:
---

Assignee: (was: Todd Lipcon)

> TS should remember master UUID and refuse to re-register to different cluster
> -
>
> Key: KUDU-65
> URL: https://issues.apache.org/jira/browse/KUDU-65
> Project: Kudu
>  Issue Type: Improvement
>  Components: master, tserver
>Affects Versions: M5
>Reporter: Todd Lipcon
>Priority: Major
>
> Prevent accidental data loss if the tserver is pointed at the wrong cluster's 
> master.





[jira] [Created] (KUDU-2302) Leader crashes if it can't resolve DNS address of a peer

2018-02-15 Thread Todd Lipcon (JIRA)
Todd Lipcon created KUDU-2302:
-

 Summary: Leader crashes if it can't resolve DNS address of a peer
 Key: KUDU-2302
 URL: https://issues.apache.org/jira/browse/KUDU-2302
 Project: Kudu
  Issue Type: Bug
  Components: consensus
Affects Versions: 1.6.0
Reporter: Todd Lipcon


In BecomeLeader we call:
{code}
 CHECK_OK(BecomeLeaderUnlocked());
{code}
This crashes the process if the leader fails to resolve the address of one of 
its peers. Instead, it should probably remain leader and treat RPC attempts to 
that peer as failed due to address resolution (with periodic retries of 
resolution).
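
A minimal sketch of the suggested behavior follows. It is not the actual Kudu consensus code: StatusSketch, PeerSketch, and RecordResolution are hypothetical names invented here, and the only point shown is recording a failed resolution per peer rather than aborting on it.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal status type; Kudu's real Status class is richer.
struct StatusSketch {
  bool ok;
  std::string message;
};

enum class PeerState { kReachable, kResolutionFailed };

// Hypothetical per-peer record kept by the leader.
struct PeerSketch {
  std::string host;
  PeerState state = PeerState::kReachable;
  int failed_resolutions = 0;
};

// Records the outcome of one address-resolution attempt instead of
// aborting the process. A periodic retry can call this again later with
// a fresh result. Returns true if RPCs to the peer may be attempted.
inline bool RecordResolution(PeerSketch* peer, const StatusSketch& s) {
  if (s.ok) {
    peer->state = PeerState::kReachable;
    peer->failed_resolutions = 0;
    return true;
  }
  peer->state = PeerState::kResolutionFailed;
  ++peer->failed_resolutions;
  return false;
}
```

With this shape, the node stays leader while a peer is marked kResolutionFailed, and RPCs to that peer are simply treated as failed until a later retry succeeds.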





[jira] [Created] (KUDU-2301) Add metrics per connection to the reactor metrics

2018-02-15 Thread Sailesh Mukil (JIRA)
Sailesh Mukil created KUDU-2301:
---

 Summary: Add metrics per connection to the reactor metrics
 Key: KUDU-2301
 URL: https://issues.apache.org/jira/browse/KUDU-2301
 Project: Kudu
  Issue Type: Task
  Components: rpc
Reporter: Sailesh Mukil
Assignee: Sailesh Mukil


We can expose metrics at the per-connection level and store them in 
ReactorMetrics.

As an initial step, we can expose the OutboundTransfer queue size and a rolling 
average of transfer speeds in Kbps.
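
One way the rolling average could be tracked is sketched below, assuming a fixed window of the most recent transfers and Kbps meaning kilobits (1000 bits) per second. RollingAverageKbps is a hypothetical helper written for this example and is not part of ReactorMetrics.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper: average speed over the last `window` transfers.
class RollingAverageKbps {
 public:
  explicit RollingAverageKbps(std::size_t window) : window_(window) {}

  // Adds one completed transfer of `bytes` sent over `seconds`.
  void AddTransfer(double bytes, double seconds) {
    if (seconds <= 0 || window_ == 0) return;  // ignore degenerate samples
    const double kbps = (bytes * 8.0 / 1000.0) / seconds;
    samples_.push_back(kbps);
    sum_ += kbps;
    if (samples_.size() > window_) {
      // Drop the oldest sample so only the last `window_` remain.
      sum_ -= samples_.front();
      samples_.pop_front();
    }
  }

  // Returns the mean of the retained samples, or 0 if there are none.
  double Average() const {
    return samples_.empty()
               ? 0.0
               : sum_ / static_cast<double>(samples_.size());
  }

 private:
  std::size_t window_;
  std::deque<double> samples_;
  double sum_ = 0.0;
};
```

A fixed window keeps memory per connection bounded; an exponentially weighted average would be an alternative that stores only a single value.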





[jira] [Comment Edited] (KUDU-2300) Partition schema doesn't show correct type of bounds for range partitions

2018-02-15 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366013#comment-16366013
 ] 

Grant Henke edited comment on KUDU-2300 at 2/15/18 6:26 PM:


Edit: it looks like Dan and I were typing at the same time.

I think that Kudu can internally optimize its ranges into slightly different 
but still correct ranges. For example, Kudu might increment by the smallest 
unit and convert < to <=.

I agree this could be confusing because the output doesn't exactly match the 
syntax used at creation, but it is in fact the same logic. 

I need to look deeper to validate that this is in fact what is happening.

In your example, what values/command did you use to create the range partition?



> Partition schema doesn't show correct type of bounds for range partitions
> -
>
> Key: KUDU-2300
> URL: https://issues.apache.org/jira/browse/KUDU-2300
> Project: Kudu
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.5.0
>Reporter: Andre Araujo
>Priority: Major
>
> The Partition Schema section of the master Web UI always shows the range 
> partition with an {{EXCLUSIVE}} upper bound and an {{INCLUSIVE}} lower 
> bound, regardless of what the actual bounds' types are.
> For example, the partition below was created with two {{INCLUSIVE}} bounds, 
> but the upper bound is shown incorrectly:
> {code:java}
> HASH (CALLING_NUMBER_INT, CALLED_NUMBER_INT) PARTITIONS 2,
> RANGE (PERIOD_START_TIME) (
>     PARTITION 2018-02-15T00:00:00.01Z <= VALUES < 2018-02-16T00:00:00.00Z
> ){code}





[jira] [Commented] (KUDU-2300) Partition schema doesn't show correct type of bounds for range partitions

2018-02-15 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366013#comment-16366013
 ] 

Grant Henke commented on KUDU-2300:
---

I think that Kudu can internally optimize its ranges into slightly different 
but still correct ranges. For example, Kudu might increment by the smallest 
unit and convert < to <=.

I agree this could be confusing because the output doesn't exactly match the 
syntax used at creation, but it is in fact the same logic. 

I need to look deeper to validate that this is in fact what is happening.

In your example, what values/command did you use to create the range partition?






[jira] [Commented] (KUDU-2300) Partition schema doesn't show correct type of bounds for range partitions

2018-02-15 Thread Dan Burkert (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365997#comment-16365997
 ] 

Dan Burkert commented on KUDU-2300:
---

Hi [~asdaraujo], very early during table creation all range bounds are 
converted to [inclusive, exclusive).  Kudu does that by incrementing the upper 
bound, if it's inclusive.  How was the table created, and how was the partition 
specified?
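
The conversion described above can be sketched as follows, assuming the range column is stored as a 64-bit integer (for example, a timestamp in microseconds). ExclusiveUpper and ToExclusiveUpper are hypothetical names for this illustration and are not taken from the Kudu source; in particular, treating an inclusive bound at the maximum value as unbounded is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical result type: an exclusive upper bound, or "no upper bound".
struct ExclusiveUpper {
  bool unbounded;
  std::int64_t value;  // meaningful only when !unbounded
};

// Converts an upper bound to exclusive form. An inclusive bound v becomes
// the exclusive bound v + 1, so "x <= v" and "x < v + 1" accept the same
// integer values. An inclusive bound at the maximum value cannot be
// incremented, so this sketch treats it as unbounded.
inline ExclusiveUpper ToExclusiveUpper(std::int64_t value, bool inclusive) {
  if (!inclusive) return ExclusiveUpper{false, value};
  if (value == std::numeric_limits<std::int64_t>::max()) {
    return ExclusiveUpper{true, 0};
  }
  return ExclusiveUpper{false, value + 1};
}
```

Under this scheme, a partition created with two inclusive bounds would be displayed with an exclusive upper bound one unit larger than the value originally given.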






[jira] [Assigned] (KUDU-2275) SIGSEGV due to bug in libunwind

2018-02-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned KUDU-2275:
-

Assignee: Todd Lipcon

> SIGSEGV due to bug in libunwind
> ---
>
> Key: KUDU-2275
> URL: https://issues.apache.org/jira/browse/KUDU-2275
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Will Berkeley
>Assignee: Todd Lipcon
>Priority: Major
>
> Rarely, the kernel stack watchdog can cause a segfault due to a bug in 
> libunwind.
> {noformat}
> *** Aborted at 1516180006 (unix time) try "date -d @1516180006" if you are 
> using GNU date ***
> PC: @ 0x8c94b4 (unknown)
> *** SIGSEGV (@0x7f27173e0000) received by PID 22279 (TID 0x7f270f87f700) from 
> PID 389939200; stack trace: ***{noformat}
> From a core file (produced from the minidump), the backtrace is
> {noformat}
> #0  access_mem (as=<optimized out>, addr=139805870391296, val=0x7f270f87bcc0, write=<optimized out>, arg=<optimized out>)
>    at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Ginit.c:173
> #1  0x008c8e02 in is_plt_entry (c=0x7f270f87c0e0) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Gstep.c:43
> #2  _ULx86_64_step (cursor=0x7f270f87c0e0) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Gstep.c:125
> #3  0x008c412d in google::GetStackTrace (result=result@entry=0x292c0c8, max_depth=max_depth@entry=16, skip_count=0, skip_count@entry=2)
>    at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/glog-0.3.5/src/stacktrace_libunwind-inl.h:78
> #4  0x01a9be8c in Collect (skip_frames=2, this=0x292c0c0) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/src/kudu/util/debug-util.cc:350
> #5  kudu::(anonymous namespace)::HandleStackTraceSignal (signum=<optimized out>) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/src/kudu/util/debug-util.cc:176
> #6  0x7f2716854670 in _quicksort () from ./lib64/libc.so.6
> #7  0x in ?? (){noformat}
> Note that addr = 139805870391296 = 0x7f27173e0000.
> The segfault happens because libunwind is accessing invalid memory it's 
> supposed to have validated:
> {code:java}
> /* validate address */
> const struct cursor *c = (const struct cursor *)arg;
> if (likely (c != NULL) && unlikely (c->validate)
>     && unlikely (validate_mem (addr)))
>   return -1;
> *val = *(unw_word_t *) addr;{code}
> [Others|https://lists.nongnu.org/archive/html/libunwind-devel/2016-09/msg1.html]
>  have seen this same problem before.
> There's also a fix for this issue in commit 
> 836c91c43d7a996028aa7e8d1f53630a6b8e7cbe. It's not in any release of 
> libunwind yet, so we could do one of the following:
>  # upgrade libunwind to 1.2 (most recent release) and patch in the fix
>  # upgrade to a snapshot containing the fix
> To work around this, one can set --hung_task_check_interval_ms to a large 
> value like 2^30, so the stack watchdog runs very rarely (the flag is a 
> 32-bit signed integer, so the value cannot be much larger). The tradeoff is 
> the effective loss of the stack watchdog, which can make debugging certain 
> performance problems more difficult.
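
The arithmetic in the description can be checked with a few compile-time assertions. This is only a sketch: the constant and function names are made up for this example. It confirms that the decimal address in frame #0 equals 0x7f27173e0000, that 2^30 fits in a 32-bit signed integer while 2^31 does not, and that 2^30 milliseconds is roughly 12 days.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The decimal address reported in frame #0 of the backtrace.
constexpr std::uint64_t kFaultAddr = 139805870391296ULL;

// 2^30, the example value suggested for --hung_task_check_interval_ms.
constexpr std::int64_t kSuggestedIntervalMs = std::int64_t{1} << 30;

// Returns true if v is representable as a 32-bit signed integer.
constexpr bool FitsInInt32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}

// Converts milliseconds to whole days (truncating).
constexpr std::int64_t MsToDays(std::int64_t ms) {
  return ms / (1000LL * 60 * 60 * 24);
}

static_assert(kFaultAddr == 0x7f27173e0000ULL,
              "decimal and hex forms of the faulting address agree");
static_assert(FitsInInt32(kSuggestedIntervalMs),
              "2^30 fits in a 32-bit signed flag");
static_assert(!FitsInInt32(std::int64_t{1} << 31),
              "2^31 would overflow a 32-bit signed flag");
```

At 2^30 ms the watchdog would check about once every 12 days, which is why the workaround amounts to disabling it in practice.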


