[jira] [Updated] (KUDU-972) cache should track memory overhead

2018-03-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated KUDU-972:
-
   Resolution: Fixed
Fix Version/s: 1.8.0
   Status: Resolved  (was: In Review)

> cache should track memory overhead
> --
>
> Key: KUDU-972
> URL: https://issues.apache.org/jira/browse/KUDU-972
> Project: Kudu
>  Issue Type: Bug
>  Components: cfile, util
>Affects Versions: Private Beta
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Fix For: 1.8.0
>
>
> Currently the cache only accounts for the cached _values_ in the memtracker. 
> Each key seems to have 88 or so bytes of memory usage as well (potentially 
> rounded up due to allocation overhead, etc.).
> For the DRAM cache, this is still a ~700:1 ratio assuming 64KB block sizes, 
> but for PMEM, where we expect hundreds of GBs of block cache, a 700:1 ratio 
> may turn out to be somewhat substantial.
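To illustrate the kind of accounting this asks for, here is a minimal sketch;
the MemTrackerSketch/CacheSketch names and the kPerEntryOverhead constant are
illustrative stand-ins, not the actual cache internals:

{code}
// Sketch only: charge the tracker for the key and per-entry handle overhead,
// not just the cached value. All names here are stand-ins.
#include <cstddef>
#include <string>

struct MemTrackerSketch {
  void Consume(size_t bytes) { consumed += bytes; }
  void Release(size_t bytes) { consumed -= bytes; }
  size_t consumed = 0;
};

class CacheSketch {
 public:
  explicit CacheSketch(MemTrackerSketch* t) : mem_tracker_(t) {}

  // Rough per-entry bookkeeping cost: handle struct, hash table slot,
  // allocator rounding. The ~88 bytes per key mentioned above land here.
  static constexpr size_t kPerEntryOverhead = 88;

  void Insert(const std::string& key, size_t value_size) {
    size_t charge = value_size + key.size() + kPerEntryOverhead;
    mem_tracker_->Consume(charge);
    // ... store the entry, remembering 'charge' so eviction can Release() it ...
  }

 private:
  MemTrackerSketch* mem_tracker_;
};
{code}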



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2368) Add ability to configure the number of reactors in KuduClient

2018-03-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated KUDU-2368:
--
Labels: newbie  (was: )

> Add ability to configure the number of reactors in KuduClient
> -
>
> Key: KUDU-2368
> URL: https://issues.apache.org/jira/browse/KUDU-2368
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: newbie
>
> Currently it seems that we just use the default (4) and don't offer any way 
> to configure it when building the client. This can limit throughput 
> when a client is used by a multi-threaded application (e.g. 'kudu perf loadgen').
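For illustration, a hedged sketch of what such a knob could look like. The
{{num_reactors()}} setter and the {{ClientBuilderSketch}} class are
hypothetical stand-ins for a setter on {{KuduClientBuilder}}, which today has
no such option (adding one is exactly what this ticket asks for):

{code}
// Hypothetical API sketch only; not the real builder.
class ClientBuilderSketch {
 public:
  // Proposed knob: override the default of 4 reactor (I/O) threads so a
  // multi-threaded application (e.g. 'kudu perf loadgen') can push more
  // RPCs in parallel.
  ClientBuilderSketch& num_reactors(int n) {
    num_reactors_ = n;
    return *this;
  }

  int num_reactors() const { return num_reactors_; }

 private:
  int num_reactors_ = 4;  // mirrors the current hard-coded default
};

// Usage (illustrative):
//   ClientBuilderSketch b;
//   b.num_reactors(16);
{code}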



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2370) Allow accessing consensus metadata during flush/sync

2018-03-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410244#comment-16410244
 ] 

Todd Lipcon commented on KUDU-2370:
---

As for the impact of this issue, I'm looking through logs on a 120-node cluster 
which is under some load and found cases like this:

- 19/20 ConsensusService threads are blocked on RaftConsensus::lock_
-- 1 thread in RaftConsensus::UpdateReplica waiting to acquire 'lock_'
-- 4 in RaftConsensus::Update() waiting to acquire update_lock_
-- 11 in RequestVote trying to acquire lock_
-- 1 in StartElection
-- 3 in DoElectionCallback

The only thread not blocked is in this stack:
{code}
0x337a80f7e0 
0x337a80ba5e 
   0x1c25269 kudu::ConditionVariable::WaitUntil()
0xaf280f kudu::consensus::RaftConsensus::UpdateReplica()
0xaf3ff7 kudu::consensus::RaftConsensus::Update()
0x8bd6a9 kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
   0x1b5bf3d kudu::rpc::GeneratedServiceIf::Handle()
   0x1b5cc4f kudu::rpc::ServicePool::RunThread()
   0x1cc0ef1 kudu::Thread::SuperviseThread()
{code}

The AppendThread is itself just waiting on slow IO:
{code}
W0322 01:59:12.722179 78194 kernel_stack_watchdog.cc:191] Thread 190634 stuck 
at ../../src/kudu/consensus/log.cc:664 for 2922ms:
Kernel stack:
[] do_get_write_access+0x29d/0x520 [jbd2]
[] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
[] __ext4_journal_get_write_access+0x38/0x80 [ext4]
[] ext4_reserve_inode_write+0x73/0xa0 [ext4]
[] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
[] ext4_dirty_inode+0x40/0x60 [ext4]
[] __mark_inode_dirty+0x3b/0x160
[] file_update_time+0xf2/0x170
[] __generic_file_aio_write+0x230/0x490
[] generic_file_aio_write+0x88/0x100
[] ext4_file_write+0x58/0x190 [ext4]
[] do_sync_readv_writev+0xfb/0x140
[] do_readv_writev+0xd6/0x1f0
[] vfs_writev+0x46/0x60
[] sys_pwritev+0xa2/0xc0
[] system_call_fastpath+0x16/0x1b
[] 0x
{code}

So, despite having 20 service threads, it seems they are all blocked on work 
for this one tablet, which in turn causes a bunch of pre-elections even on idle 
tablets.

> Allow accessing consensus metadata during flush/sync
> 
>
> Key: KUDU-2370
> URL: https://issues.apache.org/jira/browse/KUDU-2370
> Project: Kudu
>  Issue Type: Improvement
>  Components: consensus, perf
>Affects Versions: 1.8.0
>Reporter: Todd Lipcon
>Priority: Major
>
> In some cases when disks are overloaded or starting to go bad, flushing 
> consensus metadata can take a significant amount of time. Currently, we hold 
> the RaftConsensus::lock_ for the duration of things like voting or changing 
> term, which blocks other requests such as writes or UpdateConsensus calls. 
> There are certainly some cases where exposing "dirty" (non-durable) cmeta is 
> illegal from a Raft perspective, but there are other cases where it is safe. 
> For example:
> - Assume we receive a Write request, and we see that cmeta is currently busy 
> flushing a change that marks the local replica as a FOLLOWER. In that case, 
> if we wait on the lock, when we eventually acquire it, we'll just reject the 
> request anyway. We might as well reject it immediately.
> - Assume we receive a Write request, and we see that cmeta is currently 
> flushing a change that will mark the local replica as a LEADER in the next 
> term. CheckLeadershipAndBindTerm can safely bind to the upcoming term rather 
> than blocking until the flush completes.
> - Assume we receive an UpdateConsensus or Vote request for term N, and we see 
> that we're currently flushing a change to term M > N. I think it's safe to 
> reject the request even though the new term isn't yet durable.
> Probably a few other cases here where it's safe to act on not-yet-durable 
> info.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2370) Allow accessing consensus metadata during flush/sync

2018-03-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410213#comment-16410213
 ] 

Todd Lipcon commented on KUDU-2370:
---

Another example:
- RequestVote holds update_lock_ and lock_ while waiting on the sync of metadata
- RequestVote tries to respond quickly with a "busy" response in this case:
{code}
// There is another vote or update concurrent with the vote. In that case, that
// other request is likely to reset the timer, and we'll end up just voting
// "NO" after waiting. To avoid starving RPC handlers and causing cascading
// timeouts, just vote a quick NO.
//
// We still need to take the state lock in order to respond with term info, etc.
ThreadRestrictions::AssertWaitAllowed();
LockGuard l(lock_);
return RequestVoteRespondIsBusy(request, response);
{code}

However, the LockGuard there ends up waiting until the other vote is done 
anyway, defeating the purpose of the quick response. In this case we are 
acquiring the lock just to get the current term, but when rejecting a vote it 
would be fine to respond with an optimistic (not-yet-durable) term.
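A rough sketch of that idea, assuming the latest known term were published
through an atomic so the busy path never has to touch lock_; the names below
are illustrative, not the real RaftConsensus members:

{code}
// Illustrative only: publish the latest known term through an atomic so the
// "is busy" vote path can answer without waiting on lock_, which may be held
// across a slow cmeta fsync.
#include <atomic>
#include <cstdint>

struct VoteResponseSketch {
  int64_t responder_term;
  bool vote_granted;
  bool is_busy;
};

class RaftConsensusSketch {
 public:
  VoteResponseSketch RespondVoteIsBusy() const {
    // No lock taken: the term may be slightly stale or not yet durable, but
    // for a quick "busy, vote NO" response that is acceptable.
    return {optimistic_term_.load(std::memory_order_acquire),
            /*vote_granted=*/false, /*is_busy=*/true};
  }

  void PublishTerm(int64_t term) {
    // Called when the term changes, before/while the cmeta flush is in flight.
    optimistic_term_.store(term, std::memory_order_release);
  }

 private:
  std::atomic<int64_t> optimistic_term_{0};
};
{code}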

> Allow accessing consensus metadata during flush/sync
> 
>
> Key: KUDU-2370
> URL: https://issues.apache.org/jira/browse/KUDU-2370
> Project: Kudu
>  Issue Type: Improvement
>  Components: consensus, perf
>Affects Versions: 1.8.0
>Reporter: Todd Lipcon
>Priority: Major
>
> In some cases when disks are overloaded or starting to go bad, flushing 
> consensus metadata can take a significant amount of time. Currently, we hold 
> the RaftConsensus::lock_ for the duration of things like voting or changing 
> term, which blocks other requests such as writes or UpdateConsensus calls. 
> There are certainly some cases where exposing "dirty" (non-durable) cmeta is 
> illegal from a Raft perspective, but there are other cases where it is safe. 
> For example:
> - Assume we receive a Write request, and we see that cmeta is currently busy 
> flushing a change that marks the local replica as a FOLLOWER. In that case, 
> if we wait on the lock, when we eventually acquire it, we'll just reject the 
> request anyway. We might as well reject it immediately.
> - Assume we receive a Write request, and we see that cmeta is currently 
> flushing a change that will mark the local replica as a LEADER in the next 
> term. CheckLeadershipAndBindTerm can safely bind to the upcoming term rather 
> than blocking until the flush completes.
> - Assume we receive an UpdateConsensus or Vote request for term N, and we see 
> that we're currently flushing a change to term M > N. I think it's safe to 
> reject the request even though the new term isn't yet durable.
> Probably a few other cases here where it's safe to act on not-yet-durable 
> info.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2370) Allow accessing consensus metadata during flush/sync

2018-03-22 Thread Todd Lipcon (JIRA)
Todd Lipcon created KUDU-2370:
-

 Summary: Allow accessing consensus metadata during flush/sync
 Key: KUDU-2370
 URL: https://issues.apache.org/jira/browse/KUDU-2370
 Project: Kudu
  Issue Type: Improvement
  Components: consensus, perf
Affects Versions: 1.8.0
Reporter: Todd Lipcon


In some cases when disks are overloaded or starting to go bad, flushing 
consensus metadata can take a significant amount of time. Currently, we hold 
the RaftConsensus::lock_ for the duration of things like voting or changing 
term, which blocks other requests such as writes or UpdateConsensus calls. 
There are certainly some cases where exposing "dirty" (non-durable) cmeta is 
illegal from a Raft perspective, but there are other cases where it is safe. 
For example:

- Assume we receive a Write request, and we see that cmeta is currently busy 
flushing a change that marks the local replica as a FOLLOWER. In that case, if 
we wait on the lock, when we eventually acquire it, we'll just reject the 
request anyway. We might as well reject it immediately.
- Assume we receive a Write request, and we see that cmeta is currently 
flushing a change that will mark the local replica as a LEADER in the next 
term. CheckLeadershipAndBindTerm can safely bind to the upcoming term rather 
than blocking until the flush completes.
- Assume we receive an UpdateConsensus or Vote request for term N, and we see 
that we're currently flushing a change to term M > N. I think it's safe to 
reject the request even though the new term isn't yet durable.

Probably a few other cases here where it's safe to act on not-yet-durable info.
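For illustration only, a minimal sketch of the general shape, assuming cmeta
kept an in-memory "pending" copy visible while the durable copy is being
flushed; all names below are hypothetical, not the real cmeta/RaftConsensus
code:

{code}
// Hypothetical sketch of exposing "dirty" consensus metadata during a flush.
// The real code is more involved; this only shows the idea of consulting the
// pending (not-yet-durable) state to fail fast.
#include <cstdint>
#include <mutex>
#include <optional>

enum class Role { LEADER, FOLLOWER };

struct PendingCmeta {
  int64_t term;
  Role role;
};

class CmetaSketch {
 public:
  // Example: an UpdateConsensus/Vote for term N can be rejected immediately
  // if a change to term M > N is already being flushed.
  bool ShouldRejectForTerm(int64_t request_term) {
    std::lock_guard<std::mutex> l(lite_lock_);  // cheap, never held across IO
    return pending_.has_value() && pending_->term > request_term;
  }

  void BeginFlush(const PendingCmeta& next) {
    std::lock_guard<std::mutex> l(lite_lock_);
    pending_ = next;
    // ... the caller then performs the slow fsync outside lite_lock_ ...
  }

  void EndFlush() {
    std::lock_guard<std::mutex> l(lite_lock_);
    pending_.reset();
  }

 private:
  std::mutex lite_lock_;
  std::optional<PendingCmeta> pending_;
};
{code}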



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2369) Flakiness in LinkedListTest.TestLoadAndVerify

2018-03-22 Thread Alexey Serbin (JIRA)
Alexey Serbin created KUDU-2369:
---

 Summary: Flakiness in LinkedListTest.TestLoadAndVerify 
 Key: KUDU-2369
 URL: https://issues.apache.org/jira/browse/KUDU-2369
 Project: Kudu
  Issue Type: Bug
  Components: test
Affects Versions: 1.6.0, 1.7.0
Reporter: Alexey Serbin


When running {{LinkedListTest.TestLoadAndVerify}} (RELEASE build) with 
{{--stress_cpu_threads=}} set, the test frequently fails 
with an error like the one below:

{noformat}
I0321 16:00:56.718511 114661 client-test-util.cc:57] Op UPDATE int64 
rand_key=2499810302766219279, bool updated=true had status Timed out: Failed to 
write batch of 64730 ops to tablet 317d67e94fc14d3bb3bdb519dab1c7fc after 1 
attempt(s): Failed to write to server: 6556c5903ec64ac694f65a2fa32852c0 
(127.111.117.131:48060): Write RPC to 127.111.117.131:48060 timed out after 
14.989s (SENT)
F0321 16:00:56.718530 114661 client-test-util.cc:61] Check failed: _s.ok() Bad 
status: IO error: Some errors occurred
*** Check failure stack trace: ***
@   0x8fbd0d  google::LogMessage::Fail()
@   0x8fdbcd  google::LogMessage::SendToLog()
@   0x8fb849  google::LogMessage::Flush()
@   0x8fe66f  google::LogMessageFatal::~LogMessageFatal()
@   0x99bb35  kudu::client::LogSessionErrorsAndDie()
@   0x8cbcfd  kudu::ScopedRowUpdater::RowUpdaterThread()
@  0x1e57086  kudu::Thread::SuperviseThread()
@   0x3ae0e079d1  (unknown)
@   0x3ae0ae88fd  (unknown)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2075) Crash when using tracing in SetupThreadLocalBuffer

2018-03-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-2075.
---
   Resolution: Fixed
Fix Version/s: 1.8.0

> Crash when using tracing in SetupThreadLocalBuffer
> --
>
> Key: KUDU-2075
> URL: https://issues.apache.org/jira/browse/KUDU-2075
> Project: Kudu
>  Issue Type: Bug
>  Components: util
>Affects Versions: 1.4.0
>Reporter: Jean-Daniel Cryans
>Assignee: Todd Lipcon
>Priority: Critical
> Fix For: 1.8.0
>
>
> Got this crash while tracing:
> {noformat}
> F0721 13:14:12.038748  2708 map-util.h:414] Check failed: 
> InsertIfNotPresent(collection, key, data) duplicate key: 139914842822400
> {noformat}
> Backtrace:
> {noformat}
> #0  0x00348aa32625 in raise () from /lib64/libc.so.6
> #1  0x00348aa33e05 in abort () from /lib64/libc.so.6
> #2  0x01b53d29 in kudu::AbortFailureFunction () at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/src/kudu/util/minidump.cc:186
> #3  0x008b9e1d in google::LogMessage::Fail () at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/thirdparty/src/glog-0.3.5/src/logging.cc:1488
> #4  0x008bbcdd in google::LogMessage::SendToLog (this=Unhandled dwarf 
> expression opcode 0xf3
> ) at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/thirdparty/src/glog-0.3.5/src/logging.cc:1442
> #5  0x008b9959 in google::LogMessage::Flush (this=0x7f40768144f0) at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/thirdparty/src/glog-0.3.5/src/logging.cc:1311
> #6  0x008bc77f in google::LogMessageFatal::~LogMessageFatal 
> (this=0x7f40768144f0, __in_chrg=)
> at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/thirdparty/src/glog-0.3.5/src/logging.cc:2023
> #7  0x01b0915f in InsertOrDie kudu::debug::TraceLog::PerThreadInfo*> > (collection=0x36265a8, key=Unhandled 
> dwarf expression opcode 0xf3
> )
> at /usr/src/debug/kudu-1.4.0-cdh5.12.0/src/kudu/gutil/map-util.h:414
> #8  0x01b00c18 in kudu::debug::TraceLog::SetupThreadLocalBuffer 
> (this=0x3626300) at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/src/kudu/util/debug/trace_event_impl.cc:1715
> #9  0x01b052d8 in 
> kudu::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp (this=0x3626300, 
> phase=Unhandled dwarf expression opcode 0xf3
> )
> at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/src/kudu/util/debug/trace_event_impl.cc:1773
> #10 0x00ab616d in AddTraceEventWithThreadIdAndTimestamp long> (this=0x59dd5e40, entry_batches=std::vector of length 1, capacity 1 = 
> {...})
> at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/src/kudu/util/debug/trace_event.h:1315
> #11 AddTraceEvent (this=0x59dd5e40, entry_batches=std::vector 
> of length 1, capacity 1 = {...})
> at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/src/kudu/util/debug/trace_event.h:1331
> #12 kudu::log::Log::AppendThread::HandleGroup (this=0x59dd5e40, 
> entry_batches=std::vector of length 1, capacity 1 = {...})
> at /usr/src/debug/kudu-1.4.0-cdh5.12.0/src/kudu/consensus/log.cc:335
> #13 0x00ab6707 in kudu::log::Log::AppendThread::DoWork 
> (this=0x59dd5e40) at 
> /usr/src/debug/kudu-1.4.0-cdh5.12.0/src/kudu/consensus/log.cc:326
> #14 0x01b8d7d6 in operator() (this=0x21c2a180, permanent=false)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2018-03-22 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1641#comment-1641
 ] 

Mostafa Mokhtar commented on KUDU-2086:
---

A higher number of reactor threads, together with reduced tcmalloc contention 
in the reactor-thread code path, alleviated the issue.

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Major
> Attachments: krpc_hash_test.c
>
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run @100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads are still running much hotter than others.
> The snapshot below is from a 20-node cluster:
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}
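For reference, a toy harness in the spirit of the attached krpc_hash_test.c:
the endpoints and the hash below are made up, and the point is only to count
how a given hash spreads similar-looking peer endpoints across a small number
of reactors. A lopsided per-reactor count corresponds to the kind of skew
visible in the CPU times above.

{code}
// Toy harness: count how peer endpoints map onto reactors under a given hash.
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

int main() {
  const int kNumReactors = 4;
  std::vector<int> per_reactor(kNumReactors, 0);

  // 20 peers that differ only in the last octet, as on a typical cluster.
  for (int host = 1; host <= 20; host++) {
    std::string endpoint = "10.0.0." + std::to_string(host) + ":7050";
    auto idx = std::hash<std::string>{}(endpoint) % kNumReactors;
    per_reactor[idx]++;
  }

  for (int r = 0; r < kNumReactors; r++) {
    std::printf("reactor-%d: %d connections\n", r, per_reactor[r]);
  }
  return 0;
}
{code}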



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2109) TabletCopyClientSessionITest.TestCopyFromCrashedSource is flaky

2018-03-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned KUDU-2109:
-

Assignee: Todd Lipcon

> TabletCopyClientSessionITest.TestCopyFromCrashedSource is flaky
> ---
>
> Key: KUDU-2109
> URL: https://issues.apache.org/jira/browse/KUDU-2109
> Project: Kudu
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.4.0
>Reporter: Adar Dembo
>Assignee: Todd Lipcon
>Priority: Major
> Attachments: 0_tablet_copy_client_session-itest.txt
>
>
> I've attached the full log from my test failure.
> I think I've found the issue too: the test assumes that if it finds an 
> on-disk superblock in the TOMBSTONED state, the failed tablet copy has 
> finished and it's safe to start another one. However, in 
> TSTabletManager::RunTabletCopy, 'tc_client' goes out of scope before 
> 'deleter', which means that the TabletCopyClient destructor (which deletes 
> the on-disk data, flushing the superblock in the TOMBSTONED state) will run 
> before the TransitionInProgress destructor (which removes the tablet's ID 
> from the global map tracking transitions, allowing a new tablet copy to 
> proceed).
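The ordering issue described above comes down to C++ destroying locals in
reverse declaration order. A standalone sketch, where the two structs are
simplified stand-ins for the real classes:

{code}
// Locals are destroyed in reverse declaration order, so 'tc_client'
// (declared after 'deleter') is destroyed first.
#include <cstdio>

struct TransitionInProgressDeleter {
  ~TransitionInProgressDeleter() {
    std::printf("2. tablet ID removed from transition map; new copy may start\n");
  }
};

struct TabletCopyClient {
  ~TabletCopyClient() {
    std::printf("1. on-disk data deleted; superblock flushed as TOMBSTONED\n");
  }
};

void RunTabletCopy() {
  TransitionInProgressDeleter deleter;  // declared first, destroyed second
  TabletCopyClient tc_client;           // declared second, destroyed first
  // ... tablet copy work ...
}  // A test polling for the TOMBSTONED superblock can observe step 1 while
   // step 2 has not happened yet -- the window the flaky test hits.

int main() { RunTabletCopy(); }
{code}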



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2361) MasterTest.TestDumpStacksOnRpcQueueOverflow fails in TSAN mode with stress

2018-03-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned KUDU-2361:
-

Assignee: Todd Lipcon

> MasterTest.TestDumpStacksOnRpcQueueOverflow fails in TSAN mode with stress
> --
>
> Key: KUDU-2361
> URL: https://issues.apache.org/jira/browse/KUDU-2361
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.7.0
>Reporter: Adar Dembo
>Assignee: Todd Lipcon
>Priority: Major
>
> I noticed this in a full test run in TSAN mode on Ubuntu 16, and was able to 
> repro by passing --stress_cpu_threads=8 (I have 8 cores on my machine) 
> locally.
> Test output:
> {noformat}
> Note: Google Test filter = *TestDumpStacksOnRpcQueueOverflow*
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from MasterTest
> [ RUN  ] MasterTest.TestDumpStacksOnRpcQueueOverflow
> I0320 12:28:58.121057   439 system_ntp.cc:143] NTP initialized. Skew: 500ppm 
> Current error: 73757us
> I0320 12:28:58.121695   439 fs_manager.cc:260] Metadata directory not provided
> I0320 12:28:58.121868   439 fs_manager.cc:266] Using write-ahead log 
> directory (fs_wal_dir) as metadata directory
> I0320 12:28:58.122126   439 server_base.cc:434] Could not load existing FS 
> layout: Not found: 
> /tmp/kudutest-1000/master-test.MasterTest.TestDumpStacksOnRpcQueueOverflow.1521574137899614-439/Master/instance:
>  No such file or directory (error 2)
> I0320 12:28:58.122280   439 server_base.cc:435] Attempting to create new FS 
> layout instead
> I0320 12:28:58.139992   439 fs_manager.cc:595] Generated new instance 
> metadata in path 
> /tmp/kudutest-1000/master-test.MasterTest.TestDumpStacksOnRpcQueueOverflow.1521574137899614-439/Master/instance:
> uuid: "55f12dd184874ef1906983634a1dda96"
> format_stamp: "Formatted at 2018-03-20 19:28:58 on adar-ThinkPad-T540p"
> I0320 12:28:58.167023   439 fs_manager.cc:495] Time spent creating directory 
> manager: real 0.026s user 0.012s sys 0.012s
> I0320 12:28:58.168340   439 env_posix.cc:1624] Raising this process' open 
> files per process limit from 1024 to 1048576
> I0320 12:28:58.168772   439 file_cache.cc:470] Constructed file cache lbm 
> with capacity 419430
> I0320 12:28:58.204406   439 fs_manager.cc:417] Time spent opening block 
> manager: real 0.019s  user 0.000s sys 0.008s
> I0320 12:28:58.204677   439 fs_manager.cc:428] Opened local filesystem: 
> /tmp/kudutest-1000/master-test.MasterTest.TestDumpStacksOnRpcQueueOverflow.1521574137899614-439/Master
> uuid: "55f12dd184874ef1906983634a1dda96"
> format_stamp: "Formatted at 2018-03-20 19:28:58 on adar-ThinkPad-T540p"
> I0320 12:28:58.205045   439 fs_report.cc:347] Block manager report
> 
> 1 data directories: 
> /tmp/kudutest-1000/master-test.MasterTest.TestDumpStacksOnRpcQueueOverflow.1521574137899614-439/Master/data
> Total live blocks: 0
> Total live bytes: 0
> Total live bytes (after alignment): 0
> Total number of LBM containers: 0 (0 full)
> Did not check for missing blocks
> Did not check for orphaned blocks
> Total full LBM containers with extra space: 0 (0 repaired)
> Total full LBM container extra space in bytes: 0 (0 repaired)
> Total incomplete LBM containers: 0 (0 repaired)
> Total LBM partial records: 0 (0 repaired)
> W0320 12:28:59.004899   439 thread.cc:559] rpc reactor (reactor) Time spent 
> creating pthread: real 0.687s user 0.000s sys 0.008s
> W0320 12:28:59.005123   439 thread.cc:526] rpc reactor (reactor) Time spent 
> starting thread: real 0.687s  user 0.000s sys 0.008s
> I0320 12:28:59.151173   439 env_posix.cc:1629] Not raising this process' 
> running threads per effective uid limit of 63233; it is already as high as it 
> can go
> I0320 12:28:59.174991   467 process_memory.cc:182] Process hard memory limit 
> is 12.441025 GB
> I0320 12:28:59.175251   467 process_memory.cc:184] Process soft memory limit 
> is 9.952820 GB
> I0320 12:28:59.175415   467 process_memory.cc:187] Process memory pressure 
> threshold is 7.464615 GB
> I0320 12:28:59.831009   439 rpc_server.cc:201] RPC server started. Bound to: 
> 127.0.0.1:44269
> I0320 12:28:59.947118   439 webserver.cc:173] Starting webserver on 
> 127.0.0.1:0
> I0320 12:28:59.947260   439 webserver.cc:184] Document root disabled
> I0320 12:28:59.970778   439 webserver.cc:311] Webserver started. Bound to: 
> http://127.0.0.1:35093/
> I0320 12:28:59.984550   513 data_dirs.cc:931] Could only allocate 1 dirs of 
> requested 3 for tablet . 1 dirs total, 0 dirs 
> full, 0 dirs failed
> I0320 12:29:00.052474   513 tablet_bootstrap.cc:436] T 
>  P 55f12dd184874ef1906983634a1dda96: 
> Bootstrap starting.
> 

[jira] [Resolved] (KUDU-2356) Idle WALs can consume significant memory

2018-03-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-2356.
---
   Resolution: Fixed
Fix Version/s: 1.8.0

> Idle WALs can consume significant memory
> 
>
> Key: KUDU-2356
> URL: https://issues.apache.org/jira/browse/KUDU-2356
> Project: Kudu
>  Issue Type: Improvement
>  Components: log, tserver
>Affects Versions: 1.7.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Fix For: 1.8.0
>
> Attachments: heap.svg
>
>
> I grabbed a heap sample of a tserver which has been running a write workload 
> for a little while and found that 750MB of memory is used by faststring 
> allocations inside WritableLogSegment::WriteEntryBatch. It seems like this is 
> the 'compress_buf_' member. This buffer always resizes up during a log write 
> but never shrinks back down, even when the WAL is idle. We should consider 
> clearing the buffer after each append, or perhaps after a short timeout like 
> 100ms after a WAL becomes idle.
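A small sketch of the proposed fix, using a plain std::string as a stand-in
for the faststring member; names and structure are illustrative, not the
actual log code:

{code}
// Release the compression buffer after each batch instead of letting it pin
// its high-water-mark capacity while the WAL sits idle.
#include <string>

class WritableLogSegmentSketch {
 public:
  void WriteEntryBatch(const std::string& entries) {
    compress_buf_.clear();
    compress_buf_.append(entries);  // stand-in for the actual compression step
    // ... write compress_buf_ out to the segment file ...

    // Proposed: give the memory back once the batch is written. A variant
    // would do this only after ~100ms of idleness to avoid reallocating on
    // every append of a busy WAL.
    compress_buf_.clear();
    compress_buf_.shrink_to_fit();
  }

 private:
  std::string compress_buf_;
};
{code}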



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2018-03-22 Thread Joe McDonnell (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409830#comment-16409830
 ] 

Joe McDonnell commented on KUDU-2086:
-

[~tlipcon] Good point, I changed this to an Improvement and dropped the 
priority.

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Major
> Attachments: krpc_hash_test.c
>
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run @100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads are still running much hotter than others.
> The snapshot below is from a 20-node cluster:
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2018-03-22 Thread Joe McDonnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated KUDU-2086:

Priority: Major  (was: Critical)

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Major
> Attachments: krpc_hash_test.c
>
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run @100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads are still running much hotter than others.
> The snapshot below is from a 20-node cluster:
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-1372) Verify cluster-wide master and tserver connectivity

2018-03-22 Thread Adar Dembo (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409810#comment-16409810
 ] 

Adar Dembo commented on KUDU-1372:
--

{quote}you still think there's something we should do here, given the various 
improvements over the last two years?
{quote}
Yes and no.

On the one hand, the original Jira was somewhat vague in what it was asking 
for, and vague Jiras are generally unproductive.

On the other hand, we've always had a steady trickle of new users who show up 
with either a client-side exception or a server-side log message and ask us to 
tell them what's going on. Oftentimes they've deployed in a toy environment, 
maybe just one machine, maybe within VMs, maybe using /etc/hosts for name 
resolution, etc. The original Jira mentioned metrics/logs/dashboards, but I'm 
starting to think that a CLI tool would be an ideal way to deal with this. If 
we had a one-stop cluster-wide connectivity checking tool (like ksck, or 
perhaps even ksck itself), that would help support these cases significantly.

 

> Verify cluster-wide master and tserver connectivity
> ---
>
> Key: KUDU-1372
> URL: https://issues.apache.org/jira/browse/KUDU-1372
> Project: Kudu
>  Issue Type: Improvement
>  Components: ops-tooling, supportability
>Affects Versions: 0.7.1
>Reporter: Adar Dembo
>Priority: Major
>
> Kudu clusters need full-duplex connectivity inside of Raft configurations 
> (masters or tservers), as well as between masters and tservers themselves. No 
> doubt users will run into all sorts of issues with only partially configured 
> firewalls.
> Let's do what we can to surface connectivity failures, be they inside of a 
> Raft configuration or between a particular master and tserver. Metrics, log 
> output, and web UI dashboards are all good places to surface these failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2367) Leader replica sometimes reports follower's health status as FAILED instead of FAILED_UNRECOVERABLE

2018-03-22 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2367:

Code Review: http://gerrit.cloudera.org:8080/9755

> Leader replica sometimes reports follower's health status as FAILED instead 
> of FAILED_UNRECOVERABLE
> ---
>
> Key: KUDU-2367
> URL: https://issues.apache.org/jira/browse/KUDU-2367
> Project: Kudu
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.7.0, 1.8.0
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Major
>
> If a leader tablet replica detects that its follower falls behind the WAL 
> segment GC threshold after the unavailability interval (defined by the 
> {{--follower_unavailable_considered_failed_sec}} flag), it never reports the 
> status of the follower as FAILED_UNRECOVERABLE to the catalog manager, and 
> continues reporting FAILED instead.  In configurations where the tablet 
> replication factor equals the total number of tablet servers in the 
> cluster, this leads to situations where the tablet cannot be automatically 
> recovered for a long time.  In particular, such situations last until a new 
> leader is elected or the corresponding tablet servers are restarted.
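A hedged sketch of the intended precedence; the function and parameters below
are illustrative, not the actual consensus-queue code. The point is that
falling behind the WAL GC threshold should report FAILED_UNRECOVERABLE even if
the replica was already marked FAILED for unavailability:

{code}
// Illustrative decision logic only. Once a follower has fallen behind the WAL
// segment GC threshold it cannot catch up, so FAILED_UNRECOVERABLE should win
// over the plain FAILED status produced earlier by the unavailability timeout.
#include <cstdint>

enum class HealthStatus { HEALTHY, FAILED, FAILED_UNRECOVERABLE };

HealthStatus EvaluateFollowerHealth(bool behind_wal_gc_threshold,
                                    int64_t seconds_unavailable,
                                    int64_t failed_after_sec) {
  if (behind_wal_gc_threshold) {
    // Report this even if the replica was already considered FAILED; the
    // catalog manager needs it to schedule a replacement rather than wait.
    return HealthStatus::FAILED_UNRECOVERABLE;
  }
  if (seconds_unavailable > failed_after_sec) {
    return HealthStatus::FAILED;  // unavailable, but could still recover
  }
  return HealthStatus::HEALTHY;
}
{code}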



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2367) Leader replica sometimes reports follower's health status as FAILED instead of FAILED_UNRECOVERABLE

2018-03-22 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2367:

Status: In Review  (was: Open)

> Leader replica sometimes reports follower's health status as FAILED instead 
> of FAILED_UNRECOVERABLE
> ---
>
> Key: KUDU-2367
> URL: https://issues.apache.org/jira/browse/KUDU-2367
> Project: Kudu
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.7.0, 1.8.0
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Major
>
> If a leader tablet replica detects that its follower falls behind the WAL 
> segment GC threshold after the unavailability interval (defined by the 
> {{--follower_unavailable_considered_failed_sec}} flag), it never reports the 
> status of the follower as FAILED_UNRECOVERABLE to the catalog manager, and 
> continues reporting FAILED instead.  In configurations where the tablet 
> replication factor equals the total number of tablet servers in the 
> cluster, this leads to situations where the tablet cannot be automatically 
> recovered for a long time.  In particular, such situations last until a new 
> leader is elected or the corresponding tablet servers are restarted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)