[kudu-CR] exactly once rpc-test: fix gc stress test flakiness

2016-12-05 Thread Todd Lipcon (Code Review)
Hello David Ribeiro Alves,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5358

to review the following change.

Change subject: exactly_once_rpc-test: fix gc stress test flakiness
..

exactly_once_rpc-test: fix gc stress test flakiness

This test involves two threads:

1) the 'stubborn writer' thread retries a request with the same sequence
number over and over. It expects that eventually the cached result will
go stale, and then later that the client will be entirely GCed and thus
the request will start to succeed again.

2) the 'long write' thread, which uses the normal RetriableRpc mechanism
to send requests, each with increasing sequence numbers. We expect that,
since each of these requests is a new one, and isn't retried once it's
successful, we won't see any 'stale' responses.

The test was flaky, however, because the 'stubborn writer' thread was
always sending its own sequence number as the last_incomplete sequence
number, and we also didn't ensure that it started before the 'long
write' thread. Given that, it was possible to have this interleaving:

  1) start the 'long write' thread, which is assigned seq number 1
  2) before the write is sent, the 'stubborn writer' thread assigns
 itself seq number 2, and sends a request indicating last_incomplete=2.
  3) when the 'long write' thread sends its request, it immediately gets
 a 'stale' response, causing a test failure.

One fix would have been to make the 'stubborn writer' thread send the
first_incomplete calculated by the RequestTracker. However, that would
have involved modifying a bunch of other tests to properly update the
RequestTracker.

So instead this test takes the approach of assigning the 'stubborn
writer's sequence number before starting the 'long writer' thread. This
ensures that the 'stubborn writer' won't explicitly GC any request made
by the 'long writer'.

With the patch, I looped this test 500 times and it passed[1]. Without
the patch, it failed 64/500[2].

[1] http://dist-test.cloudera.org//job?job_id=todd.1480926593.3793
[2] http://dist-test.cloudera.org//job?job_id=todd.1480926999.4126

Change-Id: I30a7d06928973964c5285e5e86503e5871ea5995
---
M src/kudu/rpc/exactly_once_rpc-test.cc
1 file changed, 23 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/5358/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5358
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I30a7d06928973964c5285e5e86503e5871ea5995
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 


[kudu-CR] KUDU-1622. result tracker: respond to RPCs outside of the lock

2016-12-05 Thread Todd Lipcon (Code Review)
Hello David Ribeiro Alves,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5359

to review the following change.

Change subject: KUDU-1622. result_tracker: respond to RPCs outside of the lock
..

KUDU-1622. result_tracker: respond to RPCs outside of the lock

This patch started with the goal of addressing the flakiness of
ExactlyOnceRpcTest.TestExactlyOnceSemanticsAfterRpcCompleted. The issue
was that the MemTracker updates resulting from an RPC response were
occurring asynchronously after the RPC was responded, and in two
separate increments. Thus, in this test, there was one scenario in which
the test would record a partial update and then in a later assertion not
match the earlier recorded value.

In order to fix this, I ended up at a solution which both fixed the
flakiness and reduced lock contention in ResultTracker. We now respond
to RPCs after we've done the required MemTracker updates, which also
means that we respond to RPCs after dropping the ResultTracker locks.
This has been a high contention point in stress tests, so this is a nice
benefit.

Fixing the flakiness also meant that other 'AssertEventually'
workarounds from this test case could now be removed.

I looped ExactlyOnceRpcTest.TestExactlyOnceSemanticsAfterRpcCompleted
500 times with 4 stress threads and it passed 100% of the time[1]. Prior
to this patch, it would reliably fail at least a couple runs out of
500[2].

I also looped exactly_once_writes-itest 500 times with this patch
with 100% success[3].

[1] http://dist-test.cloudera.org//job?job_id=todd.1480928777.29435
[2] http://dist-test.cloudera.org//job?job_id=todd.1480929041.29742
[3] http://dist-test.cloudera.org//job?job_id=todd.1480929367.31010

Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
---
M src/kudu/rpc/exactly_once_rpc-test.cc
M src/kudu/rpc/result_tracker.cc
2 files changed, 92 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/5359/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5300

to look at the new patch set (#11).

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..

KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

This patch adds an entity to manage safe time (the timestamp before
which all transactions are committed/aborted or in-flight) across
a consensus configuration. This allows scans to make sure/wait that
the scan timestamp is safe, thus making sure the scan is repeatable.

This adds a small unit test. Follow up patches will use this
in integration tests.

Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
---
M src/kudu/consensus/CMakeLists.txt
A src/kudu/consensus/time_manager-test.cc
A src/kudu/consensus/time_manager.cc
A src/kudu/consensus/time_manager.h
4 files changed, 642 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/5300/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5300

to look at the new patch set (#12).

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..

KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

This patch adds an entity to manage safe time (the timestamp before
which all transactions are committed/aborted or in-flight) across
a consensus configuration. This allows scans to make sure/wait that
the scan timestamp is safe, thus making sure the scan is repeatable.

This adds a small unit test. Follow up patches will use this
in integration tests.

Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
---
M src/kudu/consensus/CMakeLists.txt
A src/kudu/consensus/time_manager-test.cc
A src/kudu/consensus/time_manager.cc
A src/kudu/consensus/time_manager.h
4 files changed, 633 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/5300/12
-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5300

to look at the new patch set (#13).

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..

KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

This patch adds an entity to manage safe time (the timestamp before
which all transactions are committed/aborted or in-flight) across
a consensus configuration. This allows scans to make sure/wait that
the scan timestamp is safe, thus making sure the scan is repeatable.

This adds a small unit test. Follow up patches will use this
in integration tests.

Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
---
M src/kudu/consensus/CMakeLists.txt
A src/kudu/consensus/time_manager-test.cc
A src/kudu/consensus/time_manager.cc
A src/kudu/consensus/time_manager.h
4 files changed, 634 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/5300/13
-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] exactly once rpc-test: fix gc stress test flakiness

2016-12-05 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: exactly_once_rpc-test: fix gc stress test flakiness
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5358
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30a7d06928973964c5285e5e86503e5871ea5995
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5300

to look at the new patch set (#14).

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..

KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

This patch adds an entity to manage safe time (the timestamp before
which all transactions are committed/aborted or in-flight) across
a consensus configuration. This allows scans to make sure/wait that
the scan timestamp is safe, thus making sure the scan is repeatable.

This adds a small unit test. Follow up patches will use this
in integration tests.

Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
---
M src/kudu/consensus/CMakeLists.txt
A src/kudu/consensus/time_manager-test.cc
A src/kudu/consensus/time_manager.cc
A src/kudu/consensus/time_manager.h
4 files changed, 638 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/5300/14
-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Tightening ScanSpec primary bounds when range predicate exists

2016-12-05 Thread Haijie Hong (Code Review)
Haijie Hong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5360

Change subject: Tightening ScanSpec primary bounds when range predicate exists
..

Tightening ScanSpec primary bounds when range predicate exists

Currently, push upper bound key predicates will break when meeting
a range predicate.
I think the upper bound key can be decreased.
For example, If PK = (a, b) and a < 2 && b < 3, PK can be further
tightning to PK < (1, 3).

Change-Id: I931a617605434700352d285fc2039033cf9eb07e
---
M src/kudu/common/key_util.cc
M src/kudu/common/key_util.h
2 files changed, 104 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/5360/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5360
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I931a617605434700352d285fc2039033cf9eb07e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong 


[kudu-CR] KUDU-1757: fix appendCellValueDebugString, do not throw exception when a column is not set

2016-12-05 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: KUDU-1757: fix appendCellValueDebugString, do not throw 
exception when a column is not set
..


Patch Set 2: Code-Review+2

I like the version where we don't add "not set" better.

-- 
To view, visit http://gerrit.cloudera.org:8080/5237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05af5a11e95751b3cc259e786d36494aeb5d3a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Yanlong Zheng 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Yanlong Zheng 
Gerrit-HasComments: No


[kudu-CR] WIP workaround for KUDU-1524

2016-12-05 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has abandoned this change.

Change subject: WIP workaround for KUDU-1524
..


Abandoned

Todd wrote a new workaround.

-- 
To view, visit http://gerrit.cloudera.org:8080/3621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I20bf94f019966ac9a860a744ab97a5f37df5f19e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] Tightening ScanSpec primary bounds when range predicate exists

2016-12-05 Thread Haijie Hong (Code Review)
Haijie Hong has abandoned this change.

Change subject: Tightening ScanSpec primary bounds when range predicate exists
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/5360
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I931a617605434700352d285fc2039033cf9eb07e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1622. result tracker: respond to RPCs outside of the lock

2016-12-05 Thread Todd Lipcon (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5359

to look at the new patch set (#2).

Change subject: KUDU-1622. result_tracker: respond to RPCs outside of the lock
..

KUDU-1622. result_tracker: respond to RPCs outside of the lock

This patch started with the goal of addressing the flakiness of
ExactlyOnceRpcTest.TestExactlyOnceSemanticsAfterRpcCompleted. The issue
was that the MemTracker updates resulting from an RPC response were
occurring asynchronously after the RPC was responded, and in two
separate increments. Thus, in this test, there was one scenario in which
the test would record a partial update and then in a later assertion not
match the earlier recorded value.

In order to fix this, I ended up at a solution which both fixed the
flakiness and reduced lock contention in ResultTracker. We now respond
to RPCs after we've done the required MemTracker updates, which also
means that we respond to RPCs after dropping the ResultTracker locks.
This has been a high contention point in stress tests, so this is a nice
benefit.

Fixing the flakiness also meant that other 'AssertEventually'
workarounds from this test case could now be removed.

I looped ExactlyOnceRpcTest.TestExactlyOnceSemanticsAfterRpcCompleted
500 times with 4 stress threads and it passed 100% of the time[1]. Prior
to this patch, it would reliably fail at least a couple runs out of
500[2].

I also looped exactly_once_writes-itest 500 times with this patch
with 100% success[3].

[1] http://dist-test.cloudera.org//job?job_id=todd.1480928777.29435
[2] http://dist-test.cloudera.org//job?job_id=todd.1480929041.29742
[3] http://dist-test.cloudera.org//job?job_id=todd.1480929367.31010

Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
---
M src/kudu/rpc/exactly_once_rpc-test.cc
M src/kudu/rpc/result_tracker.cc
2 files changed, 93 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/5359/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..


Patch Set 14:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/5300/14/src/kudu/consensus/time_manager-test.cc
File src/kudu/consensus/time_manager-test.cc:

PS14, Line 52:   // Waits (forever) for the TimeManager's safe time to have 
advanced to 'safe_time'.
 :   // Returns a latch that allows to wait for TimeManager to 
consider 'safe_time' safe.
 :  
the two lines of this comment seem contradictory with each other. The second 
line is correct, right? i.e this function itself doesn't wait, it just returns 
a latch that will fire when the time is safe.


Line 58:   time_manager_->WaitUntilSafe(safe_time, MonoTime::Max());
should this CHECK_OK?


Line 94:   ASSERT_OK(time_manager_->MessageReceivedFromLeader
weird wrapping


Line 101:   MonoTime after_small = MonoTime::Now();
I tihnk with recent changes we can just do:

auto deadline = MonoTime::Now() + MonoDelta::FromMilliseconds(100);


http://gerrit.cloudera.org:8080/#/c/5300/14/src/kudu/consensus/time_manager.cc
File src/kudu/consensus/time_manager.cc:

Line 107: Timestamp t(message.timestamp());
unused?


Line 170: void TimeManager::AdvanceSafeTimeAndWakeUpWaiters(Timestamp 
safe_time) {
should this be named 'Unlocked'?


PS14, Line 190: (
also unlocked?


http://gerrit.cloudera.org:8080/#/c/5300/14/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

PS14, Line 40: Because of this the read will be repeatable
this probably needs to say something like "If the scanner additionally uses the 
MvccManager to wait until the given timestamp is clean, then the read will be 
repeatable". Otherwise, the in-flight ones won't be read, but will later be 
read once they commit, right?


PS14, Line 52: sintony
typo?


Line 70:   TimeManager(scoped_refptr clock,
nit: no need for wrap


PS14, Line 95: CHECK failure if it isn't)
curious why the inconsistency that here it's a CHECK failure but above it's a 
non-OK status


PS14, Line 107:  'message's timestamp
'safe_time'


-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] exactly once rpc-test: fix gc stress test flakiness

2016-12-05 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged.

Change subject: exactly_once_rpc-test: fix gc stress test flakiness
..


exactly_once_rpc-test: fix gc stress test flakiness

This test involves two threads:

1) the 'stubborn writer' thread retries a request with the same sequence
number over and over. It expects that eventually the cached result will
go stale, and then later that the client will be entirely GCed and thus
the request will start to succeed again.

2) the 'long write' thread, which uses the normal RetriableRpc mechanism
to send requests, each with increasing sequence numbers. We expect that,
since each of these requests is a new one, and isn't retried once it's
successful, we won't see any 'stale' responses.

The test was flaky, however, because the 'stubborn writer' thread was
always sending its own sequence number as the last_incomplete sequence
number, and we also didn't ensure that it started before the 'long
write' thread. Given that, it was possible to have this interleaving:

  1) start the 'long write' thread, which is assigned seq number 1
  2) before the write is sent, the 'stubborn writer' thread assigns
 itself seq number 2, and sends a request indicating last_incomplete=2.
  3) when the 'long write' thread sends its request, it immediately gets
 a 'stale' response, causing a test failure.

One fix would have been to make the 'stubborn writer' thread send the
first_incomplete calculated by the RequestTracker. However, that would
have involved modifying a bunch of other tests to properly update the
RequestTracker.

So instead this test takes the approach of assigning the 'stubborn
writer's sequence number before starting the 'long writer' thread. This
ensures that the 'stubborn writer' won't explicitly GC any request made
by the 'long writer'.

With the patch, I looped this test 500 times and it passed[1]. Without
the patch, it failed 64/500[2].

[1] http://dist-test.cloudera.org//job?job_id=todd.1480926593.3793
[2] http://dist-test.cloudera.org//job?job_id=todd.1480926999.4126

Change-Id: I30a7d06928973964c5285e5e86503e5871ea5995
Reviewed-on: http://gerrit.cloudera.org:8080/5358
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves 
---
M src/kudu/rpc/exactly_once_rpc-test.cc
1 file changed, 23 insertions(+), 12 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5358
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I30a7d06928973964c5285e5e86503e5871ea5995
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] fault injection: use exit instead of exit

2016-12-05 Thread Todd Lipcon (Code Review)
Hello Dan Burkert, David Ribeiro Alves, Adar Dembo,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5361

to review the following change.

Change subject: fault_injection: use _exit instead of exit
..

fault_injection: use _exit instead of exit

80ac8bae335b490c7b75351e6d4c321a58183c73 caused some tests to fail in
TSAN due to the atexit handlers running when a process crashed with an
injected fault.

This switches to using _exit() instead of exit(), which is equivalent
except that atexit handlers do not run.

Change-Id: I72e30bfa750ace22e1e736e258b3f5720b25a651
---
M src/kudu/util/fault_injection.cc
1 file changed, 4 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/5361/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5361
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I72e30bfa750ace22e1e736e258b3f5720b25a651
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 


[kudu-CR] [python] - Add timeouts to Python unit tests

2016-12-05 Thread Jordan Birdsell (Code Review)
Jordan Birdsell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5362

Change subject: [python] - Add timeouts to Python unit tests
..

[python] - Add timeouts to Python unit tests

This patch adds a global 100 second timeout to all python unit tests.
This is consistent with the JUnit configuration.  If we find that
individual tests need longer than 100 seconds then we can override them.
This configuration was tested in patchset 1, however the forced timeout
will not be committed to master for obvious reasons.

Change-Id: Ibd2a1b7fd7e7e8b2a8ddef9413dc7a863f1968e2
---
M python/kudu/tests/test_scantoken.py
M python/setup.cfg
M python/setup.py
3 files changed, 10 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/5362/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5362
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibd2a1b7fd7e7e8b2a8ddef9413dc7a863f1968e2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jordan Birdsell 


[kudu-CR] KUDU-921. tablet copy: Make the StartTabletCopy() RPC async

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-921. tablet copy: Make the StartTabletCopy() RPC async
..


Patch Set 7:

> I don't have a strong opinion as long as:
 > 1) There's some way to wait for the tablet copy to finish, at least
 > for tests, even if it's super hacky.
 > 2) It's documented in your commit message that the semantics of
 > StartTabletCopy are changing.

Done

-- 
To view, visit http://gerrit.cloudera.org:8080/5045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I95c63f2bfd6762487862efbdba9cb3676112
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1785. Fix potential crash in TabletCopySourceSession

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5363

Change subject: KUDU-1785. Fix potential crash in TabletCopySourceSession
..

KUDU-1785. Fix potential crash in TabletCopySourceSession

This patch fixes a crash in TabletCopySourceSession that can result from
dereferencing a null pointer. This may occur when a follower initiates a
TabletCopy session with a source server while the source server is
bootstrapping the given tablet.

Added a stress test in tablet_copy_client_session-itest to trigger the
issue.

Change-Id: I6f3ec35885dbf1a81c23ac10b1c9556dfddbd4b7
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tserver/tablet_copy_source_session.cc
4 files changed, 129 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/5363/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6f3ec35885dbf1a81c23ac10b1c9556dfddbd4b7
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 


[kudu-CR] KUDU-921. tablet copy: Make the StartTabletCopy() RPC async

2016-12-05 Thread Mike Percy (Code Review)
Hello Dinesh Bhat, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5045

to look at the new patch set (#8).

Change subject: KUDU-921. tablet copy: Make the StartTabletCopy() RPC async
..

KUDU-921. tablet copy: Make the StartTabletCopy() RPC async

This patch changes tablet copy to execute on its own thread pool.
This clears up threads on the Consensus service pool for other tasks.

A new ThreadPool called tablet_copy_pool_ was added to TSTabletManager
to run TabletCopy operations. Its max_threads parameter is tunable with
a new gflag and it has its max_queue_size hard-coded to 0 in order to
provide backpressure when it doesn't have capacity to immediately copy
new tablets.

This patch changes the semantics of StartTabletCopy() to return as soon
as the tablet copy has started -- it no longer waits until the process
is completed to return. Clients can follow the progress of the tablet
copy process using the ListTablets() RPC call and waiting for the tablet
to be in a RUNNING state.

A test was added in tablet_copy-itest that checks that the backpressure
mechanism is working such that submitting too many StartTabletCopy()
requests at one time results in a ServiceUnavailable error.

Some additional tests were added in tablet_copy_client_session-itest to
improve test coverage of the StartTabletCopy() code path.

Change-Id: I95c63f2bfd6762487862efbdba9cb3676112
---
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/tserver/tablet_copy_service.cc
M src/kudu/tserver/tablet_peer_lookup.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
10 files changed, 348 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/5045/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95c63f2bfd6762487862efbdba9cb3676112
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [python] - Add timeouts to Python unit tests

2016-12-05 Thread Jordan Birdsell (Code Review)
Jordan Birdsell has uploaded a new patch set (#2).

Change subject: [python] - Add timeouts to Python unit tests
..

[python] - Add timeouts to Python unit tests

This patch adds a global 100 second timeout to all python unit tests.
This is consistent with the JUnit configuration.  If we find that
individual tests need longer than 100 seconds then we can override them.
This configuration was tested in patchset 2, however the forced timeout
will not be committed to master for obvious reasons.

Change-Id: Ibd2a1b7fd7e7e8b2a8ddef9413dc7a863f1968e2
---
M python/kudu/tests/test_scantoken.py
M python/setup.cfg
M python/setup.py
3 files changed, 10 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/5362/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5362
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd2a1b7fd7e7e8b2a8ddef9413dc7a863f1968e2
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-1785. Fix potential crash in TabletCopySourceSession

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has uploaded a new patch set (#2).

Change subject: KUDU-1785. Fix potential crash in TabletCopySourceSession
..

KUDU-1785. Fix potential crash in TabletCopySourceSession

This patch fixes a crash in TabletCopySourceSession that can result from
dereferencing a null pointer. This may occur when a follower initiates a
TabletCopy session with a source server while the source server is
bootstrapping the given tablet.

Added a stress test in tablet_copy_client_session-itest to trigger the
issue.

Change-Id: I6f3ec35885dbf1a81c23ac10b1c9556dfddbd4b7
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tserver/tablet_copy_source_session.cc
4 files changed, 129 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/5363/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6f3ec35885dbf1a81c23ac10b1c9556dfddbd4b7
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] [consensus] KUDU-1613: Fix replica eviction failure for WRONG SERVER UUID

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: [consensus] KUDU-1613: Fix replica eviction failure for 
WRONG_SERVER_UUID
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5111/5//COMMIT_MSG
Commit Message:

Line 9: Fixes replica eviction from leader where a follower returning
Can you please explain your approach to fixing this issue in the commit message 
in detail?


http://gerrit.cloudera.org:8080/#/c/5111/5/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

Line 816: // Verify that tablet copy is triggered when peer responds with
Why do we want tablet copy to be triggered when the peer responds that it is no 
longer the expected peer?


-- 
To view, visit http://gerrit.cloudera.org:8080/5111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d3f84fd297a8e4760208a213c1ee393e92499a3
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..


Patch Set 3:

(3 comments)

The JIRA (KUDU-1407) talks about evicting a failed replica, but here you are 
tablet copying a failed replica, so having some discussion about the design of 
this in the commit message would help.

http://gerrit.cloudera.org:8080/#/c/5352/3//COMMIT_MSG
Commit Message:

PS3, Line 7: evcited
evicted


Line 9: Fixes replica eviction failure where a follower returning
Please provide a detailed description of how you are fixing this problem and 
what the different cases are


http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/tserver/tablet_server_test_util.cc
File src/kudu/tserver/tablet_server_test_util.cc:

Line 34: const char* TabletServerTestBase::kTableId = "TestTable";
Shouldn't we put this in a new file tablet_server-test-base.cc ?


-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-695. Avoid glog contention by deferring log writes to dedicated threads

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-695. Avoid glog contention by deferring log writes to 
dedicated threads
..


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5321
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie22a0a29fa00a988a53a15d2c726ce5d49018f4d
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-695. Avoid glog contention by deferring log writes to dedicated threads

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged.

Change subject: KUDU-695. Avoid glog contention by deferring log writes to 
dedicated threads
..


KUDU-695. Avoid glog contention by deferring log writes to dedicated threads

This patch changes the logging init sequence to install a wrapper around
the built-in glog Loggers. The wrapper starts a new thread which does
double-buffering: log entries are buffered in a vector, and the thread
is woken up. The thread swaps in a clean buffer to avoid delaying
application threads while it flushes the original buffer to disk.

It's hard to test the end-to-end integration in a unit test, since the
unit tests disable logging to files, and this path only affects
file-based logging. However, I ran an earlier version of this patch in a
stress test environment and it seemed to reduce the frequency with which
I saw threads blocked on glog.

The patch does, however, have a test which exercises the new code paths,
including the blocking path. I looped the new test 500 times in TSAN
mode with success.

The new feature is enabled by default, but I left a hidden flag to
disable it in case we have any issues.

Change-Id: Ie22a0a29fa00a988a53a15d2c726ce5d49018f4d
Reviewed-on: http://gerrit.cloudera.org:8080/5321
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/async_logger.cc
A src/kudu/util/async_logger.h
M src/kudu/util/debug/leakcheck_disabler.h
M src/kudu/util/logging-test.cc
M src/kudu/util/logging.cc
6 files changed, 461 insertions(+), 2 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5321
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie22a0a29fa00a988a53a15d2c726ce5d49018f4d
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] rpc-test: fix flakiness in tests that expect timeouts

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: rpc-test: fix flakiness in tests that expect timeouts
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5356
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifff555634968bc92f453b25af4d5c15da21edf7c
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] fault injection: use exit instead of exit

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged.

Change subject: fault_injection: use _exit instead of exit
..


fault_injection: use _exit instead of exit

80ac8bae335b490c7b75351e6d4c321a58183c73 caused some tests to fail in
TSAN due to the atexit handlers running when a process crashed with an
injected fault.

This switches to using _exit() instead of exit(), which is equivalent
except that atexit handlers do not run.

Change-Id: I72e30bfa750ace22e1e736e258b3f5720b25a651
Reviewed-on: http://gerrit.cloudera.org:8080/5361
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
M src/kudu/util/fault_injection.cc
1 file changed, 4 insertions(+), 1 deletion(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5361
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I72e30bfa750ace22e1e736e258b3f5720b25a651
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] fault injection: use exit instead of exit

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: fault_injection: use _exit instead of exit
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5361
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72e30bfa750ace22e1e736e258b3f5720b25a651
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] [python] - Add timeouts to Python unit tests

2016-12-05 Thread Jordan Birdsell (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5362

to look at the new patch set (#3).

Change subject: [python] - Add timeouts to Python unit tests
..

[python] - Add timeouts to Python unit tests

This patch adds a global 100 second timeout to all python unit tests.
This is consistent with the JUnit configuration.  If we find that
individual tests need longer than 100 seconds then we can override them.
This configuration was tested in patchset 2, however the forced timeout
will not be committed to master for obvious reasons.

Change-Id: Ibd2a1b7fd7e7e8b2a8ddef9413dc7a863f1968e2
---
M python/kudu/tests/test_scantoken.py
M python/pytest.ini
M python/setup.cfg
M python/setup.py
4 files changed, 9 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/5362/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5362
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd2a1b7fd7e7e8b2a8ddef9413dc7a863f1968e2
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-1608: Catalog manager can stop retrying DeleteTablet upon fatal errors

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1608: Catalog manager can stop retrying DeleteTablet upon 
fatal errors
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5357/1/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 2693: return Substitute("TS $0 T $1: delete failed for tablet $2: $3",
This is some pretty old code, I think. The error message prefix is a bit weird 
now, I think it would be better to stick with the "T  P " prefix 
which is standard elsewhere. Maybe something like:

  T 0 P : Delete of tablet  from peer  
failed due to :  ()


Line 2723: case TabletServerErrorPB::TABLET_NOT_RUNNING:
Is the meaning of this code documented somewhere?


-- 
To view, visit http://gerrit.cloudera.org:8080/5357
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id45f07667b6e62ce4814acfdf931dea2af4332d1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] rpc-test: fix flakiness in tests that expect timeouts

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged.

Change subject: rpc-test: fix flakiness in tests that expect timeouts
..


rpc-test: fix flakiness in tests that expect timeouts

The DoTestExpectTimeout() utility function would set a timeout to some
number of milliseconds N, and then ask the server to sleep for (N + 50)
milliseconds, expecting a timeout. It would then assert that the timeout
was returned after some amount of time between N and N+50, but no more
than N+50ms.

This would be flaky under concurrent load (eg stress threads) because
the sleep(50ms) might sometimes actually sleep for an extra 50-100ms.

This just changes the test to ask the server to sleep for n+500ms,
giving it a lot more budget for sloppiness.

I looped TestCallTimeout/0 with 4 stress threads 1000 times in TSAN.
Before[1] it failed 4/1000. After[2] it didn't fail.

[1] http://dist-test.cloudera.org//job?job_id=todd.1480912345.2054
[2] http://dist-test.cloudera.org//job?job_id=todd.1480912461.12007

Change-Id: Ifff555634968bc92f453b25af4d5c15da21edf7c
Reviewed-on: http://gerrit.cloudera.org:8080/5356
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
M src/kudu/rpc/rpc-test-base.h
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5356
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ifff555634968bc92f453b25af4d5c15da21edf7c
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] raft consensus: avoid some unecessary allocations in hot path

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: raft_consensus: avoid some unecessary allocations in hot path
..


raft_consensus: avoid some unecessary allocations in hot path

In the stress cluster I noticed that UpdateConsensus does a lot of
allocation in LockForUpdate. In particular, it was constructing a
ConsensusStatusPB just to check whether it was currently a voter, and
that only to provide a log message. That PB has a lot of various strings
and other objects inside of it which caused unnecessary allocation.

We can already get this same information from the existing RaftConfigPB
object in ConsensusMeta, and a lot cheaper.

Change-Id: I75abcaaaed281e5ac1768ea3014064925db6c030
Reviewed-on: http://gerrit.cloudera.org:8080/5344
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy 
---
M src/kudu/consensus/raft_consensus_state.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I75abcaaaed281e5ac1768ea3014064925db6c030
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] raft consensus: avoid some unecessary allocations in hot path

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: raft_consensus: avoid some unecessary allocations in hot path
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I75abcaaaed281e5ac1768ea3014064925db6c030
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] KUDU-1757: fix appendCellValueDebugString, do not throw exception when a column is not set

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1757: fix appendCellValueDebugString, do not throw 
exception when a column is not set
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5237/2/java/kudu-client/src/main/java/org/apache/kudu/client/PartialRow.java
File java/kudu-client/src/main/java/org/apache/kudu/client/PartialRow.java:

PS2, Line 596: {
 :   return;
 : }
> Currently we have two kinds of debug string: debug string and short debug s
Hi Yanlong,

I think the issue you ran into isn't that appendCellValueDebugString requires 
the column to be set, but instead that Operation.toString calls 
PartialRow.stringifyRowKey.  I think instead, PartialRow should have a toString 
method which correctly handles including only the set columns (and which 
internally can call into appendCellValueDebugString).  Then Operation can call 
that.


-- 
To view, visit http://gerrit.cloudera.org:8080/5237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05af5a11e95751b3cc259e786d36494aeb5d3a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Yanlong Zheng 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Yanlong Zheng 
Gerrit-HasComments: Yes


[kudu-CR] Tightening ScanSpec primary bounds when range predicate exists

2016-12-05 Thread Haijie Hong (Code Review)
Haijie Hong has restored this change.

Change subject: Tightening ScanSpec primary bounds when range predicate exists
..


Restored

-- 
To view, visit http://gerrit.cloudera.org:8080/5360
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: restore
Gerrit-Change-Id: I931a617605434700352d285fc2039033cf9eb07e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Tightening ScanSpec primary bounds when range predicate exists

2016-12-05 Thread Haijie Hong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5360

to look at the new patch set (#2).

Change subject: Tightening ScanSpec primary bounds when range predicate exists
..

Tightening ScanSpec primary bounds when range predicate exists

Currently, push upper bound key predicates will break when meeting
a range predicate.
I think the upper bound key can be decreased.
For example, If PK = (a, b) and a < 2 && b < 3, PK can be further
tightning to PK < (1, 3).

Change-Id: I931a617605434700352d285fc2039033cf9eb07e
---
M src/kudu/common/key_util.cc
M src/kudu/common/key_util.h
M src/kudu/common/scan_spec-test.cc
3 files changed, 109 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/5360/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5360
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I931a617605434700352d285fc2039033cf9eb07e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Haijie Hong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] java tests: Clean up and document binDir search

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: java tests: Clean up and document binDir search
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I09765771aa49264c8f1ef9e80aafb3b577cdbe94
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] java tests: Clean up and document binDir search

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: java tests: Clean up and document binDir search
..


java tests: Clean up and document binDir search

This patch just does a small cleanup of the binDir search code and
documents that we purposely do not search the current working directory
for the Kudu bin directory.

Change-Id: I09765771aa49264c8f1ef9e80aafb3b577cdbe94
Reviewed-on: http://gerrit.cloudera.org:8080/5348
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert 
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
1 file changed, 37 insertions(+), 15 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I09765771aa49264c8f1ef9e80aafb3b577cdbe94
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-766: limit number of glog files
..


Patch Set 2:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/integration-tests/external_mini_cluster.cc
File src/kudu/integration-tests/external_mini_cluster.cc:

PS1, Line 293: Substitute("master-$0", i);
> Let's encapsulate this somewhere. It's also used by one or two tests.
Do you mind if I hold off on this?  It's not immediately clear what the right 
abstraction is.  I started implementing

class ExternalDaemon {
virtual static daemon_id(int32_t id) const;
}

But there are a couple of issues:

1) virtual and static can't mix, so it would have to be defined on each of the 
subclasses.

2) This API can only be used if the caller assumes the IDs are consecutive 
integers starting at 0, so it might be hiding one aspect of external mini 
cluster implementation (the string format), but it still assumes a knowledge of 
another aspect (the ID generation pattern).


PS1, Line 295: boost::optional log_dir = GetLogPath(daemon_id);
 : if (log_dir) {
 :   RETURN_NOT_OK(Env::Default()->CreateDir(*log_dir));
 : }
> Since we're doing the same thing for masters/tservers, can we do it just on
Done


PS1, Line 334: Substitute("ts-$0", idx);
> Encapsulate.
same.


http://gerrit.cloudera.org:8080/#/c/5340/2/src/kudu/integration-tests/external_mini_cluster.cc
File src/kudu/integration-tests/external_mini_cluster.cc:

Line 232:   } else {
> warning: don't use else after return [readability-else-after-return]
Done


http://gerrit.cloudera.org:8080/#/c/5340/2/src/kudu/integration-tests/external_mini_cluster.h
File src/kudu/integration-tests/external_mini_cluster.h:

Line 376:   const boost::optional& log_dir() const {
> error: use of undeclared identifier 'string'; did you mean 'strings'? [clan
Done


Line 377: return log_dir_;
> error: no viable conversion from returned value of type 'const boost::optio
not sure what this means; going to assume it's because I messed up the 
namespacing.


http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/integration-tests/log-rolling-itest.cc
File src/kudu/integration-tests/log-rolling-itest.cc:

PS1, Line 34: vector
> Should #include and "using" std::vector.
Done


Line 49:   CHECK_OK(cluster.Start());
> These should all be ASSERT_OK.
Done


http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

Line 328: WARN_NOT_OK(DeleteExcessLogFiles(), "Unable to delete excess log 
files");
> If we actually do warn something, we're likely to repeatedly warn it with e
Yes, this would warn every 60 seconds.  Note that StartExcessGlogDeleterThread 
calls this inline with the server startup, so if the permissions are bad server 
startup will fail.  The permissions would have to change at runtime in order to 
get failure.


Line 362:   if (metrics_logging_thread_) {
> This needs to be reworked to account for the new thread.
Reworked in what way?  I looked at this when I first made the change to see if 
I needed to initialize the count to two instead of one, but I don't think 
that's necessary.  There's still only one context shutting down the server, 
regardless of the number of background threads waiting on the latch.


http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/util/logging.cc
File src/kudu/util/logging.cc:

Line 319:   // Ignore bad input or disable log rotation.
> How about passing in an Env* as an argument? One can be had from ServerBase
Done


PS1, Line 325: 
> These aren't modifiable at runtime, right?
No, these are both used in the glog initialization so I don't think they can be 
changed (plus they are strings, so there would be thread safety issues).  
max_log_files is set to be modifiable, since I didn't see a downside to doing 
so.


Line 333:   continue;
> Aren't glog filenames terminated with a date/time stamp, with second-level 
Yes, I think it would.  I think we should stick with mtimes for now, since 
that's what Impala uses, and it seems to be working for them.  I'll add a note 
to that effect.


http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/util/logging.h
File src/kudu/util/logging.h:

Line 216: // Deletes excess rotated log files.
> Would be nice to specify here what we're using to figure out "excess" (mtim
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5340

to look at the new patch set (#3).

Change subject: KUDU-766: limit number of glog files
..

KUDU-766: limit number of glog files

This commit introduces a new flag, 'max_log_files', that limits the
maximum number of rolled glog files to keep around at each severity
level. This flag matches an Apache Impala flag of the same name, and the
implementation is partly borrowed from it.

I initially went the route of making a new maintenance manager operation
type to schedule the file cleanup, but it proved difficult to shoehorn
this usecase into that abstraction.  Instead, a thead is spawned on
startup that checks for excess files every 60 seconds.

Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
A src/kudu/integration-tests/log-rolling-itest.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/metrics.h
9 files changed, 280 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/5340/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5300

to look at the new patch set (#15).

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..

KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

This patch adds an entity to manage safe time (the timestamp before
which all transactions are committed/aborted or in-flight) across
a consensus configuration. This allows scans to make sure/wait that
the scan timestamp is safe, thus making sure the scan is repeatable.

This adds a small unit test. Follow up patches will use this
in integration tests.

Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
---
M src/kudu/consensus/CMakeLists.txt
A src/kudu/consensus/time_manager-test.cc
A src/kudu/consensus/time_manager.cc
A src/kudu/consensus/time_manager.h
4 files changed, 634 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/5300/15
-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..


Patch Set 14:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/5300/14/src/kudu/consensus/time_manager-test.cc
File src/kudu/consensus/time_manager-test.cc:

PS14, Line 52:   // Waits (forever) for the TimeManager's safe time to have 
advanced to 'safe_time'.
 :   // Returns a latch that allows to wait for TimeManager to 
consider 'safe_time' safe.
 :  
> the two lines of this comment seem contradictory with each other. The secon
Done


Line 58:   time_manager_->WaitUntilSafe(safe_time, MonoTime::Max());
> should this CHECK_OK?
Done


Line 94:   ASSERT_OK(time_manager_->MessageReceivedFromLeader
> weird wrapping
Done


Line 101:   MonoTime after_small = MonoTime::Now();
> I tihnk with recent changes we can just do:
Done


http://gerrit.cloudera.org:8080/#/c/5300/14/src/kudu/consensus/time_manager.cc
File src/kudu/consensus/time_manager.cc:

Line 107: Timestamp t(message.timestamp());
> unused?
Done


Line 170: void TimeManager::AdvanceSafeTimeAndWakeUpWaiters(Timestamp 
safe_time) {
> should this be named 'Unlocked'?
Done


PS14, Line 190: (
> also unlocked?
Done


http://gerrit.cloudera.org:8080/#/c/5300/14/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

PS14, Line 40: Because of this the read will be repeatable
> this probably needs to say something like "If the scanner additionally uses
Done


PS14, Line 52: sintony
> typo?
this is a "portuguesism" :P I looked in the dictionary and it seemed to exist 
:). Done


Line 70:   TimeManager(scoped_refptr clock,
> nit: no need for wrap
Done


PS14, Line 95: CHECK failure if it isn't)
> curious why the inconsistency that here it's a CHECK failure but above it's
because of races in changes of leadership status. consensus might not be leader 
anymore by the time prepare tries to assign a timestamp so the above shouldn't 
crash. This one is called on UpdateConsensus() though, so the timemanager 
should not be in leader mode.


PS14, Line 107:  'message's timestamp
> 'safe_time'
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5340

to look at the new patch set (#4).

Change subject: KUDU-766: limit number of glog files
..

KUDU-766: limit number of glog files

This commit introduces a new flag, 'max_log_files', that limits the
maximum number of rolled glog files to keep around at each severity
level. This flag matches an Apache Impala flag of the same name, and the
implementation is partly borrowed from it.

I initially went the route of making a new maintenance manager operation
type to schedule the file cleanup, but it proved difficult to shoehorn
this usecase into that abstraction.  Instead, a thead is spawned on
startup that checks for excess files every 60 seconds.

Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
A src/kudu/integration-tests/log-rolling-itest.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/metrics.h
9 files changed, 280 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/5340/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [i-tests] fix TestWorkload to allow set num tablets(1)

2016-12-05 Thread Alexey Serbin (Code Review)
Hello Dinesh Bhat, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5347

to look at the new patch set (#2).

Change subject: [i-tests] fix TestWorkload to allow set_num_tablets(1)
..

[i-tests] fix TestWorkload to allow set_num_tablets(1)

Fixed typo in TestWorkload::set_num_tablets(): it should accept 1
as well since 1 is the actual default value for the num_tablets_
member.

Change-Id: Iaf9d0e06550fffdd6a04f6bfba53fe873fb33749
---
M src/kudu/integration-tests/test_workload.h
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/5347/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5347
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaf9d0e06550fffdd6a04f6bfba53fe873fb33749
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [i-tests] fix TestWorkload to allow set num tablets(1)

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: [i-tests] fix TestWorkload to allow set_num_tablets(1)
..


Patch Set 1:

> (1 comment)

Done

-- 
To view, visit http://gerrit.cloudera.org:8080/5347
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf9d0e06550fffdd6a04f6bfba53fe873fb33749
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-921. tablet copy: Make the StartTabletCopy() RPC async

2016-12-05 Thread Mike Percy (Code Review)
Hello Dinesh Bhat, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5045

to look at the new patch set (#9).

Change subject: KUDU-921. tablet copy: Make the StartTabletCopy() RPC async
..

KUDU-921. tablet copy: Make the StartTabletCopy() RPC async

This patch changes tablet copy to execute on its own thread pool.
This clears up threads on the Consensus service pool for other tasks.

A new ThreadPool called tablet_copy_pool_ was added to TSTabletManager
to run TabletCopy operations. Its max_threads parameter is tunable with
a new gflag and it has its max_queue_size hard-coded to 0 in order to
provide backpressure when it doesn't have capacity to immediately copy
new tablets.

This patch changes the semantics of StartTabletCopy() to return as soon
as the tablet copy has started -- it no longer waits until the process
is completed to return. Clients can follow the progress of the tablet
copy process using the ListTablets() RPC call and waiting for the tablet
to be in a RUNNING state.

A test was added in tablet_copy-itest that checks that the backpressure
mechanism is working such that submitting too many StartTabletCopy()
requests at one time results in a ServiceUnavailable error.

Some additional tests were added in tablet_copy_client_session-itest to
improve test coverage of the StartTabletCopy() code path.

Change-Id: I95c63f2bfd6762487862efbdba9cb3676112
---
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/tablet_copy_client_session-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/tserver/tablet_copy_service.cc
M src/kudu/tserver/tablet_peer_lookup.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
10 files changed, 349 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/5045/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95c63f2bfd6762487862efbdba9cb3676112
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [i-tests] fix TestWorkload to allow set num tablets(1)

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: [i-tests] fix TestWorkload to allow set_num_tablets(1)
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5347
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf9d0e06550fffdd6a04f6bfba53fe873fb33749
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] [i-tests] fix TestWorkload to allow set num tablets(1)

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: [i-tests] fix TestWorkload to allow set_num_tablets(1)
..


Patch Set 2:

I just made this same change in a different patch, funny that this lasted so 
long

-- 
To view, visit http://gerrit.cloudera.org:8080/5347
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf9d0e06550fffdd6a04f6bfba53fe873fb33749
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] [python] - Add timeouts to Python unit tests

2016-12-05 Thread Jordan Birdsell (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5362

to look at the new patch set (#4).

Change subject: [python] - Add timeouts to Python unit tests
..

[python] - Add timeouts to Python unit tests

This patch adds a global 100 second timeout to all python unit tests.
This is consistent with the JUnit configuration.  If we find that
individual tests need longer than 100 seconds then we can override them.
This configuration was tested locally with a forced timeout.

Change-Id: Ibd2a1b7fd7e7e8b2a8ddef9413dc7a863f1968e2
---
M python/pytest.ini
M python/setup.py
2 files changed, 4 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/5362/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5362
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd2a1b7fd7e7e8b2a8ddef9413dc7a863f1968e2
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..


Patch Set 15:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/5300/15/src/kudu/consensus/time_manager-test.cc
File src/kudu/consensus/time_manager-test.cc:

PS15, Line 18: #include 
nit: by style guide, this should come after the STL headers.


PS15, Line 39: CHECK_OK
ASSERT_OK() ?


PS15, Line 47: InitTimeManager
Does this need to be public?  Consider to make it protected.


PS15, Line 72: TEST_F(TimeManagerTest, TestTimeManagerNonLeaderMode)
It would be nice to have a concise description for the test.


PS15, Line 136: TEST_F(TimeManagerTest, TestTimeManagerLeaderMode)
It would be nice to have a brief description of the test.


PS15, Line 194: after_latch->Wait();
Is there anything to check after the Wait() call completes?


http://gerrit.cloudera.org:8080/#/c/5300/15/src/kudu/consensus/time_manager.cc
File src/kudu/consensus/time_manager.cc:

Line 17: 
Consider adding

#include 


PS15, Line 25: disable_safe_time_advancement_without_writes
It looks like a very long name.  Is it possible to replace with

advance_safe_time_without_writes = true by default


PS15, Line 30: raft_heartbeat_interval_ms
Is it needed here?  I could not find where it's used in this file.


PS15, Line 46: external_consistency_mode
The external consistency mode is an optional field in WriteRequestPB message.  
Consider adding a check prior to accessing the field.


PS15, Line 144: waiters_.push_back
Consider adding a scope cleanup to make sure the waiters_ container is cleaned 
up from non-valid waiter when the control returns from this method.


PS15, Line 148: if (waiter.latch->WaitUntil(deadline))
Is it necessary to remove the waiter from the container?


PS15, Line 154: if (waiter.latch->count() == 0) return Status::OK();
Is it necessary to clean up the waiters_ container, removing the waiter?


http://gerrit.cloudera.org:8080/#/c/5300/15/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

PS15, Line 112: higher
less?


PS15, Line 126: GetSafeTime
nit: consider adding 'const' modifier


-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: scanner should not retry a SCANNER EXPIRED error without reopening

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: WIP: scanner should not retry a SCANNER_EXPIRED error without 
reopening
..


Patch Set 1:

(2 comments)

Todd, thank you for putting up the patch.  I'll try to create a test for that 
to make sure this works as expected.  Or you are confident this is the fix and 
no test/verification is needed?

http://gerrit.cloudera.org:8080/#/c/2711/1/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS1, Line 131: needs_reopen
nit: may be, make it possible to pass nullptr for 'need_reopen' parameter?


PS1, Line 419: needs_reopen
nit: just pass nullptr once HandleError() accepts that.


-- 
To view, visit http://gerrit.cloudera.org:8080/2711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5e09849d5b749db958c3a3df89fd8de9947ab991
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-1369. client: fail over scans to a new replica if current replica is out-of-date

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: WIP: KUDU-1369. client: fail over scans to a new replica if 
current replica is out-of-date
..


Patch Set 1:

> David/Alexey -- just found this old WIP patch of mine that might be
 > relevant for the stuff you've been working on. Feel free to abandon
 > it if you already did something similar.

Todd, thank for the patch.  I'll try to revive this as needed.

-- 
To view, visit http://gerrit.cloudera.org:8080/2514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I54351ed9352ef77d9a935637e365e54f644dd927
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] WIP: KUDU-1767. Create a client flush interleave test

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: WIP: KUDU-1767. Create a client flush interleave test
..


Patch Set 2:

> Thanks for the review, Alexey. Yes, this test is currently pretty
 > messy. I do need to clean it up but I think we need to come up with
 > a real solution to the problem it exposes before I need to do that.
 > :)
 > 
 > Based on comments on KUDU-1767 I think we'll be prioritizing some
 > other things sooner.

ok, I see.

-- 
To view, visit http://gerrit.cloudera.org:8080/5256
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3886f70246c6ef3f94acda6074d02d4fb199cf50
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] WIP: KUDU-1767. Create a client flush interleave test

2016-12-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: WIP: KUDU-1767. Create a client flush interleave test
..


Patch Set 2:

> ok, I see.

I think you assumed we would check this in DISABLED. It's actually something I 
hadn't considered before I saw you doing that. I guess it's not a bad idea...

-- 
To view, visit http://gerrit.cloudera.org:8080/5256
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3886f70246c6ef3f94acda6074d02d4fb199cf50
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] [i-tests] fix TestWorkload to allow set num tablets(1)

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged.

Change subject: [i-tests] fix TestWorkload to allow set_num_tablets(1)
..


[i-tests] fix TestWorkload to allow set_num_tablets(1)

Fixed typo in TestWorkload::set_num_tablets(): it should accept 1
as well since 1 is the actual default value for the num_tablets_
member.

Change-Id: Iaf9d0e06550fffdd6a04f6bfba53fe873fb33749
Reviewed-on: http://gerrit.cloudera.org:8080/5347
Reviewed-by: Mike Percy 
Tested-by: Kudu Jenkins
---
M src/kudu/integration-tests/test_workload.h
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5347
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iaf9d0e06550fffdd6a04f6bfba53fe873fb33749
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [i-tests] fix TestWorkload to allow set num tablets(1)

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: [i-tests] fix TestWorkload to allow set_num_tablets(1)
..


Patch Set 2:

> I just made this same change in a different patch, funny that this
 > lasted so long

:)

Thank you for the review -- I'll push this then.

-- 
To view, visit http://gerrit.cloudera.org:8080/5347
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf9d0e06550fffdd6a04f6bfba53fe873fb33749
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Add integration tests for duplicate keys

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: Add integration tests for duplicate keys
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5349/7/src/kudu/integration-tests/cluster_verifier.cc
File src/kudu/integration-tests/cluster_verifier.cc:

PS7, Line 126:   
CHECK_OK(scanner.SetReadMode(client::KuduScanner::READ_AT_SNAPSHOT));
 :   CHECK_OK(scanner.SetFaultTolerant());
 :   CHECK_OK(scanner.SetTimeoutMillis(5000));
 :   CHECK_OK(scanner.SetProjectedColumns(vector()));
> We were already using this pattern. We RETURN_NOT_OK() on things that may f
>From the caller's point of view, what is the difference if this 
>method/function would return an error either from client->OpenTable() or 
>scanner.SetReadMode()?

I think calling abort() does not make sense when it's possible to handle the 
error gracefully and staying consistent.  As I see it, the only case when you 
would want to call abort() is when it's clear the consistency of the code is 
compromised and it's impossible to continue safely.  This is not the case, IMO.


-- 
To view, visit http://gerrit.cloudera.org:8080/5349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d018281d412ae034bd7b70c8311077a52b2795d
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-766: limit number of glog files
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/integration-tests/external_mini_cluster.cc
File src/kudu/integration-tests/external_mini_cluster.cc:

PS1, Line 293: peer_addrs[i],
> Do you mind if I hold off on this?  It's not immediately clear what the rig
Hmm, didn't realize it was complicated. Okay.


http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

Line 362:   if (metrics_logging_thread_) {
> Reworked in what way?  I looked at this when I first made the change to see
1) We're not joining on the new thread at all.
2) We should CountDown() regardless of whether metrics_logging_thread_ exists.


http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/util/logging.cc
File src/kudu/util/logging.cc:

Line 333: 
> Yes, I think it would.  I think we should stick with mtimes for now, since 
Before committing to this approach I'd like to understand why Impala does it 
this way. Could you investigate?


http://gerrit.cloudera.org:8080/#/c/5340/4/src/kudu/util/logging.cc
File src/kudu/util/logging.cc:

Line 18: 
Since you're changing the system includes, mind splitting out the real "system" 
ones from the Kudu-provided ones? That is, boost and glog should be in a 
separate group after the STL includes.


-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1785. Fix potential crash in TabletCopySourceSession

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1785. Fix potential crash in TabletCopySourceSession
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5363/2//COMMIT_MSG
Commit Message:

PS2, Line 14: Added a stress test in tablet_copy_client_session-itest to 
trigger the
: issue.
How often was the issue triggered by this test without the fix?


http://gerrit.cloudera.org:8080/#/c/5363/2/src/kudu/integration-tests/tablet_copy_client_session-itest.cc
File src/kudu/integration-tests/tablet_copy_client_session-itest.cc:

PS2, Line 52:   // We don't want the master to interfere when we forcibly make 
copies of
:   // tablets onto servers it doesn't know about. Only 2 servers.
Does this comment explain why master_tombstone_evicted_tablet_replicas=false?

If so, could you move it to be above that line too? If not, could you add a 
comment justifying the change from the non-default value?


Line 65:   workload.set_num_write_threads(8);
Why override the default value for this?


PS2, Line 70:   while (workload.rows_inserted() < 1) {
: SleepFor(MonoDelta::FromMilliseconds(10));
:   }
Why do we need to do this?


PS2, Line 88: StartSingleNode
Not sure this is an ideal name for the function; it's actually starting an 
entire cluster (master + 2 tservers). The magic is that all of the 
single-replica tablets are on the first tserver; maybe you can rename the 
function to make that explicit?


PS2, Line 109: std::numeric_limits::max()
Isn't this argument an int64_t though?

If it's optional, would be nice to use a boost::optional to convey that.


-- 
To view, visit http://gerrit.cloudera.org:8080/5363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6f3ec35885dbf1a81c23ac10b1c9556dfddbd4b7
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1622. result tracker: respond to RPCs outside of the lock

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1622. result_tracker: respond to RPCs outside of the lock
..


Patch Set 2: Code-Review+2

(1 comment)

Looks OK to me, but leaving unmerged in case David wanted to take a look too.

http://gerrit.cloudera.org:8080/#/c/5359/2/src/kudu/rpc/result_tracker.cc
File src/kudu/rpc/result_tracker.cc:

Line 405:   // Wait until outside the lock to do the heavy-weight work.
Nit: separate from above with an empty line.


-- 
To view, visit http://gerrit.cloudera.org:8080/5359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] Add file globbing to Env

2016-12-05 Thread Dan Burkert (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5338

to look at the new patch set (#4).

Change subject: Add file globbing to Env
..

Add file globbing to Env

Change-Id: Ic71160b11b19811ac9669dc7a572a7e4d2e59e79
---
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
3 files changed, 57 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/38/5338/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic71160b11b19811ac9669dc7a572a7e4d2e59e79
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Revert "env: change various file filename() functions to return copies"

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: Revert "env: change various file filename() functions to return 
copies"
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib920fc9a418dcfdf35ba1b038f288ca2ee98343a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5340

to look at the new patch set (#5).

Change subject: KUDU-766: limit number of glog files
..

KUDU-766: limit number of glog files

This commit introduces a new flag, 'max_log_files', that limits the
maximum number of rolled glog files to keep around at each severity
level. This flag matches an Apache Impala flag of the same name, and the
implementation is partly borrowed from it.

I initially went the route of making a new maintenance manager operation
type to schedule the file cleanup, but it proved difficult to shoehorn
this usecase into that abstraction.  Instead, a thead is spawned on
startup that checks for excess files every 60 seconds.

Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
A src/kudu/integration-tests/log-rolling-itest.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/metrics.h
9 files changed, 287 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/5340/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Revert "env: change various file filename() functions to return copies"

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged.

Change subject: Revert "env: change various file filename() functions to return 
copies"
..


Revert "env: change various file filename() functions to return copies"

I didn't end up needing this flexibility in the file cache implementation,
and returning references is cheaper.

This reverts commit e836ac18df2ec2405a61aa7a9a90d61ee30716c5.

Change-Id: Ib920fc9a418dcfdf35ba1b038f288ca2ee98343a
Reviewed-on: http://gerrit.cloudera.org:8080/5318
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert 
---
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/pb_util.cc
M src/kudu/util/pb_util.h
5 files changed, 11 insertions(+), 11 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib920fc9a418dcfdf35ba1b038f288ca2ee98343a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Add file globbing to Env

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: Add file globbing to Env
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5338/1/src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
File src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc:

PS1, Line 74: 
> And I presume this is OK too (since it survived in PS2).
It appears the answer is "no", so I reverted this part of the change.


http://gerrit.cloudera.org:8080/#/c/5338/1/src/kudu/util/env.h
File src/kudu/util/env.h:

PS1, Line 266: matchin
> matching
Done


Line 268:   virtual Status Glob(const std::string& path_pattern, 
std::vector* paths) = 0;
> error: no template named 'vector'; did you mean '__gnu_cxx::vector'? [clang
Done


http://gerrit.cloudera.org:8080/#/c/5338/1/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 1163: auto cleanup = MakeScopedCleanup([&] { globfree(&result); });
> Since you retained this in PS2 I presume it's safe.
yes, it must be called in the event of an error.


Line 1165: int ret = glob(path_pattern.c_str(), GLOB_TILDE | GLOB_ERR , 
NULL, &result);
> Don't think GLOB_TILDE_CHECK is a good idea?
GLOB_TILDE_CHECK is not defined on macOS.  I tried to test GLOB_ERR on macOS, 
but it doesn't seem to do anything.


-- 
To view, visit http://gerrit.cloudera.org:8080/5338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic71160b11b19811ac9669dc7a572a7e4d2e59e79
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-766: limit number of glog files
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5340/3/src/kudu/integration-tests/external_mini_cluster.cc
File src/kudu/integration-tests/external_mini_cluster.cc:

Line 294:SubstituteInFlags(flags, i));
> warning: passing result of std::move() as a const reference argument; no mo
Done


http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

Line 362:   stop_background_threads_latch_.CountDown();
> 1) We're not joining on the new thread at all.
Done


http://gerrit.cloudera.org:8080/#/c/5340/1/src/kudu/util/logging.cc
File src/kudu/util/logging.cc:

Line 333:   continue;
> Before committing to this approach I'd like to understand why Impala does i
Here is the gerrit introducing log rotation into Impala: 
https://gerrit.sjc.cloudera.com/#/c/5706.  I scanned it but didn't see anything 
but style comments.  I don't think we are committing to this approach 
permanently, but I think there is some amount of risk that we haven't thought 
about some downside to the alternative, and this is an extremely difficult 
thing to test since it's so tied into glog / process global state.


http://gerrit.cloudera.org:8080/#/c/5340/4/src/kudu/util/logging.cc
File src/kudu/util/logging.cc:

Line 18: 
> Since you're changing the system includes, mind splitting out the real "sys
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Add file globbing to Env

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: Add file globbing to Env
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic71160b11b19811ac9669dc7a572a7e4d2e59e79
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-766: limit number of glog files
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Add file modified time to Env

2016-12-05 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5339

to look at the new patch set (#5).

Change subject: Add file modified time to Env
..

Add file modified time to Env

Change-Id: I90b3e759a4dbb352a0c09dc726b428c9dcea5595
---
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
4 files changed, 46 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/5339/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5339
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I90b3e759a4dbb352a0c09dc726b428c9dcea5595
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Add file modified time to Env

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: Add file modified time to Env
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5339/1/src/kudu/util/env-test.cc
File src/kudu/util/env-test.cc:

PS1, Line 695: writer->Append(" ");
 : writer->Sync();
> Shouldn't this be outside of (and before) AssertEventually? Or did you want
It needs to do a new append and sync on each iteration, since it needs to 
update the mtime each iteration.


http://gerrit.cloudera.org:8080/#/c/5339/1/src/kudu/util/env.h
File src/kudu/util/env.h:

PS1, Line 178: is frequently as high as 1 second
> Nit: maybe "may be as high as 1 second on some platforms" to indicate that 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5339
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I90b3e759a4dbb352a0c09dc726b428c9dcea5595
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Add file modified time to Env

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: Add file modified time to Env
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5339
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I90b3e759a4dbb352a0c09dc726b428c9dcea5595
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] [consensus] KUDU-1613: Fix replica eviction failure for WRONG SERVER UUID

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [consensus] KUDU-1613: Fix replica eviction failure for 
WRONG_SERVER_UUID
..


Patch Set 5:

(5 comments)

I'm deferring to Mike/David since I'm less familiar with this code.

http://gerrit.cloudera.org:8080/#/c/5111/5/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 238:   switch (ps.response) {
Maybe add a default case that calls LOG(FATAL) on the (unknown) response type?


http://gerrit.cloudera.org:8080/#/c/5111/5/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:

PS5, Line 140: respond with appropriate response status.
Nit: technically, this method does not "respond" (in the sense of sending an 
RPC response).

Perhaps "and return the appropriate response status"?


http://gerrit.cloudera.org:8080/#/c/5111/5/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 2770
Why was this comment removed? If it's no longer true, shouldn't the entire 
DeleteTablet() call be thrown out too?


PS5, Line 2782: // These tests exhibit the replica eviction behaviors when a 
follower
  : // returns WRONG_SERVER_UUID error code. Tests verify that 
followers
  : // returning these error codes are evicted from consensus after 
a
  : // specified --follower_unavailable_considered_failed_sec.
Nit: this is talking about tests in the plural, but it's just a single test, 
right?


Line 2795:   int leader_indx = -1;
Nit: conventionally, we use 'idx' as an infix for naming an index, not 'indx'.


-- 
To view, visit http://gerrit.cloudera.org:8080/5111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d3f84fd297a8e4760208a213c1ee393e92499a3
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..


Patch Set 3:

(4 comments)

Again, will defer to Mike and David.

http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

Line 818: TEST_F(ConsensusQueueTest, TestTriggerTabletCopyIfTabletNotRunning) {
This looks like it's exactly the same test as the one below, but with a 
different code. Could you consolidate all of the shared code? If this means 
running both "tests" in a single TEST_F(), that's fine too.


http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 2772:   // A new good copy should get created because follower returns
Ah, this addresses one of my comments from your earlier patch. It looks like 
this hunk belonged in that patch. Can you move it?


http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

Line 574: Status TSTabletManager::StartTabletStateTransitionUnlocked(
These changes affect the semantics of DeleteTablet() and StartTabletCopy() 
somewhat, right? AFAICT it's now possible for DeleteTablet() to return 
ALREADY_INPROGRESS, and for StartTabletCopy() to return TABLET_NOT_RUNNING.

I don't think that's necessarily wrong, I just want to make sure you've thought 
through the side effects of that. How will the RPC callers of DeleteTablet() 
and StartTabletCopy() react to those errors?


http://gerrit.cloudera.org:8080/#/c/5352/3/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

PS3, Line 207: replica
Nit: the rest of this comment refers to "tablet" when describing a "tablet 
replica", so please do the same for the new section you added.


-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1608: Catalog manager can stop retrying DeleteTablet upon fatal errors

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1608: Catalog manager can stop retrying DeleteTablet upon 
fatal errors
..


Patch Set 1:

(1 comment)

Looks good. As we discussed, please try to build tests for this using 
tserver-side metrics to verify that DeleteTablet() RPCs aren't received once 
the master has figured out that it should stop.

If anyone else has suggestions on how Dinesh could test this, please chime in!

http://gerrit.cloudera.org:8080/#/c/5357/1/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 2692:   virtual string LogFailure(const string& reason, const Status& 
status) {
Why is this method virtual? I don't see any overrides anywhere.


-- 
To view, visit http://gerrit.cloudera.org:8080/5357
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id45f07667b6e62ce4814acfdf931dea2af4332d1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] util: add file cache

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
..


Patch Set 8:

(4 comments)

only 1/2 way through but going to switch to r8 for the rest.

http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-stress-test.cc
File src/kudu/util/file_cache-stress-test.cc:

Line 133: deque> files;
It took me a bit to figure out the difference between this and the global pool 
of files -- you may want to add a comment like "Active opened files in this 
consumer thread"


http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-test-util.h
File src/kudu/util/file_cache-test-util.h:

Line 68:   CHECK_LE(open_fd_count, max_fd_count);
This should include an error message, since the LOG_EVERY_N may not have fired 
since the count was less than the max.


http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

Line 72:   void AssertFdsAndDescriptors(int num_expected_fds,
at every call site you are offsetting by this->initial_open_fds_, maybe the 
method could do that automatically?


Line 101:   {
Is this scope necessary?


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-1407 replica is not evcited when TABLET NOT RUNNING

2016-12-05 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: [consensus] KUDU-1407 replica is not evcited when 
TABLET_NOT_RUNNING
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5352/3//COMMIT_MSG
Commit Message:

PS3, Line 7: evcited
> evicted
haven't looked at this code yet, but I can't tell from the commit message what 
the effect of this change is.

What's the change being made? You're fixing a bug such that now tablets _will_ 
be evicted for TABLET_NOT_RUNNING? Or making it so that they won't be?

I think not evicting is the correct behavior for the case when a tablet is in 
BOOTSTRAPPING state. Otherwise restarting a node (as in rolling restart) would 
cause all of its replicas to be evicted and cause big issues.


-- 
To view, visit http://gerrit.cloudera.org:8080/5352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I554ad61f25a7de78eda60fd50228e0f015c1b625
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1608: Catalog manager can stop retrying DeleteTablet upon fatal errors

2016-12-05 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1608: Catalog manager can stop retrying DeleteTablet upon 
fatal errors
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5357/1//COMMIT_MSG
Commit Message:

Line 11: which indicate that tablet ceases to exist on that tserver.
TABLET_NOT_RUNNING could mean that the tablet is in the process of 
bootstrapping, in which case we _should_ keep retrying, so that when the tablet 
finishes bootstrapping it gets properly deleted, no?

Or do we properly delete it anyway because when it finishes bootstrapping, it 
reports some kind of change in a tablet report, resulting in a new DeleteTablet 
being sent?


-- 
To view, visit http://gerrit.cloudera.org:8080/5357
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id45f07667b6e62ce4814acfdf931dea2af4332d1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1622. result tracker: respond to RPCs outside of the lock

2016-12-05 Thread Todd Lipcon (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5359

to look at the new patch set (#3).

Change subject: KUDU-1622. result_tracker: respond to RPCs outside of the lock
..

KUDU-1622. result_tracker: respond to RPCs outside of the lock

This patch started with the goal of addressing the flakiness of
ExactlyOnceRpcTest.TestExactlyOnceSemanticsAfterRpcCompleted. The issue
was that the MemTracker updates resulting from an RPC response were
occurring asynchronously after the RPC was responded, and in two
separate increments. Thus, in this test, there was one scenario in which
the test would record a partial update and then in a later assertion not
match the earlier recorded value.

In order to fix this, I ended up at a solution which both fixed the
flakiness and reduced lock contention in ResultTracker. We now respond
to RPCs after we've done the required MemTracker updates, which also
means that we respond to RPCs after dropping the ResultTracker locks.
This has been a high contention point in stress tests, so this is a nice
benefit.

Fixing the flakiness also meant that other 'AssertEventually'
workarounds from this test case could now be removed.

I looped ExactlyOnceRpcTest.TestExactlyOnceSemanticsAfterRpcCompleted
500 times with 4 stress threads and it passed 100% of the time[1]. Prior
to this patch, it would reliably fail at least a couple runs out of
500[2].

I also looped exactly_once_writes-itest 500 times with this patch
with 100% success[3].

[1] http://dist-test.cloudera.org//job?job_id=todd.1480928777.29435
[2] http://dist-test.cloudera.org//job?job_id=todd.1480929041.29742
[3] http://dist-test.cloudera.org//job?job_id=todd.1480929367.31010

Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
---
M src/kudu/rpc/exactly_once_rpc-test.cc
M src/kudu/rpc/result_tracker.cc
2 files changed, 93 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/5359/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Dan Burkert (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5340

to look at the new patch set (#6).

Change subject: KUDU-766: limit number of glog files
..

KUDU-766: limit number of glog files

This commit introduces a new flag, 'max_log_files', that limits the
maximum number of rolled glog files to keep around at each severity
level. This flag matches an Apache Impala flag of the same name, and the
implementation is partly borrowed from it.

I initially went the route of making a new maintenance manager operation
type to schedule the file cleanup, but it proved difficult to shoehorn
this usecase into that abstraction.  Instead, a thead is spawned on
startup that checks for excess files every 60 seconds.

Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
A src/kudu/integration-tests/log-rolling-itest.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/metrics.h
9 files changed, 289 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/5340/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [i-tests] separated implementation of the ext mini-cluster

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5367

Change subject: [i-tests] separated implementation of the ext mini-cluster
..

[i-tests] separated implementation of the ext mini-cluster

Separated implementation of the external mini-cluster methods into a
separate .cc file: those methods contain too many code to be candidates
for inlining.  Besides, this helps to have less header-wise dependencies
and avoid clang-tidy warnings like

  warning: function 'StopCluster' defined in a header file; function
definitions in header files can lead to ODR violations
[misc-definitions-in-headers]

This patch does not contain any functional changes.

Change-Id: I4e69794502d31dc43cc2eba6870c01c828b14da3
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
A src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.h
M src/kudu/integration-tests/tablet_replacement-itest.cc
7 files changed, 111 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/5367/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4e69794502d31dc43cc2eba6870c01c828b14da3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 


[kudu-CR] KUDU-1753 [delete table-test] deleted-while-scanned test

2016-12-05 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5345

to look at the new patch set (#6).

Change subject: KUDU-1753 [delete_table-test] deleted-while-scanned test
..

KUDU-1753 [delete_table-test] deleted-while-scanned test

Added an integration test to ensure a tablet server keeps all the
necessary data around until a scan operation ends even if tablet
is being deleted concurrently.

The test is disabled as it currently fails. A follow up patch will fix
the bug and enable the test.

This is in the context of the following JIRA item:
KUDU-1753 Impala query fails: Unable to advance iterator:
  Illegal state: Tablet is not running

Change-Id: I983e68862b5535f2f95eedd41850a8a88e95e69c
---
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.h
3 files changed, 134 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/5345/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5345
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I983e68862b5535f2f95eedd41850a8a88e95e69c
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] util: add file cache

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

Line 177: ScopedOpenedDescriptor 
BaseDescriptor::InsertIntoCache(
Any reason not to put this method and the one below inline with the class?


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [i-tests] separated implementation of the ext mini-cluster

2016-12-05 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded a new patch set (#2).

Change subject: [i-tests] separated implementation of the ext mini-cluster
..

[i-tests] separated implementation of the ext mini-cluster

Separated implementation of the external mini-cluster methods into a
separate .cc file: those methods contain too much code to be candidates
for inlining.  Besides, this helps to have less header-wise dependencies
and avoid clang-tidy warnings like

  warning: function 'StopCluster' defined in a header file; function
definitions in header files can lead to ODR violations
[misc-definitions-in-headers]

This patch does not contain any functional changes.

Change-Id: I4e69794502d31dc43cc2eba6870c01c828b14da3
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/delete_table-test.cc
A src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.h
M src/kudu/integration-tests/tablet_replacement-itest.cc
7 files changed, 111 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/5367/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e69794502d31dc43cc2eba6870c01c828b14da3
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] WIP: KUDU-798 (part 5) Safe time advancement in the absense of writes

2016-12-05 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: WIP: KUDU-798 (part 5) Safe time advancement in the absense of 
writes
..


Patch Set 21:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/5240/21//COMMIT_MSG
Commit Message:

Line 7: WIP: KUDU-798 (part 5) Safe time advancement in the absense of writes
> nit: absence
Done


PS21, Line 19: if there aren't any writes in-flight that
 :   it doesn't know about yet
> I think this merits further explanation
this is more of a time manager detail. explained it further there.


PS21, Line 22: Mvcc's safe time is a more conservative version of "global"
 :   safe time
> should we consider renaming this to 'trimmable time' or 'safetime_lower_bou
not sure, with leader leases it can actually be the same as the time manager's 
safe time..


http://gerrit.cloudera.org:8080/#/c/5240/21/java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java:

PS21, Line 58: cluster 
> nit: comma here (http://grammar.ccc.commnet.edu/GRAMMAR/commas_intro.htm ha
thanks. done


PS21, Line 59: se
> absence
Done


PS21, Line 59: we expect precise clock values in the test
> not understanding this exactly. What does one thing have to do with the oth
Done


PS21, Line 61: --disable_
> I think 'negative' gflags are kind of confusing, because you get these kind
Done


http://gerrit.cloudera.org:8080/#/c/5240/21/src/kudu/consensus/consensus.h
File src/kudu/consensus/consensus.h:

Line 28: #include "kudu/consensus/time_manager.h"
> can you forward decl?
Done


http://gerrit.cloudera.org:8080/#/c/5240/21/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

Line 78: scoped_refptr time_manager(new TimeManager(clock, 
Timestamp::kMin));
> missing include for this? (I guess it's getting it transitively)
Done


http://gerrit.cloudera.org:8080/#/c/5240/21/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 105:const scoped_refptr& 
time_manager,
> pass by value and then use std::move below
Done


Line 275: 
RETURN_NOT_OK(time_manager_->AdvanceSafeTimeWithMessage(*msg->get()));
> above comment should be updated to reference this.
Done. It was not possible before, but with recent changes to the TimeManager it 
now is. Also we should only do this in leader mode (until we have leader leases 
we only advance safe time in replicas on commit to reduce opportunity for it to 
move back) so added that condition


PS21, Line 442:  
> comma
Done


PS21, Line 443: else {
  : if
> collapse
Done


http://gerrit.cloudera.org:8080/#/c/5240/21/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

Line 34: #include "kudu/consensus/time_manager.h"
> is forward decl sufficient?
Done


http://gerrit.cloudera.org:8080/#/c/5240/21/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 533:   RETURN_NOT_OK(time_manager_->AssignTimestamp(replicate));
> PREPEND
I changed this to a check ok. in this case we just changed the 
queue/time_manager to leader mode under the consensus lock so this shouldn't 
fail


PS21, Line 1198: *(*iter)->get()
> is **iter sufficient?
good point. I changed this to a CHECK_OK. Currently it only fails if we can't 
update the clock (in which case we should crash). If this changes in the future 
we'll have to update this. Left a TODO for myself.


PS21, Line 1202: d 
> comma
Done


Line 1205:   
RETURN_NOT_OK(time_manager_->AdvanceSafeTime(Timestamp(request->safe_timestamp(;
> same, is this safe to return or should it be a CHECK?
this returns void now and does a check internally.


http://gerrit.cloudera.org:8080/#/c/5240/21/src/kudu/consensus/raft_consensus_state.h
File src/kudu/consensus/raft_consensus_state.h:

Line 33: #include "kudu/consensus/time_manager.h"
> fwd decl?
Done


PS21, Line 256: explicit
> dont need explicit
Done


http://gerrit.cloudera.org:8080/#/c/5240/21/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS21, Line 1195: the clock values are unexpected.
> not quite following this
The above is required  because, in the tests that call this, we're using 
logical values for the timestamps.

This is not required anymore.


http://gerrit.cloudera.org:8080/#/c/5240/21/src/kudu/tablet/mvcc-test.cc
File src/kudu/tablet/mvcc-test.cc:

Line 508:   // Commit tx 2 - thread should still wai.
> typo
Done


Line 510:   mgr.CommitTransaction(tx2);
> probably want an ASSERT_FALSE(HasResultSnapshot()) here.
Done


http://gerrit.cloudera.org:8080/#/c/5240/21/src/kudu/tablet/mvcc.cc
File src/kudu/tablet/mvcc.cc:

Line 285: bool MvccManager::AnyInFlightAtOrBeforeUnlocked(Timestamp ts) const {
> hrm, could this just be implemented by looking at

[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-766: limit number of glog files
..


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Update partition syntax for single value range partitions

2016-12-05 Thread Dan Burkert (Code Review)
Hello Jean-Daniel Cryans, Matthew Jacobs, Dimitris Tsirogiannis,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/5368

to review the following change.

Change subject: Update partition syntax for single value range partitions
..

Update partition syntax for single value range partitions

Impala is going to use the singular form of 'value' for single-valued
range partitions.

Change-Id: I628a4ca81456ca22519e8b7cc5e176a7362a2669
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Partition.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java
M src/kudu/common/partition.cc
3 files changed, 5 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/5368/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I628a4ca81456ca22519e8b7cc5e176a7362a2669
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Matthew Jacobs 


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..


Patch Set 15:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/5300/15/src/kudu/consensus/time_manager-test.cc
File src/kudu/consensus/time_manager-test.cc:

PS15, Line 39: CHECK_OK
> ASSERT_OK() ?
this is an invariant, not something we're testing. I stand by the CHECK_OK here.


PS15, Line 47: InitTimeManager
> Does this need to be public?  Consider to make it protected.
Done


PS15, Line 72: TEST_F(TimeManagerTest, TestTimeManagerNonLeaderMode)
> It would be nice to have a concise description for the test.
I think this test is pretty straightforward but I did leave a title in there.


PS15, Line 136: TEST_F(TimeManagerTest, TestTimeManagerLeaderMode)
> It would be nice to have a brief description of the test.
Done


PS15, Line 194: after_latch->Wait();
> Is there anything to check after the Wait() call completes?
no, just that it does (i.e. that it doesn't hang)


http://gerrit.cloudera.org:8080/#/c/5300/15/src/kudu/consensus/time_manager.cc
File src/kudu/consensus/time_manager.cc:

Line 17: 
> Consider adding
Done


PS15, Line 25: disable_safe_time_advancement_without_writes
> It looks like a very long name.  Is it possible to replace with
todd had the same suggestion on the next patch. done


PS15, Line 30: raft_heartbeat_interval_ms
> Is it needed here?  I could not find where it's used in this file.
Done


PS15, Line 46: external_consistency_mode
> The external consistency mode is an optional field in WriteRequestPB messag
it has a default though. doesn't that mean it's always set?


PS15, Line 144: waiters_.push_back
> Consider adding a scope cleanup to make sure the waiters_ container is clea
not sure how that would happen. If we return on 143 we never added it. if we 
return on 148 AdvanceSafeTimeAndWakeUpWaitersUnlocked removed it. after that we 
clean it up ourserves unless the latch had been triggered in which case again, 
AdvanceSafeTimeAndWakeUpWaitersUnlocked will clean it.


PS15, Line 148: if (waiter.latch->WaitUntil(deadline))
> Is it necessary to remove the waiter from the container?
see my comment above and AdvanceSafeTimeAndWakeUpWaitersUnlocked.


PS15, Line 154: if (waiter.latch->count() == 0) return Status::OK();
> Is it necessary to clean up the waiters_ container, removing the waiter?
no, AdvanceSafeTimeAndWakeUpWaitersUnlocked will do it.


http://gerrit.cloudera.org:8080/#/c/5300/15/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

PS15, Line 112: higher
> less?
oops, thanks. Done


PS15, Line 126: GetSafeTime
> nit: consider adding 'const' modifier
its not const though (it actually updates the safe time)


-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

2016-12-05 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5300

to look at the new patch set (#16).

Change subject: KUDU-798 (part 4) Add a TimeManager to manage safe time 
advancement
..

KUDU-798 (part 4) Add a TimeManager to manage safe time advancement

This patch adds an entity to manage safe time (the timestamp before
which all transactions are committed/aborted or in-flight) across
a consensus configuration. This allows scans to make sure/wait that
the scan timestamp is safe, thus making sure the scan is repeatable.

This adds a small unit test. Follow up patches will use this
in integration tests.

Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
---
M src/kudu/consensus/CMakeLists.txt
A src/kudu/consensus/time_manager-test.cc
A src/kudu/consensus/time_manager.cc
A src/kudu/consensus/time_manager.h
4 files changed, 635 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/5300/16
-- 
To view, visit http://gerrit.cloudera.org:8080/5300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0bb2b1d2590ed7ead6f1980f9572be10444bb81b
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-798 (part 5) Safe time advancement in the absence of writes

2016-12-05 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5240

to look at the new patch set (#22).

Change subject: WIP: KUDU-798 (part 5) Safe time advancement in the absence of 
writes
..

WIP: KUDU-798 (part 5) Safe time advancement in the absence of writes

This plugs the new TimeManager through consensus and wherever else
it's needed and integrates with with transactions, tablet service
and consensus.

Noteworthy:

- Safe time advances as before _except_ in the absense of writes.
  That is safe time is still advanced in Mvcc by the transaction
  driver and only up to the committed transaction's timestamp.
  When there are no writes to send the leader sends its last
  safe time to the replica, which updates itself and unblocks
  waiters.

- Mvcc's safe time is a more conservative version of "global"
  safe time. Mvcc does not get updated with safe time heartbeats
  from the leader. The clean snapshot must stay "clean" otherwise
  a lot of things would break.

- Mvcc's WaitForCleanSnapshot() method was removed. There is no
  longer a guarantee that the "clean" timestamp will move to
  the present. It might not, in the absense of writes. This
  patch introduces a new method that allows to wait for all
  "known" transactions before a timestamp to be committed.

WIP workign on a thorough integration test to be merged with
this.

This should be good to be reviewed.

Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java
M src/kudu/client/client-test.cc
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/alter_table-test.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/mvcc-test.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
M src/kudu/tablet/tablet_peer_mm_ops.cc
M src/kudu/tablet/transactions/alter_schema_transaction.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/transaction_tracker-test.cc
M src/kudu/tablet/transactions/transaction_tracker.cc
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tablet/transactions/write_transaction.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
32 files changed, 214 insertions(+), 148 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/5240/22
-- 
To view, visit http://gerrit.cloudera.org:8080/5240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e
Gerrit-PatchSet: 22
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Add file globbing to Env

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has submitted this change and it was merged.

Change subject: Add file globbing to Env
..


Add file globbing to Env

Change-Id: Ic71160b11b19811ac9669dc7a572a7e4d2e59e79
Reviewed-on: http://gerrit.cloudera.org:8080/5338
Reviewed-by: Adar Dembo 
Tested-by: Kudu Jenkins
---
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
3 files changed, 57 insertions(+), 0 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic71160b11b19811ac9669dc7a572a7e4d2e59e79
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Add file modified time to Env

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has submitted this change and it was merged.

Change subject: Add file modified time to Env
..


Add file modified time to Env

Change-Id: I90b3e759a4dbb352a0c09dc726b428c9dcea5595
Reviewed-on: http://gerrit.cloudera.org:8080/5339
Reviewed-by: Adar Dembo 
Tested-by: Kudu Jenkins
---
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
4 files changed, 46 insertions(+), 8 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5339
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I90b3e759a4dbb352a0c09dc726b428c9dcea5595
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] util: add file cache

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: util: add file cache
..


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-stress-test.cc
File src/kudu/util/file_cache-stress-test.cc:

Line 133: deque> files;
> It took me a bit to figure out the difference between this and the global p
Done


http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-test-util.h
File src/kudu/util/file_cache-test-util.h:

Line 68:   CHECK_LE(open_fd_count, max_fd_count);
> This should include an error message, since the LOG_EVERY_N may not have fi
We'll already get one for free that'll include both open_fd_count and 
max_fd_count. That's the entirety of the KLOG_EVERY().


http://gerrit.cloudera.org:8080/#/c/5146/6/src/kudu/util/file_cache-test.cc
File src/kudu/util/file_cache-test.cc:

Line 72:   void AssertFdsAndDescriptors(int num_expected_fds,
> at every call site you are offsetting by this->initial_open_fds_, maybe the
Done


Line 101:   {
> Is this scope necessary?
Done


http://gerrit.cloudera.org:8080/#/c/5146/8/src/kudu/util/file_cache.cc
File src/kudu/util/file_cache.cc:

Line 177: ScopedOpenedDescriptor 
BaseDescriptor::InsertIntoCache(
> Any reason not to put this method and the one below inline with the class?
It's because they return ScopedOpenedDescriptors. That means we need to observe 
the following order:
1) Forward declare ScopedOpenedDescriptor.
2) Declare BaseDescriptor, which declares these two methods.
3) Declare ScopedOpenedDescriptor. We also define it inline, but that's 
orthogonal to this problem.
4) Define these BaseDescriptor methods, which we can do now that 
ScopedOpenedDescriptor has been declared.

...I wrote all that, then I tried inlining it anyway. It worked, provided 
ScopedOpenedDescriptor was forward declared. I guess I don't know how compilers 
work. :/


-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Dan Burkert (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5340

to look at the new patch set (#7).

Change subject: KUDU-766: limit number of glog files
..

KUDU-766: limit number of glog files

This commit introduces a new flag, 'max_log_files', that limits the
maximum number of rolled glog files to keep around at each severity
level. This flag matches an Apache Impala flag of the same name, and the
implementation is partly borrowed from it.

I initially went the route of making a new maintenance manager operation
type to schedule the file cleanup, but it proved difficult to shoehorn
this usecase into that abstraction.  Instead, a thead is spawned on
startup that checks for excess files every 60 seconds.

Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
A src/kudu/integration-tests/log-rolling-itest.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/metrics.h
9 files changed, 290 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/5340/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] util: add file cache

2016-12-05 Thread Adar Dembo (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5146

to look at the new patch set (#9).

Change subject: util: add file cache
..

util: add file cache

This commit introduces a new kind of cache: the file cache. Its purpose is
to enforce an upper bound on the process' open file descriptors with as
little overhead as possible.

During implementation, I followed these design principles:
1. Reuse the existing LRU cache implementation in cache.{cc,h} so that the
   file cache can take advantage of any future improvements there.
2. Preserve existing encapsulation as much as possible. For example, reuse
   the various open file interfaces defined in util/env.h instead of blowing
   them wide open to get at the underlying fds.
3. Even though the primary use case for the file cache is the log block
   manager (to allow for many small containers on el6), the implementation
   should be generic enough that it can be used in any block manager, or
   anywhere else in a Kudu server process.

In particular, I went back and forth between templated and non-templated
approaches. I settled on using templates once Dan helped me through some of
the thornier issues. This approach drops the capability of opening the same
file multiple "ways" in exchange for far less complexity around deferred
deletion. Going down the template route also means the file cache probably
can't be used outside the block managers.

For more design and implementation notes, see the big block comment in
util/file_cache.h.

Other goodies in this commit:
- A correctness test for the file cache.
- A multi-threaded stress test for the file cache.
- A "checker" (in the style of PeriodicWebUIChecker) for verifying that the
  number of open file handles is below what we expect it to be.
- A Cache::Handle deleter, for use with std::unique_ptr.
- A gflag for overriding the number of shards used in ShardedLRUCache, used
  to simplify capacity-based tests.

Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/cache.cc
M src/kudu/util/cache.h
A src/kudu/util/file_cache-stress-test.cc
A src/kudu/util/file_cache-test-util.h
A src/kudu/util/file_cache-test.cc
A src/kudu/util/file_cache.cc
A src/kudu/util/file_cache.h
M src/kudu/util/test_util.cc
M src/kudu/util/test_util.h
10 files changed, 1,506 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/5146/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-766: limit number of glog files
..


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] util: add file cache

2016-12-05 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: util: add file cache
..


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I26d02f71b0a9644de0b669875941adae5f426345
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1622. result tracker: respond to RPCs outside of the lock

2016-12-05 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1622. result_tracker: respond to RPCs outside of the lock
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] KUDU-1622. result tracker: respond to RPCs outside of the lock

2016-12-05 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1622. result_tracker: respond to RPCs outside of the lock
..


KUDU-1622. result_tracker: respond to RPCs outside of the lock

This patch started with the goal of addressing the flakiness of
ExactlyOnceRpcTest.TestExactlyOnceSemanticsAfterRpcCompleted. The issue
was that the MemTracker updates resulting from an RPC response were
occurring asynchronously after the RPC was responded, and in two
separate increments. Thus, in this test, there was one scenario in which
the test would record a partial update and then in a later assertion not
match the earlier recorded value.

In order to fix this, I ended up at a solution which both fixed the
flakiness and reduced lock contention in ResultTracker. We now respond
to RPCs after we've done the required MemTracker updates, which also
means that we respond to RPCs after dropping the ResultTracker locks.
This has been a high contention point in stress tests, so this is a nice
benefit.

Fixing the flakiness also meant that other 'AssertEventually'
workarounds from this test case could now be removed.

I looped ExactlyOnceRpcTest.TestExactlyOnceSemanticsAfterRpcCompleted
500 times with 4 stress threads and it passed 100% of the time[1]. Prior
to this patch, it would reliably fail at least a couple runs out of
500[2].

I also looped exactly_once_writes-itest 500 times with this patch
with 100% success[3].

[1] http://dist-test.cloudera.org//job?job_id=todd.1480928777.29435
[2] http://dist-test.cloudera.org//job?job_id=todd.1480929041.29742
[3] http://dist-test.cloudera.org//job?job_id=todd.1480929367.31010

Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
Reviewed-on: http://gerrit.cloudera.org:8080/5359
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves 
---
M src/kudu/rpc/exactly_once_rpc-test.cc
M src/kudu/rpc/result_tracker.cc
2 files changed, 93 insertions(+), 84 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I065b9e40bc1af81e0871220a0a01461ea35143b5
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-1622. Try to move heavy work out of ResultTracker lock

2016-12-05 Thread Todd Lipcon (Code Review)
Todd Lipcon has abandoned this change.

Change subject: WIP: KUDU-1622. Try to move heavy work out of ResultTracker lock
..


Abandoned

ended up doing this here: https://gerrit.cloudera.org/#/c/5359/

-- 
To view, visit http://gerrit.cloudera.org:8080/5323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ide22d8ca632672111296cc210c212b87c33d6a0a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-766: limit number of glog files

2016-12-05 Thread Dan Burkert (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5340

to look at the new patch set (#8).

Change subject: KUDU-766: limit number of glog files
..

KUDU-766: limit number of glog files

This commit introduces a new flag, 'max_log_files', that limits the
maximum number of rolled glog files to keep around at each severity
level. This flag matches an Apache Impala flag of the same name, and the
implementation is partly borrowed from it.

I initially went the route of making a new maintenance manager operation
type to schedule the file cleanup, but it proved difficult to shoehorn
this usecase into that abstraction.  Instead, a thead is spawned on
startup that checks for excess files every 60 seconds.

Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
A src/kudu/integration-tests/log-rolling-itest.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/metrics.h
9 files changed, 289 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/5340/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia3f81638a6dfcf134665322f091ff26d29640fd9
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


  1   2   >