[kudu-CR] [authn token expire-itest] minor clean-up

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded a new patch set (#2).

Change subject: [authn_token_expire-itest] minor clean-up
..

[authn_token_expire-itest] minor clean-up

This patch contains a couple of updates on authn_token_expire-itest:

  * Increase authn token's TTL from 2 to 5 seconds in workload-related
test scenario to make it more stable when running with other
parallel activity.

  * Do not call ExternalMiniCluster::Start() for slow scenarios
to avoid wasting time if KUDU_ALLOW_SLOW_TESTS is not enabled.

  * More relevant names for tests classes.

Change-Id: I437f2f72762c3a6e037792a511259734c8b0d860
---
M src/kudu/integration-tests/authn_token_expire-itest.cc
1 file changed, 35 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/6987/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I437f2f72762c3a6e037792a511259734c8b0d860
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] [authn token expire-itest] minor clean-up

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6987

Change subject: [authn_token_expire-itest] minor clean-up
..

[authn_token_expire-itest] minor clean-up

This patch contains a couple of updates on authn_token_expire-itest:

  * Increase authn token's TTL from 2 to 5 seconds in workload-related
test scenario to make it more stable when running with other
parallel activity.

  * Do not call ExternalMiniCluster::Start() for slow scenarios to
to avoid wasting time if KUDU_ALLOW_SLOW_TESTS is not enabled.

  * More relevant names for tests classes.

Change-Id: I437f2f72762c3a6e037792a511259734c8b0d860
---
M src/kudu/integration-tests/authn_token_expire-itest.cc
1 file changed, 35 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/6987/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I437f2f72762c3a6e037792a511259734c8b0d860
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 


[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
..


Patch Set 8: Verified+1

Flakes:

1. Port conflict while running 
ClientFailoverOnNegotiationTimeoutITest.Kudu2021NegotiateWithMaster

2. Absolutely unrelated python symbols error.

-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1911 add more helpful error message when required tool arguments are missing

2017-05-24 Thread Sam Okrent (Code Review)
Sam Okrent has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6986

Change subject: KUDU-1911 add more helpful error message when required tool 
arguments are missing
..

KUDU-1911 add more helpful error message when required tool arguments are 
missing

When using a tool that has a required positional argument, excluding that
argument or passing it as a flag caused a confusing error message to be
printed out ("must provide missing_argument"). This commit changes
the message to "must provide positional argument missing_argument".

Change-Id: I2128a01da5c4929981b2ea46a32b0f7b9dd066e1
---
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_main.cc
2 files changed, 18 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/86/6986/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2128a01da5c4929981b2ea46a32b0f7b9dd066e1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sam Okrent 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS35, Line 137: Status
> Templating and the closure made this a bit tricky. For now I'm leaving it b
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 35
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
..


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6927/6//COMMIT_MSG
Commit Message:

Line 7: KUDU-2021 retry master RPC if negotiation times out
> How about:
That looks better!

I would shorted it a bit, though:

KUDU-2021 test for negotiation timeout on Master RPCs


http://gerrit.cloudera.org:8080/#/c/6927/6/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

Line 484:   static const int kTimeoutMs = 1 * 60 * 1000;
> nit: how about const MonoDelta kTimeout = MonoDelta::FromSeconds(60)
yep, that's much better.


Line 488:   cluster_opts_.master_rpc_ports = { 37030, 37031, 37032 };
> huh. do we have to pick these manually?
Yep, that's the ugly piece of all multi-master tests.  The crux is that all 
masters should know other master's end-points  when starting up.

The fix would be: make probing on ports, somehow reserve them, and then start 
the masters with the reserved ports.


Line 540:   ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms", 
"500"));
> It's not obvious to me why a 500ms negotiation delay is sufficient to cause
It seems this went this state after some iterations.

It supposed to be something like the following:

  ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms",
  Substitute("$0", 
FLAGS_rpc_negotiation_timeout_ms)));
  thread pause_thread([&]() {
  SleepFor(MonoDelta::FromMilliseconds(FLAGS_rpc_negotiation_timeout_ms / 
2));
  CHECK_OK(m->Pause())
});


Line 542:   SleepFor(MonoDelta::FromSeconds(3));
> What's the purpose of this sleep? Is it so this executes after ListTables()
Yep, it seem that was a typo.  I added corresponding comment into the code 
regarding purpose of this sleep.  Basically, we want the former leader master 
to lose its leadership role, but only after the client has connected and 
started connection negotiation.


Line 548:   ScopedResumeExternalDaemon resumer(m);
> CHECK_OK(m->Resume()); ?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

2017-05-24 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6927

to look at the new patch set (#8).

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
..

KUDU-2021 test for negotiation timeout on Master RPCs

This patch adds integration tests to verify that the Kudu C++ client
behaves as described in KUDU-2021: in case of a multi-master Kudu
cluster, the client retries an RPC with other master if the connection
negotiation with leader master times out.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
---
M src/kudu/integration-tests/client_failover-itest.cc
1 file changed, 96 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#38).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/pb_util.proto
39 files changed, 1,034 insertions(+), 200 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/38
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 38
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#37).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/pb_util.proto
39 files changed, 1,035 insertions(+), 200 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/37
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 37
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged.

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..


KUDU-1580 retry tserver RPC if negotiation times out

This patch addresses KUDU-1580, i.e. with this patch the Kudu C++ client
retries an RPC with other tablet replica if the connection negotiation
with current replica timed out.

Added new integration test to cover the updated client's behavior.

Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Reviewed-on: http://gerrit.cloudera.org:8080/6926
Reviewed-by: Mike Percy 
Tested-by: Kudu Jenkins
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/rpc/connection.cc
M src/kudu/rpc/outbound_call.cc
M src/kudu/rpc/outbound_call.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/rpc-test-base.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/rpc_controller.cc
M src/kudu/rpc/rpc_controller.h
M src/kudu/rpc/rpc_introspection.proto
12 files changed, 265 insertions(+), 39 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] consensus: consolidate Raft thread pools

2017-05-24 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: consensus: consolidate Raft thread pools
..


Patch Set 4:

I ran raft_consensus-itest in slow mode 1000 times. All of them passed.

-- 
To view, visit http://gerrit.cloudera.org:8080/6946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Andrew Wong (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6636

to look at the new patch set (#36).

Change subject: KUDU-1952 Remove round-robin for block placement
..

KUDU-1952 Remove round-robin for block placement

This is the first of a multi-patch patchset to mitigate the effects of
single-disk failure. Throughout the code, the term "DataDir" refers to
a data directory, which is often mounted on a distinct disk. Thus,
"disks" and "data directories" will be used interchangeably.

This patch adds a mapping from tablet to a set of disks and uses it to
replace the existing round-robin placement of blocks. Tablets are
mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks
are placed randomly in directories within each tablet's DataDirGroup.

Tablet-to-group mappings are generated and stored as metadata upon
tablet creation, or upon tablet replacement during a tablet copy.
During group creation, disks are added to groups by randomly selecting
two available directories and selecting the one with fewer tablets on
it. This avoids pigeonholing new tablets to disks with relatively few
tablets, while still trending towards filling underloaded disks.

Groups are maintained when restarting the server, as they are flushed
with metadata, and are deleted upon tablet deletion.  When loading
tablet data from a previous version of Kudu, the tablet's metadata
will not have a DataDirGroup. One will be generated containing all
data directories, as the tablet's data may already be spread across
any number of disks.

As this patch only addresses block placement, it does not itself
mitigate the effects of single-disk failure. Given this, and given the
tradeoff between I/O and disk-failure tolerance, the default behavior
will be to spread tablet data across all disks.

Testing is done at the block manager level in block_manager-test and
log_block_manager-test, as well as in the new data_dirs-test.

A design doc can be found here:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
---
M src/kudu/cfile/bloomfile-test-base.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
A src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/delta_compaction-test.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/pb_util.proto
39 files changed, 1,040 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/36
-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 36
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] server: move apply pool into ServerBase

2017-05-24 Thread Adar Dembo (Code Review)
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6984

to look at the new patch set (#2).

Change subject: server: move apply_pool into ServerBase
..

server: move apply_pool into ServerBase

The server-wide apply_pool was instantiated in two places: SysCatalogTable
(for the master) and TsTabletManager (for the tserver). This commit moves it
into ServerBase and unifies the instantiation. Note: the apply pool
semantics have not changed.

Some interesting side effects:
1. The master will now generate apply pool metrics.
2. The apply pool is shut down a little bit later in server shutdown than it
   was before, though I don't see any issues with this.

Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
8 files changed, 48 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/6984/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] threadpool: token-based task sequencing

2017-05-24 Thread Adar Dembo (Code Review)
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6874

to look at the new patch set (#4).

Change subject: threadpool: token-based task sequencing
..

threadpool: token-based task sequencing

This patch adds task sequencing to util/threadpool. Task sequencing allows
m contexts to share a single pool with n threads while also ensuring that
the pool executes tasks belonging to each context in order. Previously this
was only achievable via "one singleton pool per context", which grossly
inflated the total number of threads and overall state.

A group of tasks are sequenced by submission to a dedicated "slot". A client
obtains exclusive access to a slot via AllocateTokenSlot(), which returns a
"token" that can be used for task submission/waiting. When the token is
destroyed (or Release() is called), the slot is returned to the pool. For
implementation simplicity, clients must wait for a token's outstanding tasks
to complete before destroying their token.

Some notes:
- Slots and tokens are mapped 1-1 so they could theoretically be combined,
  but I prefer this separation of concerns.
- The current implementation requires tokens to have no outstanding tasks
  when they are released. This was a conscious choice made so that token
  destruction (especially when done as part of a larger object graph
  destruct) won't take an unusually long amount of time.
- I evaluated two other implementations. In one, tokens had an implicit
  lifecycle that was automatically managed by the threadpool. While simpler
  for clients, the threadpool was more inefficient with more allocations and
  deallocations in each submission. In the other, token submission was built
  using regular task submission. This afforded a certain separation between
  the "core" of the threadpool and the token logic, but it complicated
  locking and tracking of queued tasks.
- I tried to keep submission (whether token-based or tokenless) fast. Just
  the change from std::list to std::deque for the main queue ought to
  improve performance of tokenless submissions. The next bottleneck is
  likely to be lock contention on the global threadpool lock.

Change-Id: If46dc34212027b6ea5dbc2ead7c7af8be57f2c70
---
M src/kudu/util/debug-util.h
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
4 files changed, 943 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/6874/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6874
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If46dc34212027b6ea5dbc2ead7c7af8be57f2c70
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] threadpool: new test for pools with no max threads

2017-05-24 Thread Adar Dembo (Code Review)
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, Alexey Serbin, Kudu 
Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6945

to look at the new patch set (#5).

Change subject: threadpool: new test for pools with no max_threads
..

threadpool: new test for pools with no max_threads

This test ensures that a pool created with effectively no max_threads works
as expected. That is:
1. Tokenless tasks should trigger the creation of a new thread.
2. Token-based tasks can create new threads, but only up to the number of
   tokens submitted against.

I intend to use this "feature" to consolidate some Raft thread pools where a
num_cpus upper bound may be undesirable (i.e. where tasks submitted to the
pools may result in blocking IO).

Change-Id: I37d0bd0a05098bcc1a81ec9f1ac33e74e98781e4
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.h
2 files changed, 70 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/6945/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37d0bd0a05098bcc1a81ec9f1ac33e74e98781e4
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] consensus: consolidate Raft thread pools

2017-05-24 Thread Adar Dembo (Code Review)
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6946

to look at the new patch set (#4).

Change subject: consensus: consolidate Raft thread pools
..

consensus: consolidate Raft thread pools

This patch consolidates the two thread pools used in Raft consensus: the
observer and "Raft" (for lack of a better name) pools. The former runs tasks
for infrequent events such as term and config changes while the latter is
used for periodic events such as processing heartbeat responses.

In this patch, each per-replica pool is consolidated into a single
per-server pool. Sequence tokens are used to ensure that tasks belonging to
a single replica are executed serially and in order. For the "Raft" pool, a
single sequence token is shared by the PeerManager and RaftConsensus; this
mirrors the existing sharing behavior and is safe because token operations
are thread-safe.

The non-default max_threads may come as a surprise, but David showed me how
tasks submitted to either pool may sometimes lead to an fsync (mostly via
cmeta flushes). As such, it isn't safe to use the default num_cpus upper
bound, as that may cause such IO-based tasks to block other CPU-only tasks
(or worse, periodic tasks like heartbeat response processing).

What per-replica threads are not addressed by this patch?
- Failure detection thread: a threadpool isn't the right model
  for these. Something like a hash wheel timer would be ideal.
- Prepare thread pool (max_threads=1): these could be consolidated too, but
  their metrics are per-replica, and sequence tokens don't (yet) support
  that. Meaning, if they were consolidated now, the metrics would also
  consolidate and would be less useful as a result.

Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/ts_tablet_manager.cc
19 files changed, 189 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/6946/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] server: move apply pool into ServerBase

2017-05-24 Thread Adar Dembo (Code Review)
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/6984

to review the following change.

Change subject: server: move apply_pool into ServerBase
..

server: move apply_pool into ServerBase

The server-wide apply_pool was instantiated in two places: SysCatalogTable
(for the master) and TsTabletManager (for the tserver). This commit moves it
into ServerBase and unifies the instantiation. Note: the apply pool
semantics have not changed.

Some interesting side effects:
1. The master will now generate apply pool metrics.
2. The apply pool is shut down a little bit later in server shutdown than it
   was before, though I don't see any issues with this.

Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
8 files changed, 48 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/6984/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] consensus: consolidate Raft thread pools

2017-05-24 Thread Adar Dembo (Code Review)
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6946

to look at the new patch set (#3).

Change subject: consensus: consolidate Raft thread pools
..

consensus: consolidate Raft thread pools

This patch consolidates the two thread pools used in Raft consensus: the
observer and "Raft" (for lack of a better name) pools. The former runs tasks
for infrequent events such as term and config changes while the latter is
used for periodic events such as processing heartbeat responses.

In this patch, each per-replica pool is consolidated into a single
per-server pool. Sequence tokens are used to ensure that tasks belonging to
a single replica are executed serially and in order. For the "Raft" pool, a
single sequence token is shared by the PeerManager and RaftConsensus; this
mirrors the existing sharing behavior and is safe because token operations
are thread-safe.

The non-default max_threads may come as a surprise, but David showed me how
tasks submitted to either pool may sometimes lead to an fsync (mostly via
cmeta flushes). As such, it isn't safe to use the default num_cpus upper
bound, as that may cause such IO-based tasks to block other CPU-only tasks
(or worse, periodic tasks like heartbeat response processing).

What per-replica threads are not addressed by this patch?
- Failure detection thread: a threadpool isn't the right model
  for these. Something like a hash wheel timer would be ideal.
- Prepare thread pool (max_threads=1): these could be consolidated too, but
  their metrics are per-replica, and sequence tokens don't (yet) support
  that. Meaning, if they were consolidated now, the metrics would also
  consolidate and would be less useful as a result.

Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/ts_tablet_manager.cc
19 files changed, 189 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/6946/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-2021 retry master RPC if negotiation times out

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-2021 retry master RPC if negotiation times out
..


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6927/6//COMMIT_MSG
Commit Message:

Line 7: KUDU-2021 retry master RPC if negotiation times out
How about:

KUDU-2021 Add regression test for negotiation timeout on Master RPCs


http://gerrit.cloudera.org:8080/#/c/6927/6/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

Line 484:   static const int kTimeoutMs = 1 * 60 * 1000;
nit: how about const MonoDelta kTimeout = MonoDelta::FromSeconds(60)


Line 488:   cluster_opts_.master_rpc_ports = { 37030, 37031, 37032 };
huh. do we have to pick these manually?


Line 540:   ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms", 
"500"));
It's not obvious to me why a 500ms negotiation delay is sufficient to cause a 
timeout in this test


Line 542:   SleepFor(MonoDelta::FromSeconds(3));
What's the purpose of this sleep? Is it so this executes after ListTables() is 
executed? Is that required for this test to pass? It seems like we should be 
pausing the LeaderMaster then doing the ListTables immediately.


Line 548:   ScopedResumeExternalDaemon resumer(m);
CHECK_OK(m->Resume()); ?


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6926/7/src/kudu/rpc/outbound_call.cc
File src/kudu/rpc/outbound_call.cc:

Line 331: case NEGOTIATION_TIMED_OUT:   // fall-through
> Hmm it looks like it's only actually used by an internal runtime assertion.
Yep, it seems so.  As I understand it's fine with the change and also we have 
coverage for that by all our tests in DEBUG, ASAN and TSAN configurations.


-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] consensus: consolidate Raft thread pools

2017-05-24 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: consensus: consolidate Raft thread pools
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:

PS2, Line 130: )
> extra parenthesis?
Done


http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 191:   shared_ptr token(new 
SequenceToken(raft_pool->AllocateTokenSlot()));
> I think this should be
Done


http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/raft_consensus_quorum-test.cc
File src/kudu/consensus/raft_consensus_quorum-test.cc:

Line 168:   new SequenceToken(raft_pool_->AllocateTokenSlot()));
> likewise
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6924/6/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

Line 33: #include "kudu/integration-tests/ts_itest-base.h"
> no longer used? and the one below
Right, this one is not used.  But the line below is needed for 
tserver::ListTabletsResponsePB::StatusAndSchemaPB.  I'll update it, thanks!


Line 288:   SleepFor(MonoDelta::FromMilliseconds(10 * (i + 1)));
> warning: either cast from 'int' to 'int64_t' (aka 'long') is ineffective, o
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Alexey Serbin (Code Review)
Hello Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6924

to look at the new patch set (#7).

Change subject: KUDU-1034 client does not failover due to timeout
..

KUDU-1034 client does not failover due to timeout

This patch fixes the issue described by KUDU-1034: the client does not
mark the failed tablet server as 'failed' in case of timeout and
continues to use it over and over again to send further requests,
even if other tablet replicas might be available.

Besides the actual fix, this patch incorporates an integration test
(ClientFailoverTServerTimeoutITest.FailoverOnLeaderTimeout) written by Mike.

Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
---
M src/kudu/client/meta_cache.h
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/rpc/retriable_rpc.h
4 files changed, 125 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6924/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Alexey Serbin (Code Review)
Hello Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6926

to look at the new patch set (#9).

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..

KUDU-1580 retry tserver RPC if negotiation times out

This patch addresses KUDU-1580, i.e. with this patch the Kudu C++ client
retries an RPC with other tablet replica if the connection negotiation
with current replica timed out.

Added new integration test to cover the updated client's behavior.

Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/rpc/connection.cc
M src/kudu/rpc/outbound_call.cc
M src/kudu/rpc/outbound_call.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/rpc-test-base.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/rpc_controller.cc
M src/kudu/rpc/rpc_controller.h
M src/kudu/rpc/rpc_introspection.proto
12 files changed, 265 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/6926/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..


Patch Set 8: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6926/7/src/kudu/rpc/outbound_call.cc
File src/kudu/rpc/outbound_call.cc:

Line 331: case NEGOTIATION_TIMED_OUT:   // fall-through
> What exactly do we want to assert?
Hmm it looks like it's only actually used by an internal runtime assertion. I 
guess it's fine.


-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] threadpool: token-based task sequencing

2017-05-24 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: threadpool: token-based task sequencing
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6874/3/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

Line 419: TEST_F(ThreadPoolTest, TestTokenSubmitsProcessedSerially) {
> I think this may fail on a single CPU system.
I'm not seeing why; can you elaborate?


http://gerrit.cloudera.org:8080/#/c/6874/3/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

PS3, Line 146: std::move
> I think you want std::forward here.
I can't say I understand the difference between move() and forward() (even 
after reading an article or two about it), but I've changed both of these moves 
to be forwards.


http://gerrit.cloudera.org:8080/#/c/6874/3/src/kudu/util/threadpool.h
File src/kudu/util/threadpool.h:

Line 198:   DISALLOW_COPY_AND_ASSIGN(SequenceToken);
> I was curious about this, so I looked it up in Effective Modern C++ (page 1
I think I will since it adds information for people who aren't familiar with 
that aspect of move operations.


PS3, Line 203: FIFO
> nit: missing "queue"
Done


PS3, Line 220: The execution order will be A1, T1, T2, A2, A3.
> this is a _possible_ execution order, right?
Dan: A2 isn't allowed to run until A1 has finished. You're right though that if 
A1 finishes before A2 is submitted, A2 will run second. I'll clarify that.

I should have clarified that this execution order presupposes that the tasks 
are long-running (i.e. that A3 will be submitted before any task finishes).


-- 
To view, visit http://gerrit.cloudera.org:8080/6874
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If46dc34212027b6ea5dbc2ead7c7af8be57f2c70
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 6: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6924/6/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

Line 33: #include "kudu/integration-tests/ts_itest-base.h"
no longer used? and the one below


-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6924

to look at the new patch set (#6).

Change subject: KUDU-1034 client does not failover due to timeout
..

KUDU-1034 client does not failover due to timeout

This patch fixes the issue described by KUDU-1034: the client does not
mark the failed tablet server as 'failed' in case of timeout and
continues to use it over and over again to send further requests,
even if other tablet replicas might be available.

Besides the actual fix, this patch incorporates an integration test
(ClientFailoverTServerTimeoutITest.FailoverOnLeaderTimeout) written by Mike.

Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
---
M src/kudu/client/meta_cache.h
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/rpc/retriable_rpc.h
4 files changed, 128 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6924/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6926

to look at the new patch set (#8).

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..

KUDU-1580 retry tserver RPC if negotiation times out

This patch addresses KUDU-1580, i.e. with this patch the Kudu C++ client
retries an RPC with other tablet replica if the connection negotiation
with current replica timed out.

Added new integration test to cover the updated client's behavior.

Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/rpc/connection.cc
M src/kudu/rpc/outbound_call.cc
M src/kudu/rpc/outbound_call.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/rpc-test-base.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/rpc_controller.cc
M src/kudu/rpc/rpc_controller.h
M src/kudu/rpc/rpc_introspection.proto
12 files changed, 264 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/6926/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Dan Burkert (Code Review)
Dan Burkert has submitted this change and it was merged.

Change subject: KUDU-1988: add support for advertised host:port info.
..


KUDU-1988: add support for advertised host:port info.

Prior to this change, it was difficult to deploy a Kudu cluster in
containerized environments where each container has a unique, local
subnetwork with an externally facing translation layer. In these
scenarios the network address that the Kudu server binds to is not the
externally-routable IP address. To overcome this, a set of new flags are
introduced, '--rpc-advertised-addresses' and
'--webserver-advertised-addresses' which allow administrators to
override the set of addresses which the Kudu server gives to external
servers and clients. Similar configurations are present in other
distributed systems, for example, Kafka's advertised.host.name property.

Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Reviewed-on: http://gerrit.cloudera.org:8080/6827
Reviewed-by: Adar Dembo 
Tested-by: Kudu Jenkins
---
M src/kudu/master/master.cc
M src/kudu/server/CMakeLists.txt
A src/kudu/server/rpc_server-test.cc
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/webserver-test.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver.h
M src/kudu/server/webserver_options.cc
M src/kudu/server/webserver_options.h
M src/kudu/tserver/heartbeater.cc
11 files changed, 333 insertions(+), 13 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h
File src/kudu/rpc/retriable_rpc.h:

PS4, Line 195: MarkServerFailed
> +1 thats ok
https://issues.apache.org/jira/browse/KUDU-2024


-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1988: add support for advertised host:port info.
..


Patch Set 13: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1988: add support for advertised host:port info.
..


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6827/12//COMMIT_MSG
Commit Message:

PS12, Line 10: -
> Nit: don't need
Done


http://gerrit.cloudera.org:8080/#/c/6827/12/src/kudu/server/rpc_server-test.cc
File src/kudu/server/rpc_server-test.cc:

Line 99:   GetAddresses(_addrs, _addrs);
> Should wrap these calls in NO_FATALS.
Done


http://gerrit.cloudera.org:8080/#/c/6827/12/src/kudu/server/webserver-test.cc
File src/kudu/server/webserver-test.cc:

Line 299:   GetAddresses(_addrs, _addrs);
> Need to wrap these in NO_FATALS().
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6827

to look at the new patch set (#13).

Change subject: KUDU-1988: add support for advertised host:port info.
..

KUDU-1988: add support for advertised host:port info.

Prior to this change, it was difficult to deploy a Kudu cluster in
containerized environments where each container has a unique, local
subnetwork with an externally facing translation layer. In these
scenarios the network address that the Kudu server binds to is not the
externally-routable IP address. To overcome this, a set of new flags are
introduced, '--rpc-advertised-addresses' and
'--webserver-advertised-addresses' which allow administrators to
override the set of addresses which the Kudu server gives to external
servers and clients. Similar configurations are present in other
distributed systems, for example, Kafka's advertised.host.name property.

Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
---
M src/kudu/master/master.cc
M src/kudu/server/CMakeLists.txt
A src/kudu/server/rpc_server-test.cc
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/webserver-test.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver.h
M src/kudu/server/webserver_options.cc
M src/kudu/server/webserver_options.h
M src/kudu/tserver/heartbeater.cc
11 files changed, 333 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1988: add support for advertised host:port info.
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6827/12//COMMIT_MSG
Commit Message:

PS12, Line 10: 
Nit: don't need


http://gerrit.cloudera.org:8080/#/c/6827/12/src/kudu/server/rpc_server-test.cc
File src/kudu/server/rpc_server-test.cc:

Line 99: TEST_F(BoundOnlyWebserverTest, OnlyBoundAddresses) {
Should wrap these calls in NO_FATALS.


http://gerrit.cloudera.org:8080/#/c/6827/12/src/kudu/server/webserver-test.cc
File src/kudu/server/webserver-test.cc:

Line 299:   ASSERT_OK(server_->GetAdvertisedAddresses(_addrs));
Need to wrap these in NO_FATALS().


-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6827

to look at the new patch set (#12).

Change subject: KUDU-1988: add support for advertised host:port info.
..

KUDU-1988: add support for advertised host:port info.

Prior to this change, it was difficult to deploy a Kudu cluster in
containerized-environments where each container has a unique, local
subnetwork with an externally facing translation layer. In these
scenarios the network address that the Kudu server binds to is not the
externally-routable IP address. To overcome this, a set of new flags are
introduced, '--rpc-advertised-addresses' and
'--webserver-advertised-addresses' which allow administrators to
override the set of addresses which the Kudu server gives to external
servers and clients. Similar configurations are present in other
distributed systems, for example, Kafka's advertised.host.name property.

Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
---
M src/kudu/master/master.cc
M src/kudu/server/CMakeLists.txt
A src/kudu/server/rpc_server-test.cc
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/webserver-test.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver.h
M src/kudu/server/webserver_options.cc
M src/kudu/server/webserver_options.h
M src/kudu/tserver/heartbeater.cc
11 files changed, 333 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6926/7/src/kudu/rpc/outbound_call.cc
File src/kudu/rpc/outbound_call.cc:

Line 331: case NEGOTIATION_TIMED_OUT:   // fall-through
> It is possible to assert on this in a test somewhere?
What exactly do we want to assert?


-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6926/7/src/kudu/rpc/outbound_call.cc
File src/kudu/rpc/outbound_call.cc:

Line 331: case NEGOTIATION_TIMED_OUT:   // fall-through
It is possible to assert on this in a test somewhere?


http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/rpc_introspection.proto
File src/kudu/rpc/rpc_introspection.proto:

Line 43: NEGOTIATION_TIMED_OUT = 7;
> I suspect the idea was to limit dependencies between components and have mo
Hmm ok. I suspect there is no great reason TBH but we can defer on that


-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but 
still in config
..


KUDU-1860: ksck doesn't identify tablets that are evicted but still in config

This patch enhances ksck to gather consensus info from every
tablet. It compares this info with master and outputs the
master's config and every conflicting config, if there are any
conflicts. To do this efficiently it reimplements the
GetAllConsensusState RPC so that it gathers info about every
replica's consensus state.

This will catch at least the two problems identified in
KUDU-1860: 1. The leader has a pending config to remove a
tablet, but it is not committed so the master does not see this
config. This can hide an unhealthy tablet if, e.g., one pending
config member is down and the pending-to-be-kicked-out member is
up, so 1/2 replicas are alive in the leader's active config but
the master thinks 2/3 are alive. 2. No replica is leader but the
master believes there is a leader because its cache is old and
hasn't been updated.

Sample output showing #1:
https://gist.github.com/wdberkeley/d2606698e4f2e8ca3ef70d4dcef7ba9a

Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Reviewed-on: http://gerrit.cloudera.org:8080/6772
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy 
---
M src/kudu/consensus/consensus.proto
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tserver/tablet_replica_lookup.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
M src/kudu/tserver/ts_tablet_manager.h
14 files changed, 521 insertions(+), 71 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h
File src/kudu/rpc/retriable_rpc.h:

PS4, Line 195: MarkServerFailed
> I think we can address that separately -- my current priority to have the b
+1 thats ok


-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1988: add support for advertised host:port info.
..


Patch Set 10:

(35 comments)

http://gerrit.cloudera.org:8080/#/c/6827/10//COMMIT_MSG
Commit Message:

Line 7: KUDU-1988: add support for advertised host:port info.
> Can you add more detail info here?
Done


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/master/master.cc
File src/kudu/master/master.cc:

PS10, Line 241: web_server()->GetAdvertisedAddresses
> Does it make sense to check for return code in this  case?
Done


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/CMakeLists.txt
File src/kudu/server/CMakeLists.txt:

Line 110: ADD_KUDU_TEST(rpc_server-test)
> Nit: resort
Done


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server-test.cc
File src/kudu/server/rpc_server-test.cc:

Line 19: 
> nit: consider adding at least
Done


PS10, Line 30:   RpcServerAdvertisedAddressesTest() {
 :   }
> Nit: can omit
Done


PS10, Line 38: string advertised = use_advertised_addresses();
> nit: consider moving this after that if (!bind.empty()) { ... } closure
Done


Line 42:   opts.rpc_bind_addresses = "127.0.0.1:9898";
> Can we avoid hardcoding ports in the bound addresses? It's going to make th
Done


Line 48: gscoped_ptr metric_registry(new MetricRegistry());
> Nit: should use unique_ptr.
Done


PS10, Line 50:   
> nit: strange shift (supposed to be 4 spaces)
Done


Line 53: std::shared_ptr messenger;
> Add "using std::shared_ptr" and omit the prefix.
Done


PS10, Line 55:   
> nit: strange shift (supposed to be 4 spaces)
Done


PS10, Line 54: builder.set_num_reactors(1)
 :   .set_min_negotiation_threads(1)
 :   .set_max_negotiation_threads(1)
 :   .set_metric_entity(metric_entity)
 :   .enable_inbound_tls();
> Are all of these non-default settings needed for the test? If not, can you 
Done


Line 59: builder.Build();
> ASSERT_OK on this.
Done


PS10, Line 60: CHECK_OK(server_->Init(messenger));
 : CHECK_OK(server_->Bind());
> It's OK to use ASSERT_OK() in SetUp, I believe.
Done


PS10, Line 66: const string
> nit: what's use to return const by value?  Maybe, turn this into 'const str
I've dropped the const.


Line 69:   gscoped_ptr server_;
> unique_ptr here too.
Done


PS10, Line 91: advertised_addrs.size(), 1
> here is elsewhere in this file: the expected value should come first, that 
Done


PS10, Line 89:   vector advertised_addrs;
 :   ASSERT_OK(server_->GetAdvertisedAddresses(_addrs));
 :   ASSERT_EQ(advertised_addrs.size(), 1);
 :   vector bound_addrs;
 :   ASSERT_OK(server_->GetBoundAddresses(_addrs));
> Maybe add a test method that fetches the server's advertised and bound addr
Done


PS10, Line 94:   ASSERT_EQ(advertised_addrs[0].host(), "127.0.0.1");
 :   ASSERT_EQ(advertised_addrs[0].port(), );
> nit for here and elsewhere in the test: I think for readability it would be
This has been refactored a bit from Adar's suggestion, so I don't think I can 
do this anymore.


PS10, Line 96: bound_addrs[0]
> does it make sense to add an assert on the size of bound_addrs prior to acc
Done


http://gerrit.cloudera.org:8080/#/c/6827/8/src/kudu/server/rpc_server.cc
File src/kudu/server/rpc_server.cc:

Line 117:   if (!options_.rpc_advertised_addresses.empty()) {
> This should guard against port 0, as on line 111 above.
Done


Line 210: 
> Same comment as in the webserver; this can just use the copy ctor.
Done


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server.cc
File src/kudu/server/rpc_server.cc:

Line 51:   "Comma-separated list of addresses to advertise 
externally for RPC connections. "
> Would be nice to include a sentence or two about how when to use this and n
Done


PS10, Line 45: DEFINE_string(rpc_bind_addresses, "0.0.0.0",
 :   "Comma-separated list of addresses to bind to for 
RPC connections. "
 :   "Currently, ephemeral ports (i.e. port 0) are not 
allowed.");
 : TAG_FLAG(rpc_bind_addresses, stable);
 : 
 : DEFINE_string(rpc_advertised_addresses, "",
 :   "Comma-separated list of addresses to advertise 
externally for RPC connections. "
 :   "Currently, ephemeral ports (i.e. port 0) are not 
allowed.");
 : TAG_FLAG(rpc_advertised_addresses, advanced);
> Should advertised addresses be a subset of bind addresses?  If yes, conside
No, most likely they will be completely separate.


PS10, Line 124: LOG(FATAL) << "Advertising an ephemeral port is not 
supported (RPC advertised address "
  :<< "configured to " << addr.ToString() << 
")";
> We could do this via gflag validation, no? Just look 

[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6827

to look at the new patch set (#11).

Change subject: KUDU-1988: add support for advertised host:port info.
..

KUDU-1988: add support for advertised host:port info.

Prior to this change, it was difficult to deploy a Kudu cluster in
containerized-environments where each container has a unique, local
subnetwork with an externally facing translation layer. In these
scenarios the network address that the Kudu server binds to is not the
externally-routable IP address. To overcome this, a set of new flags are
introduced, '--rpc-advertised-addresses' and
'--webserver-advertised-addresses' which allow administrators to
override the set of addresses which the Kudu server gives to external
servers and clients. Similar configurations are present in other
distributed systems, for example, Kafka's advertised.host.name property.

Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
---
M src/kudu/master/master.cc
M src/kudu/server/CMakeLists.txt
A src/kudu/server/rpc_server-test.cc
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/webserver-test.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver.h
M src/kudu/server/webserver_options.cc
M src/kudu/server/webserver_options.h
M src/kudu/tserver/heartbeater.cc
11 files changed, 333 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] docs: clarify that max cell size applies to uncompressed size

2017-05-24 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: docs: clarify that max cell size applies to uncompressed size
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1755 Part 2: Improve table on-disk size metric

2017-05-24 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6968

to look at the new patch set (#3).

Change subject: KUDU-1755 Part 2: Improve table on-disk size metric
..

KUDU-1755 Part 2: Improve table on-disk size metric

This adds WAL segment sizes to the tablet on-disk metric. This is a
little bit trickier than previous improvements because this info is
held by the TabletReplica, not the Tablet, so the on-disk metric
itself had to be relocated and plugged in to TabletReplica::EstimateOnDiskSize.

Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
3 files changed, 42 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

PS4, Line 252: tserver::TabletServerIntegrationTestBase
> does FindTabletLeader() do what you need?
Excellent -- it looks like exactly what I need.  Thanks!


http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h
File src/kudu/rpc/retriable_rpc.h:

Line 167:   // A note regarding ServerPicker::MarkServerFailed() calls in this 
method:
> Isn't that just an abstract base class with no other implementations? Can w
Probably, yes.

However, I would prefer to do that in a separate changelist and involve more 
people in discussing that.  My concern is that we might be missing something 
looking only at this context.


PS4, Line 195: MarkServerFailed
> Maybe the confusion we've had about this is that the abstraction here is to
I think we can address that separately -- my current priority to have the bug 
fixed.  By my experience, discussing such things usually leads to many 
iterations with lot of bike-shedding, unfortunately.

What do you think if I file a separate JIRA item to track this and we leave 
this as-is in the current patchset?


-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config

2017-05-24 Thread Will Berkeley (Code Review)
Hello Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6772

to look at the new patch set (#11).

Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but 
still in config
..

KUDU-1860: ksck doesn't identify tablets that are evicted but still in config

This patch enhances ksck to gather consensus info from every
tablet. It compares this info with master and outputs the
master's config and every conflicting config, if there are any
conflicts. To do this efficiently it reimplements the
GetAllConsensusState RPC so that it gathers info about every
replica's consensus state.

This will catch at least the two problems identified in
KUDU-1860: 1. The leader has a pending config to remove a
tablet, but it is not committed so the master does not see this
config. This can hide an unhealthy tablet if, e.g., one pending
config member is down and the pending-to-be-kicked-out member is
up, so 1/2 replicas are alive in the leader's active config but
the master thinks 2/3 are alive. 2. No replica is leader but the
master believes there is a leader because its cache is old and
hasn't been updated.

Sample output showing #1:
https://gist.github.com/wdberkeley/d2606698e4f2e8ca3ef70d4dcef7ba9a

Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
---
M src/kudu/consensus/consensus.proto
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tserver/tablet_replica_lookup.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
M src/kudu/tserver/ts_tablet_manager.h
14 files changed, 521 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/6772/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config

2017-05-24 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but 
still in config
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6772/10/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

PS10, Line 119:  are match
> extra word: are
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h
File src/kudu/rpc/retriable_rpc.h:

Line 167:   // A note regarding ServerPicker::MarkServerFailed() calls in this 
method:
> The MarkServerFailed signature in ServerPicker assumes it can mark the whol
Isn't that just an abstract base class with no other implementations? Can we 
just redefine the semantics?


PS4, Line 195: MarkServerFailed
> I'm not sure about this: from the semantics of the error code SERVER_NOT_AC
Maybe the confusion we've had about this is that the abstraction here is too 
general for the problem domain. Maybe it's fine for the server picker to 
conceptually know about replicas.

If you disagree then we can leave it as-is, just trying to think of ways to 
make it more understandable.


-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

PS4, Line 252: tserver::TabletServerIntegrationTestBase
> Because it's using GetLeaderReplicaWithRetries() which currently exists onl
does FindTabletLeader() do what you need?


-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6924

to look at the new patch set (#5).

Change subject: KUDU-1034 client does not failover due to timeout
..

KUDU-1034 client does not failover due to timeout

This patch fixes the issue described by KUDU-1034: the client does not
mark the failed tablet server as 'failed' in case of timeout and
continues to use it over and over again to send further requests,
even if other tablet replicas might be available.

Besides the actual fix, this patch incorporates an integration test
(ClientFailoverTServerTimeoutITest.FailoverOnLeaderTimeout) written by Mike.

Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
---
M src/kudu/client/meta_cache.h
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/rpc/retriable_rpc.h
4 files changed, 77 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6924/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6926

to look at the new patch set (#7).

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..

KUDU-1580 retry tserver RPC if negotiation times out

This patch addresses KUDU-1580, i.e. with this patch the Kudu C++ client
retries an RPC with other tablet replica if the connection negotiation
with current replica timed out.

Added new integration test to cover the updated client's behavior.

Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/rpc/connection.cc
M src/kudu/rpc/outbound_call.cc
M src/kudu/rpc/outbound_call.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/rpc-test-base.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/rpc_controller.cc
M src/kudu/rpc/rpc_controller.h
M src/kudu/rpc/rpc_introspection.proto
12 files changed, 264 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/6926/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

PS4, Line 252: tserver::TabletServerIntegrationTestBase
> why not ExternalMiniClusterITestBase?
Because it's using GetLeaderReplicaWithRetries() which currently exists only in 
TabletServerIntegrationTestBase

Do you think it would be a good idea to move that GetLeaderReplicaWithRetries 
and all of its dependencies into cluster_itest_util.h or around?


http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h
File src/kudu/rpc/retriable_rpc.h:

Line 167:   // A note regarding ServerPicker::MarkServerFailed() calls in this 
method:
> I know I mentioned adding a comment but I was thinking just a 2- or 3-line 
The MarkServerFailed signature in ServerPicker assumes it can mark the whole 
server as failed.  It's just a detail of current implementation of that method 
in MetaCacheServerPicker.

Do you mean we want to have this comment not here but in MetaCacheServerPicker?


PS4, Line 195: MarkServerFailed
> Should we rename this method to MarkReplicaFailed for clarity?
I'm not sure about this: from the semantics of the error code 
SERVER_NOT_ACCESSIBLE you would assume it makes sense to mark the whole server 
failed.  As I understand it, teplica is not relevant in this context.

The same for the call of the same method in case of TimedOut status returned 
along with NON_RERTRIABLE_ERROR


-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..


Patch Set 6:

(4 comments)

> (4 comments)
 > 
 > I'm not sure what the purpose is of NEGOTIATION_TIMED_OUT in the
 > current patch.

The new NEGOTIATION_TIMED_OUT state is to mark an outbound call which timed out 
on connection negotiation.  It's different from timing out on sending data for 
RPC and awaiting for response.

Basically, the idea is to distinguish between errors which happen during 
connection negotiation and errors that happen while performing a RPC over 
already established connection.

http://gerrit.cloudera.org:8080/#/c/6926/4/src/kudu/rpc/outbound_call.cc
File src/kudu/rpc/outbound_call.cc:

Line 314:   const MonoDelta timeout = controller_->timeout();
> did you mean const& ?
not quite so: RpcController::timeout() returns by value, and MonoDelta 
reference would be the same size as the object itself, so no point to have a 
reference here.  I added 'const' just for readability, to allow the reader to 
understand this is not going to change in the scope.


http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/outbound_call.cc
File src/kudu/rpc/outbound_call.cc:

Line 330:   return state_ == TIMED_OUT;
> return state_ == TIMED_OUT || state_ == NEGOTIATION_TIMED_OUT ?
I thought about that but was not sure it would work due to the  way it's used 
in CallTransferCallbacks::NotifyTransferFinished

I'll give it a second look.


http://gerrit.cloudera.org:8080/#/c/6926/4/src/kudu/rpc/outbound_call.h
File src/kudu/rpc/outbound_call.h:

PS4, Line 124: remove
> remote
Done


http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/rpc_introspection.proto
File src/kudu/rpc/rpc_introspection.proto:

Line 43: NEGOTIATION_TIMED_OUT = 7;
> I don't really understand why these are defined both here and in outbound_c
I suspect the idea was to limit dependencies between components and have more 
flexibility for states of OutboundCall.  I just followed the existing practice 
here.


-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but 
still in config
..


Patch Set 10: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

Line 453:   "  config from master:  A*  B   C   
  Yes   \n"
> Hm you'd have to actually get the RPC to return duplicates from 1 or more t
One way to test this is to put replicas on multiple TSes, shut the TSes down, 
replace one of the TS UUIDs, and restart the cluster. I see your point about 
the state of the test framework for KsckTest


http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

Line 396: TEST_F(KsckTest, TestConsensusConflictExtraPeer) {
> This isn't an integration test. I went to go find the tool-itest I thought 
Oh, yeah. This test is a bit yuck.


http://gerrit.cloudera.org:8080/#/c/6772/10/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

PS10, Line 119:  are match
extra word: are


-- 
To view, visit http://gerrit.cloudera.org:8080/6772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1580 retry tserver RPC if negotiation times out
..


Patch Set 6:

(4 comments)

I'm not sure what the purpose is of NEGOTIATION_TIMED_OUT in the current patch.

http://gerrit.cloudera.org:8080/#/c/6926/4/src/kudu/rpc/outbound_call.cc
File src/kudu/rpc/outbound_call.cc:

Line 314:   const MonoDelta timeout = controller_->timeout();
did you mean const& ?


http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/outbound_call.cc
File src/kudu/rpc/outbound_call.cc:

Line 330:   return state_ == TIMED_OUT;
return state_ == TIMED_OUT || state_ == NEGOTIATION_TIMED_OUT ?


http://gerrit.cloudera.org:8080/#/c/6926/4/src/kudu/rpc/outbound_call.h
File src/kudu/rpc/outbound_call.h:

PS4, Line 124: remove
remote


http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/rpc_introspection.proto
File src/kudu/rpc/rpc_introspection.proto:

Line 43: NEGOTIATION_TIMED_OUT = 7;
I don't really understand why these are defined both here and in 
outbound_call.h, can't we just use these ones everywhere?


-- 
To view, visit http://gerrit.cloudera.org:8080/6926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1988: add support for advertised host:port info.
..


Patch Set 10:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/master/master.cc
File src/kudu/master/master.cc:

PS10, Line 241: web_server()->GetAdvertisedAddresses
Does it make sense to check for return code in this  case?


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server-test.cc
File src/kudu/server/rpc_server-test.cc:

Line 19: 
nit: consider adding at least

#include 
#include 

#include 

#include "kudu/util/net/sockaddr.h"


PS10, Line 38: string advertised = use_advertised_addresses();
nit: consider moving this after that if (!bind.empty()) { ... } closure


PS10, Line 50:   
nit: strange shift (supposed to be 4 spaces)


PS10, Line 55:   
nit: strange shift (supposed to be 4 spaces)


PS10, Line 66: const string
nit: what's use to return const by value?  Maybe, turn this into 'const 
string&' ?  or 'const char*' ?


PS10, Line 91: advertised_addrs.size(), 1
here is elsewhere in this file: the expected value should come first, that 
really helps when reading output when assertion fires.


PS10, Line 94:   ASSERT_EQ(advertised_addrs[0].host(), "127.0.0.1");
 :   ASSERT_EQ(advertised_addrs[0].port(), );
nit for here and elsewhere in the test: I think for readability it would be 
better to place those asserts close to the place where advertised_addrs are 
set, i.e. just after ASSERT_EQ(1, advertised_addrs.size());


PS10, Line 96: bound_addrs[0]
does it make sense to add an assert on the size of bound_addrs prior to 
accessing its first element?


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server.cc
File src/kudu/server/rpc_server.cc:

PS10, Line 45: DEFINE_string(rpc_bind_addresses, "0.0.0.0",
 :   "Comma-separated list of addresses to bind to for 
RPC connections. "
 :   "Currently, ephemeral ports (i.e. port 0) are not 
allowed.");
 : TAG_FLAG(rpc_bind_addresses, stable);
 : 
 : DEFINE_string(rpc_advertised_addresses, "",
 :   "Comma-separated list of addresses to advertise 
externally for RPC connections. "
 :   "Currently, ephemeral ports (i.e. port 0) are not 
allowed.");
 : TAG_FLAG(rpc_advertised_addresses, advanced);
Should advertised addresses be a subset of bind addresses?  If yes, consider 
adding a group validator on that.


-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config

2017-05-24 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but 
still in config
..


Patch Set 7:

(8 comments)

Both of your comments about more testing are things that aren't done well by 
the current ksck tests. I think the takeaway is there should by a ksck- or 
tool-itest to cover these situations. It would also be useful as a place to see 
what situations ksck should detect.

http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

Line 453: 
> Sounds good. I'm just wondering if we can make a test case for that scenari
Hm you'd have to actually get the RPC to return duplicates from 1 or more ts. 
Another reason to add a tool itest like I mentioned in another comment.


http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

Line 396: 
> Can we also add a test case with Committed == no?
This isn't an integration test. I went to go find the tool-itest I thought 
would exist and there isn't one. Looks like a todo: add a tool-itest or a 
ksck-itest.


http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

PS9, Line 604:  string tabl
> nit: how about ReplicaInfo since we have an important consensus class calle
Done


Line 612: boost::optional status_pb;
> nit: Your code comments should all end with punctuation per https://google.
Done


Line 672:  
repl_leader_uuid,
> not used? also missing ending ]
Done


http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

Line 123:   }
> When I see operator== I think that it's only going to return true for struc
Done


http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tserver/tablet_replica_lookup.h
File src/kudu/tserver/tablet_replica_lookup.h:

Line 54: 
> Add comment: Appends all non-tombstoned tablet replicas to the 'replicas' v
Done


http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tserver/tablet_service.h
File src/kudu/tserver/tablet_service.h:

Line 183:  
consensus::GetConsensusStateResponsePB* resp,
> nit: Indentation should look like GetLastOpId()
Fixed all the off indents in this section.


-- 
To view, visit http://gerrit.cloudera.org:8080/6772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1034 client does not failover due to timeout
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

PS4, Line 252: tserver::TabletServerIntegrationTestBase
why not ExternalMiniClusterITestBase?

I'm trying to standardize on that one for ExternalMiniCluster tests


http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h
File src/kudu/rpc/retriable_rpc.h:

Line 167:   // A note regarding ServerPicker::MarkServerFailed() calls in this 
method:
I know I mentioned adding a comment but I was thinking just a 2- or 3-line 
comment in the header file for the declaration of MarkServerFailed.


PS4, Line 195: MarkServerFailed
Should we rename this method to MarkReplicaFailed for clarity?


-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Fix flaky test TestRestartWithOrphanedReplicates
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6976/1//COMMIT_MSG
Commit Message:

PS1, Line 12: 1 failure per 1000
> Yea, let me stash it on ve0518 and I can do a rerun.
You shouldn't need to do that, you can see your historical runs at 
http://dist-test.cloudera.org


-- 
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: Yes


[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates

2017-05-24 Thread Edward Fancher (Code Review)
Edward Fancher has posted comments on this change.

Change subject: Fix flaky test TestRestartWithOrphanedReplicates
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6976/2/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

Line 97:   // 1. Setup
> I don't think this comment really adds any value to this test.
I'll remove it.


Line 108:   cluster_->SetFlag(cluster_->tablet_server(0),
> I think a better comment here would be:
Ok. I'll replace it.


Line 121:   // 3. Validate
> I think a better comment here would be:
Ok, I'll replace it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: Yes


[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates

2017-05-24 Thread Edward Fancher (Code Review)
Edward Fancher has posted comments on this change.

Change subject: Fix flaky test TestRestartWithOrphanedReplicates
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6976/1//COMMIT_MSG
Commit Message:

PS1, Line 7: Fix flaky test
> it would be nice to add name of the test here
Ok, will do.


PS1, Line 9: Looks like the test was setting a fault injection flag before the 
setup
   : which was causing the setup to fail
> It would be nice to add the error message that the failed setup output into
Ok, will do.


PS1, Line 12: 1 failure per 1000
> Is it possible to add a link to one of those 1-in-1K dist-test  failed runs
Yea, let me stash it on ve0518 and I can do a rerun.


-- 
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1988: add support for advertised host:port info.
..


Patch Set 10:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/CMakeLists.txt
File src/kudu/server/CMakeLists.txt:

Line 110: ADD_KUDU_TEST(rpc_server-test)
Nit: resort


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server-test.cc
File src/kudu/server/rpc_server-test.cc:

PS10, Line 30:   RpcServerAdvertisedAddressesTest() {
 :   }
Nit: can omit


Line 42:   opts.rpc_bind_addresses = "127.0.0.1:9898";
Can we avoid hardcoding ports in the bound addresses? It's going to make the 
test flaky. Can we test ephemeral ports only?


Line 48: gscoped_ptr metric_registry(new MetricRegistry());
Nit: should use unique_ptr.


Line 53: std::shared_ptr messenger;
Add "using std::shared_ptr" and omit the prefix.


PS10, Line 54: builder.set_num_reactors(1)
 :   .set_min_negotiation_threads(1)
 :   .set_max_negotiation_threads(1)
 :   .set_metric_entity(metric_entity)
 :   .enable_inbound_tls();
Are all of these non-default settings needed for the test? If not, can you 
remove them? If so, can you add a little comment for each explaining why?


Line 59: builder.Build();
ASSERT_OK on this.


PS10, Line 60: CHECK_OK(server_->Init(messenger));
 : CHECK_OK(server_->Bind());
It's OK to use ASSERT_OK() in SetUp, I believe.


Line 69:   gscoped_ptr server_;
unique_ptr here too.


PS10, Line 89:   vector advertised_addrs;
 :   ASSERT_OK(server_->GetAdvertisedAddresses(_addrs));
 :   ASSERT_EQ(advertised_addrs.size(), 1);
 :   vector bound_addrs;
 :   ASSERT_OK(server_->GetBoundAddresses(_addrs));
Maybe add a test method that fetches the server's advertised and bound 
addresses in one go?


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server.cc
File src/kudu/server/rpc_server.cc:

Line 51:   "Comma-separated list of addresses to advertise 
externally for RPC connections. "
Would be nice to include a sentence or two about how when to use this and not 
just rely on rpc_bind_addresses.


PS10, Line 124: LOG(FATAL) << "Advertising an ephemeral port is not 
supported (RPC advertised address "
  :<< "configured to " << addr.ToString() << 
")";
We could do this via gflag validation, no? Just look for a :0 at the end of an 
address?


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/webserver-test.cc
File src/kudu/server/webserver-test.cc:

Line 241: static_dir_ = GetTestPath("webserver-docroot");
This isn't used by any tests; can we just create it inline with opts.doc_root = 
... ?


Line 265: AddDefaultPathHandlers(server_.get());
Is this necessary? We're not actually accessing the webserver.


Line 287:   int32 use_webserver_port() const override { return ; }
Again, concerned about the flakiness of hard-coding a port here.


PS10, Line 298:   vector advertised_addrs;
  :   ASSERT_OK(server_->GetAdvertisedAddresses(_addrs));
  :   ASSERT_EQ(advertised_addrs.size(), 1);
  :   vector bound_addrs;
  :   ASSERT_OK(server_->GetBoundAddresses(_addrs));
Add a test method to fetch both sets of addresses?


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/webserver.cc
File src/kudu/server/webserver.cc:

PS10, Line 220: return Status::InvalidArgument("advertising an 
ephemeral webserver port is not supported",
  :addr.ToString());
Again, let's validate this with a gflag validator.


http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/webserver_options.cc
File src/kudu/server/webserver_options.cc:

Line 46: DEFINE_string(webserver_advertised_addresses, "",
For consistency, shouldn't this be named webserver_advertised_interface?


Line 47:   "Addresses to advertise externally for web connections."
Would be nice to add a sentence or two explaining when to use this in lieu of 
webserver_interface.


-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates

2017-05-24 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Fix flaky test TestRestartWithOrphanedReplicates
..


Patch Set 2: Code-Review+1

(3 comments)

Change looks good, I have a few nitpicky comments about, well, comments.

http://gerrit.cloudera.org:8080/#/c/6976/2/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

Line 97:   // 1. Setup
I don't think this comment really adds any value to this test.


Line 108:   cluster_->SetFlag(cluster_->tablet_server(0),
I think a better comment here would be:

  // Crash when the WAL contains a replicate message but no corresponding 
commit.


Line 121:   // 3. Validate
I think a better comment here would be:

  // Restart the server and check to make sure that the change is eventually 
applied.


-- 
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change.

Change subject: KUDU-1988: add support for advertised host:port info.
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6827/10//COMMIT_MSG
Commit Message:

Line 7: KUDU-1988: add support for advertised host:port info.
Can you add more detail info here?


-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates

2017-05-24 Thread Edward Fancher (Code Review)
Edward Fancher has uploaded a new patch set (#2).

Change subject: Fix flaky test TestRestartWithOrphanedReplicates
..

Fix flaky test TestRestartWithOrphanedReplicates

Looks like the test was setting a fault injection flag before the setup
which was causing the setup to fail. Moving the flag setting to after
the setup, but before Start seems to have done the trick.
We were getting about 1 failure per 1000 runs before this change,
0 per 1000 after.
Dist test job:

http://dist-test.cloudera.org/job?job_id=efan.1495643022.19373

Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
---
M src/kudu/integration-tests/ts_recovery-itest.cc
1 file changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/6976/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-1755 Part 2: Improve table on-disk size metric

2017-05-24 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: KUDU-1755 Part 2: Improve table on-disk size metric
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6968/1/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

Line 91: using consensus::Consensus;
> warning: using decl 'Consensus' is unused [misc-unused-using-decls]
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1755 Part 2: Improve table on-disk size metric

2017-05-24 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6968

to look at the new patch set (#2).

Change subject: KUDU-1755 Part 2: Improve table on-disk size metric
..

KUDU-1755 Part 2: Improve table on-disk size metric

This adds WAL segment sizes to the tablet on-disk metric. This is a
little bit trickier than previous improvements because this info is
held by the TabletReplica, not the Tablet, so the on-disk metric
itself had to be relocated and plugged in to TabletReplica::EstimateOnDiskSize.

Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
3 files changed, 39 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Fix flaky test

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: Fix flaky test
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6976/1//COMMIT_MSG
Commit Message:

PS1, Line 7: Fix flaky test
it would be nice to add name of the test here


PS1, Line 9: Looks like the test was setting a fault injection flag before the 
setup
   : which was causing the setup to fail
It would be nice to add the error message that the failed setup output into the 
log.


PS1, Line 12: 1 failure per 1000
Is it possible to add a link to one of those 1-in-1K dist-test  failed runs as 
well?


-- 
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] docs: clarify that max cell size applies to uncompressed size

2017-05-24 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: docs: clarify that max cell size applies to uncompressed size
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6977/2/docs/known_issues.adoc
File docs/known_issues.adoc:

PS2, Line 68: ncodingn
> encoding
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] docs: clarify that max cell size applies to uncompressed size

2017-05-24 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6977

to look at the new patch set (#3).

Change subject: docs: clarify that max cell size applies to uncompressed size
..

docs: clarify that max cell size applies to uncompressed size

Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532
---
M docs/known_issues.adoc
M docs/schema_design.adoc
2 files changed, 8 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/6977/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] docs: clarify that max cell size applies to uncompressed size

2017-05-24 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: docs: clarify that max cell size applies to uncompressed size
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6977/2/docs/known_issues.adoc
File docs/known_issues.adoc:

PS2, Line 68: ncodingn
encoding


-- 
To view, visit http://gerrit.cloudera.org:8080/6977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] docs: clarify that max cell size applies to uncompressed size

2017-05-24 Thread Dan Burkert (Code Review)
Hello Ambreen Kazi, Todd Lipcon,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/6977

to review the following change.

Change subject: docs: clarify that max cell size applies to uncompressed size
..

docs: clarify that max cell size applies to uncompressed size

Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532
---
M docs/schema_design.adoc
1 file changed, 7 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/6977/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6977
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6827

to look at the new patch set (#10).

Change subject: KUDU-1988: add support for advertised host:port info.
..

KUDU-1988: add support for advertised host:port info.

Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
---
M src/kudu/master/master.cc
M src/kudu/server/CMakeLists.txt
A src/kudu/server/rpc_server-test.cc
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/webserver-test.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver.h
M src/kudu/server/webserver_options.cc
M src/kudu/server/webserver_options.h
M src/kudu/tserver/heartbeater.cc
11 files changed, 323 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Fix flaky test

2017-05-24 Thread Edward Fancher (Code Review)
Edward Fancher has restored this change.

Change subject: Fix flaky test
..


Restored

-- 
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: restore
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-871 (part 1). Refactor to make cmeta a first class object

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-871 (part 1). Refactor to make cmeta a first class object
..


Patch Set 1:

(6 comments)

some nits while skimming through

http://gerrit.cloudera.org:8080/#/c/6957/1//COMMIT_MSG
Commit Message:

PS1, Line 35: Remove unused Consensus::State enum.
nit: maybe, it makes sense to publish this change as separate tiny clean-up 
changelist?


PS1, Line 37: * bug fix: RaftConsensus must hold lock before snoozing FD
nit: if it's a bug fix, would it make sense to separate it and make sure all 
current tests pass before introducing such a major refactoring?  Or it becomes 
a bug only when combined with the new refactored code?


http://gerrit.cloudera.org:8080/#/c/6957/1/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

PS1, Line 77: TABLET_DATA_FRESH
Given the description I would expect this is a broader counterpart of 
TABLET_DATA_COPYING.  Is that true or there are some additional nits regarding 
voting, etc.?


http://gerrit.cloudera.org:8080/#/c/6957/1/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

PS1, Line 244: $0Unable
typo?  it seems space is missing


PS1, Line 593: *tablet_replica = tmp_replica;
Would it make sense to allow the 'tablet_replica' parameter be nullptr?  It 
seems in a couple of places in this file the result tablet replica output by 
this method is not used at this method's call sites.


PS1, Line 1064: $0Unable
typo?


-- 
To view, visit http://gerrit.cloudera.org:8080/6957
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia27a091d27b3996d37009d5ec866e744f9608388
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6827

to look at the new patch set (#9).

Change subject: KUDU-1988: add support for advertised host:port info.
..

KUDU-1988: add support for advertised host:port info.

Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
---
M src/kudu/master/master.cc
M src/kudu/server/CMakeLists.txt
A src/kudu/server/rpc_server-test.cc
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/webserver-test.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver.h
M src/kudu/server/webserver_options.cc
M src/kudu/server/webserver_options.h
M src/kudu/tserver/heartbeater.cc
11 files changed, 317 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] Fix flaky test

2017-05-24 Thread Edward Fancher (Code Review)
Edward Fancher has abandoned this change.

Change subject: Fix flaky test
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] Fix flaky test

2017-05-24 Thread Edward Fancher (Code Review)
Edward Fancher has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6976

Change subject: Fix flaky test
..

Fix flaky test

Looks like the test was setting a fault injection flag before the setup
which was causing the setup to fail. Moving the flag setting to after
the setup, but before Start seems to have done the trick.
We were getting about 1 failure per 1000 runs before this change,
0 per 1000 after.
Dist test job:

http://dist-test.cloudera.org/job?job_id=efan.1495643022.19373

Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
---
M src/kudu/integration-tests/ts_recovery-itest.cc
1 file changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/6976/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 


[kudu-CR] KUDU-1988: add support for advertised host:port info.

2017-05-24 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1988: add support for advertised host:port info.
..


Patch Set 8:

Patrik indicated he's tied up today, and I want to make sure this lands in time 
for 1.4, so I'm going to push the fixups today.

-- 
To view, visit http://gerrit.cloudera.org:8080/6827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Patrik Sundberg 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Patrik Sundberg 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] KUDU-1755 Part 1: Improve tablet on disk size metric

2017-05-24 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6967

to look at the new patch set (#2).

Change subject: KUDU-1755 Part 1: Improve tablet on disk size metric
..

KUDU-1755 Part 1: Improve tablet on disk size metric

This adds bloomfile, ad hoc index, and superblock sizes to the on-disk
size metric for tablets.

A follow up will address log segments and cmeta.

Change-Id: I32dce598bbb8e18325210a49fc436fd0f7ac68fd
---
M src/kudu/cfile/bloomfile-test.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tablet/tablet_replica.h
M src/kudu/util/metrics.h
14 files changed, 87 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/6967/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6967
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32dce598bbb8e18325210a49fc436fd0f7ac68fd
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-1952 Remove round-robin for block placement

2017-05-24 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1952 Remove round-robin for block placement
..


Patch Set 35:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/cfile/bloomfile-test-base.h
File src/kudu/cfile/bloomfile-test-base.h:

PS35, Line 74: fs::CreateBlockOptions()
nit: would just '{}' be enough?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

PS35, Line 218: CreateBlockOptions()
nit here and below: would '{}' fit in here?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-stress-test.cc
File src/kudu/fs/block_manager-stress-test.cc:

PS35, Line 260: this->
is 'this' necessary here?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS35, Line 137: Status
nit: consider adding 'static' specifier -- as I see this method does not depend 
on the state of the object.


PS35, Line 155: _count
nit: why not to pass 'num_files' itself as an argument?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs-test.cc
File src/kudu/fs/data_dirs-test.cc:

Line 17: 
nit: please include at least  and  since they are used in the 
code.


Line 18: #include "kudu/fs/fs.pb.h"
nit: please include  and 


PS35, Line 55: CHECK_OK
nit: why not just ASSERT_OK() ?


PS35, Line 56: CHECK_OK
ditto


PS35, Line 69: string test_tablet_name_;
nit: I don't see it's changing during the tests, so consider adding 'const'.


PS35, Line 70: CreateBlockOptions test_block_opts_
const?


PS35, Line 72: DataDirGroupPB pb_;
nit: is it really needed as a member?  Would local variable be enough where 
needed?


PS35, Line 80: ASSERT_TRUE(s.IsNotFound());
Here and everywhere in the tests consider adding s.ToString() in case if it 
fails, that helps a lot other people during troubleshooting if this ever fails:

ASSERT_TRUE(s.IsNotFound()) << s.ToString();


PS35, Line 96: for (int i = 0; i < pb.uuids().size(); i++) {
Would it make sense to compare pb.uuids.size() and pb_.uuids.size() prior to 
comparing elements in this for() loop?


PS35, Line 102: dd
Does it make sense to check for invariants on 'dd' after this call?  What are 
the expected side-effects after successful call of GetNextDataDir() for 'dd'?


PS35, Line 120:   DataDir* dd;
  :   ASSERT_OK(dd_manager_->GetNextDataDir(test_block_opts_, ));
  :   dd_manager_->DeleteDataDirGroup(test_tablet_name_);
  :   Status s = dd_manager_->GetNextDataDir(test_block_opts_, );
Is there anything specific for 'dd' in the course of there calls?  If yes, 
consider adding corresponding assertions.


PS35, Line 129: FLAGS_fs_data_dirs_full_disk_cache_seconds = 0;
If changing this flag on-the-fly, does it make sense to add the 'runtime' tag 
for the flag?


PS35, Line 235: FindOrDie
nit: since this is a test, it's possible to use something like

... x =  FindOrNull(...);
ASSERT_NE(nullptr, x);


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

PS35, Line 449: FLAGS_fs_target_data_dirs_per_tablet
A paranoid nit: what if FLAGS_fs_target_data_dirs_per_tablet is set to 2^31 ?

Consider either adding a validator for the flag or using unsigned int for 
group_target_size or enforcing consistency by some other means.


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS35, Line 87: uuid_indices
nit: consider adding 'const' specified for this method.


PS35, Line 225: void
nit: would the caller ever be interested to know if any group were effectively 
deleted or not?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

PS35, Line 93: string tablet_id
const string& tablet_id ?


Line 112:   string tablet_id = rowset_metadata_->tablet_metadata()->tablet_id();
ditto


PS35, Line 127: string tablet_id
ditto


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/multi_column_writer.cc
File src/kudu/tablet/multi_column_writer.cc:

Line 52:   CreateBlockOptions block_opts({ tablet_id_ });
nit: const?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/multi_column_writer.h
File src/kudu/tablet/multi_column_writer.h:

PS35, Line 20: #include 
nit: move this after the std headers.


PS35, Line 90: std::string tablet_id_;
nit: const?


http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS35, Line 629: GetDataDirGroupPBForTablet
Is it worth to check for the return value of GetDataDirGroupPBForTablet()?


-- 
To view, visit http://gerrit.cloudera.org:8080/6636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b
Gerrit-PatchSet: 35
Gerrit-Project: kudu
Gerrit-Branch: master

[kudu-CR] KUDU-1125 (part 1) catalog manager: try to avoid unnecessarily rewriting tablet info

2017-05-24 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1125 (part 1) catalog_manager: try to avoid unnecessarily 
rewriting tablet info
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6916/3/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

Line 114:   return md.Compare(prev_pb, new_pb);
I presume this is equivalent to prev_pb == new_pb? But you're going through the 
MessageDifferencer for the diff_str!= null case?

Bearing in mind that the diff_str!=null case is very rare (VLOG only), would it 
be faster to use equality? Or are the two approaches equivalent?


Line 692: VLOG(2) << "Adding tablet " << tablet->tablet_id() << " in 
catalog: "
Any particular reason you want this VLOG but not one for ReqAddTable()?


http://gerrit.cloudera.org:8080/#/c/6916/3/src/kudu/master/sys_catalog.h
File src/kudu/master/sys_catalog.h:

Line 164:   const scoped_refptr& tablet_replica() const {
Since this is only being used in a test, an alternative would be to keep it 
private and add FRIEND_TEST() as needed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0de7189b8f1dbeea55172929396b73fd92fd62ba
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes