[kudu-CR] [authn token expire-itest] minor clean-up
Alexey Serbin has uploaded a new patch set (#2). Change subject: [authn_token_expire-itest] minor clean-up .. [authn_token_expire-itest] minor clean-up This patch contains a couple of updates on authn_token_expire-itest: * Increase authn token's TTL from 2 to 5 seconds in workload-related test scenario to make it more stable when running with other parallel activity. * Do not call ExternalMiniCluster::Start() for slow scenarios to avoid wasting time if KUDU_ALLOW_SLOW_TESTS is not enabled. * More relevant names for tests classes. Change-Id: I437f2f72762c3a6e037792a511259734c8b0d860 --- M src/kudu/integration-tests/authn_token_expire-itest.cc 1 file changed, 35 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/6987/2 -- To view, visit http://gerrit.cloudera.org:8080/6987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I437f2f72762c3a6e037792a511259734c8b0d860 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [authn token expire-itest] minor clean-up
Alexey Serbin has uploaded a new change for review. http://gerrit.cloudera.org:8080/6987 Change subject: [authn_token_expire-itest] minor clean-up .. [authn_token_expire-itest] minor clean-up This patch contains a couple of updates on authn_token_expire-itest: * Increase authn token's TTL from 2 to 5 seconds in workload-related test scenario to make it more stable when running with other parallel activity. * Do not call ExternalMiniCluster::Start() for slow scenarios to to avoid wasting time if KUDU_ALLOW_SLOW_TESTS is not enabled. * More relevant names for tests classes. Change-Id: I437f2f72762c3a6e037792a511259734c8b0d860 --- M src/kudu/integration-tests/authn_token_expire-itest.cc 1 file changed, 35 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/6987/1 -- To view, visit http://gerrit.cloudera.org:8080/6987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I437f2f72762c3a6e037792a511259734c8b0d860 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs
Alexey Serbin has posted comments on this change. Change subject: KUDU-2021 test for negotiation timeout on Master RPCs .. Patch Set 8: Verified+1 Flakes: 1. Port conflict while running ClientFailoverOnNegotiationTimeoutITest.Kudu2021NegotiateWithMaster 2. Absolutely unrelated python symbols error. -- To view, visit http://gerrit.cloudera.org:8080/6927 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1911 add more helpful error message when required tool arguments are missing
Sam Okrent has uploaded a new change for review. http://gerrit.cloudera.org:8080/6986 Change subject: KUDU-1911 add more helpful error message when required tool arguments are missing .. KUDU-1911 add more helpful error message when required tool arguments are missing When using a tool that has a required positional argument, excluding that argument or passing it as a flag caused a confusing error message to be printed out ("must provide missing_argument"). This commit changes the message to "must provide positional argument missing_argument". Change-Id: I2128a01da5c4929981b2ea46a32b0f7b9dd066e1 --- M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_main.cc 2 files changed, 18 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/86/6986/1 -- To view, visit http://gerrit.cloudera.org:8080/6986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I2128a01da5c4929981b2ea46a32b0f7b9dd066e1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sam Okrent
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Andrew Wong has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement .. Patch Set 35: (1 comment) http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS35, Line 137: Status > Templating and the closure made this a bit tricky. For now I'm leaving it b Done -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 35 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs
Alexey Serbin has posted comments on this change. Change subject: KUDU-2021 test for negotiation timeout on Master RPCs .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/6927/6//COMMIT_MSG Commit Message: Line 7: KUDU-2021 retry master RPC if negotiation times out > How about: That looks better! I would shorted it a bit, though: KUDU-2021 test for negotiation timeout on Master RPCs http://gerrit.cloudera.org:8080/#/c/6927/6/src/kudu/integration-tests/client_failover-itest.cc File src/kudu/integration-tests/client_failover-itest.cc: Line 484: static const int kTimeoutMs = 1 * 60 * 1000; > nit: how about const MonoDelta kTimeout = MonoDelta::FromSeconds(60) yep, that's much better. Line 488: cluster_opts_.master_rpc_ports = { 37030, 37031, 37032 }; > huh. do we have to pick these manually? Yep, that's the ugly piece of all multi-master tests. The crux is that all masters should know other master's end-points when starting up. The fix would be: make probing on ports, somehow reserve them, and then start the masters with the reserved ports. Line 540: ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms", "500")); > It's not obvious to me why a 500ms negotiation delay is sufficient to cause It seems this went this state after some iterations. It supposed to be something like the following: ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms", Substitute("$0", FLAGS_rpc_negotiation_timeout_ms))); thread pause_thread([&]() { SleepFor(MonoDelta::FromMilliseconds(FLAGS_rpc_negotiation_timeout_ms / 2)); CHECK_OK(m->Pause()) }); Line 542: SleepFor(MonoDelta::FromSeconds(3)); > What's the purpose of this sleep? Is it so this executes after ListTables() Yep, it seem that was a typo. I added corresponding comment into the code regarding purpose of this sleep. Basically, we want the former leader master to lose its leadership role, but only after the client has connected and started connection negotiation. Line 548: ScopedResumeExternalDaemon resumer(m); > CHECK_OK(m->Resume()); ? Done -- To view, visit http://gerrit.cloudera.org:8080/6927 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6927 to look at the new patch set (#8). Change subject: KUDU-2021 test for negotiation timeout on Master RPCs .. KUDU-2021 test for negotiation timeout on Master RPCs This patch adds integration tests to verify that the Kudu C++ client behaves as described in KUDU-2021: in case of a multi-master Kudu cluster, the client retries an RPC with other master if the connection negotiation with leader master times out. Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c --- M src/kudu/integration-tests/client_failover-itest.cc 1 file changed, 96 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/8 -- To view, visit http://gerrit.cloudera.org:8080/6927 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6636 to look at the new patch set (#38). Change subject: KUDU-1952 Remove round-robin for block placement .. KUDU-1952 Remove round-robin for block placement This is the first of a multi-patch patchset to mitigate the effects of single-disk failure. Throughout the code, the term "DataDir" refers to a data directory, which is often mounted on a distinct disk. Thus, "disks" and "data directories" will be used interchangeably. This patch adds a mapping from tablet to a set of disks and uses it to replace the existing round-robin placement of blocks. Tablets are mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks are placed randomly in directories within each tablet's DataDirGroup. Tablet-to-group mappings are generated and stored as metadata upon tablet creation, or upon tablet replacement during a tablet copy. During group creation, disks are added to groups by randomly selecting two available directories and selecting the one with fewer tablets on it. This avoids pigeonholing new tablets to disks with relatively few tablets, while still trending towards filling underloaded disks. Groups are maintained when restarting the server, as they are flushed with metadata, and are deleted upon tablet deletion. When loading tablet data from a previous version of Kudu, the tablet's metadata will not have a DataDirGroup. One will be generated containing all data directories, as the tablet's data may already be spread across any number of disks. As this patch only addresses block placement, it does not itself mitigate the effects of single-disk failure. Given this, and given the tradeoff between I/O and disk-failure tolerance, the default behavior will be to spread tablet data across all disks. Testing is done at the block manager level in block_manager-test and log_block_manager-test, as well as in the new data_dirs-test. A design doc can be found here: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b --- M src/kudu/cfile/bloomfile-test-base.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/fs/CMakeLists.txt M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h A src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/delta_compaction-test.cc M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/pb_util.proto 39 files changed, 1,034 insertions(+), 200 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/38 -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 38 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6636 to look at the new patch set (#37). Change subject: KUDU-1952 Remove round-robin for block placement .. KUDU-1952 Remove round-robin for block placement This is the first of a multi-patch patchset to mitigate the effects of single-disk failure. Throughout the code, the term "DataDir" refers to a data directory, which is often mounted on a distinct disk. Thus, "disks" and "data directories" will be used interchangeably. This patch adds a mapping from tablet to a set of disks and uses it to replace the existing round-robin placement of blocks. Tablets are mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks are placed randomly in directories within each tablet's DataDirGroup. Tablet-to-group mappings are generated and stored as metadata upon tablet creation, or upon tablet replacement during a tablet copy. During group creation, disks are added to groups by randomly selecting two available directories and selecting the one with fewer tablets on it. This avoids pigeonholing new tablets to disks with relatively few tablets, while still trending towards filling underloaded disks. Groups are maintained when restarting the server, as they are flushed with metadata, and are deleted upon tablet deletion. When loading tablet data from a previous version of Kudu, the tablet's metadata will not have a DataDirGroup. One will be generated containing all data directories, as the tablet's data may already be spread across any number of disks. As this patch only addresses block placement, it does not itself mitigate the effects of single-disk failure. Given this, and given the tradeoff between I/O and disk-failure tolerance, the default behavior will be to spread tablet data across all disks. Testing is done at the block manager level in block_manager-test and log_block_manager-test, as well as in the new data_dirs-test. A design doc can be found here: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b --- M src/kudu/cfile/bloomfile-test-base.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/fs/CMakeLists.txt M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h A src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/delta_compaction-test.cc M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/pb_util.proto 39 files changed, 1,035 insertions(+), 200 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/37 -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 37 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Alexey Serbin has submitted this change and it was merged. Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. KUDU-1580 retry tserver RPC if negotiation times out This patch addresses KUDU-1580, i.e. with this patch the Kudu C++ client retries an RPC with other tablet replica if the connection negotiation with current replica timed out. Added new integration test to cover the updated client's behavior. Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Reviewed-on: http://gerrit.cloudera.org:8080/6926 Reviewed-by: Mike PercyTested-by: Kudu Jenkins --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/client_failover-itest.cc M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/rpc/connection.cc M src/kudu/rpc/outbound_call.cc M src/kudu/rpc/outbound_call.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/rpc_controller.cc M src/kudu/rpc/rpc_controller.h M src/kudu/rpc/rpc_introspection.proto 12 files changed, 265 insertions(+), 39 deletions(-) Approvals: Mike Percy: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: consolidate Raft thread pools
Adar Dembo has posted comments on this change. Change subject: consensus: consolidate Raft thread pools .. Patch Set 4: I ran raft_consensus-itest in slow mode 1000 times. All of them passed. -- To view, visit http://gerrit.cloudera.org:8080/6946 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6636 to look at the new patch set (#36). Change subject: KUDU-1952 Remove round-robin for block placement .. KUDU-1952 Remove round-robin for block placement This is the first of a multi-patch patchset to mitigate the effects of single-disk failure. Throughout the code, the term "DataDir" refers to a data directory, which is often mounted on a distinct disk. Thus, "disks" and "data directories" will be used interchangeably. This patch adds a mapping from tablet to a set of disks and uses it to replace the existing round-robin placement of blocks. Tablets are mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks are placed randomly in directories within each tablet's DataDirGroup. Tablet-to-group mappings are generated and stored as metadata upon tablet creation, or upon tablet replacement during a tablet copy. During group creation, disks are added to groups by randomly selecting two available directories and selecting the one with fewer tablets on it. This avoids pigeonholing new tablets to disks with relatively few tablets, while still trending towards filling underloaded disks. Groups are maintained when restarting the server, as they are flushed with metadata, and are deleted upon tablet deletion. When loading tablet data from a previous version of Kudu, the tablet's metadata will not have a DataDirGroup. One will be generated containing all data directories, as the tablet's data may already be spread across any number of disks. As this patch only addresses block placement, it does not itself mitigate the effects of single-disk failure. Given this, and given the tradeoff between I/O and disk-failure tolerance, the default behavior will be to spread tablet data across all disks. Testing is done at the block manager level in block_manager-test and log_block_manager-test, as well as in the new data_dirs-test. A design doc can be found here: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b --- M src/kudu/cfile/bloomfile-test-base.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/fs/CMakeLists.txt M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h A src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/delta_compaction-test.cc M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/pb_util.proto 39 files changed, 1,040 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/36 -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 36 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] server: move apply pool into ServerBase
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6984 to look at the new patch set (#2). Change subject: server: move apply_pool into ServerBase .. server: move apply_pool into ServerBase The server-wide apply_pool was instantiated in two places: SysCatalogTable (for the master) and TsTabletManager (for the tserver). This commit moves it into ServerBase and unifies the instantiation. Note: the apply pool semantics have not changed. Some interesting side effects: 1. The master will now generate apply pool metrics. 2. The apply pool is shut down a little bit later in server shutdown than it was before, though I don't see any issues with this. Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed --- M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/sys_catalog.h M src/kudu/server/server_base.cc M src/kudu/server/server_base.h M src/kudu/tserver/tablet_server.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 8 files changed, 48 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/6984/2 -- To view, visit http://gerrit.cloudera.org:8080/6984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] threadpool: token-based task sequencing
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6874 to look at the new patch set (#4). Change subject: threadpool: token-based task sequencing .. threadpool: token-based task sequencing This patch adds task sequencing to util/threadpool. Task sequencing allows m contexts to share a single pool with n threads while also ensuring that the pool executes tasks belonging to each context in order. Previously this was only achievable via "one singleton pool per context", which grossly inflated the total number of threads and overall state. A group of tasks are sequenced by submission to a dedicated "slot". A client obtains exclusive access to a slot via AllocateTokenSlot(), which returns a "token" that can be used for task submission/waiting. When the token is destroyed (or Release() is called), the slot is returned to the pool. For implementation simplicity, clients must wait for a token's outstanding tasks to complete before destroying their token. Some notes: - Slots and tokens are mapped 1-1 so they could theoretically be combined, but I prefer this separation of concerns. - The current implementation requires tokens to have no outstanding tasks when they are released. This was a conscious choice made so that token destruction (especially when done as part of a larger object graph destruct) won't take an unusually long amount of time. - I evaluated two other implementations. In one, tokens had an implicit lifecycle that was automatically managed by the threadpool. While simpler for clients, the threadpool was more inefficient with more allocations and deallocations in each submission. In the other, token submission was built using regular task submission. This afforded a certain separation between the "core" of the threadpool and the token logic, but it complicated locking and tracking of queued tasks. - I tried to keep submission (whether token-based or tokenless) fast. Just the change from std::list to std::deque for the main queue ought to improve performance of tokenless submissions. The next bottleneck is likely to be lock contention on the global threadpool lock. Change-Id: If46dc34212027b6ea5dbc2ead7c7af8be57f2c70 --- M src/kudu/util/debug-util.h M src/kudu/util/threadpool-test.cc M src/kudu/util/threadpool.cc M src/kudu/util/threadpool.h 4 files changed, 943 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/6874/4 -- To view, visit http://gerrit.cloudera.org:8080/6874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If46dc34212027b6ea5dbc2ead7c7af8be57f2c70 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] threadpool: new test for pools with no max threads
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6945 to look at the new patch set (#5). Change subject: threadpool: new test for pools with no max_threads .. threadpool: new test for pools with no max_threads This test ensures that a pool created with effectively no max_threads works as expected. That is: 1. Tokenless tasks should trigger the creation of a new thread. 2. Token-based tasks can create new threads, but only up to the number of tokens submitted against. I intend to use this "feature" to consolidate some Raft thread pools where a num_cpus upper bound may be undesirable (i.e. where tasks submitted to the pools may result in blocking IO). Change-Id: I37d0bd0a05098bcc1a81ec9f1ac33e74e98781e4 --- M src/kudu/util/threadpool-test.cc M src/kudu/util/threadpool.h 2 files changed, 70 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/6945/5 -- To view, visit http://gerrit.cloudera.org:8080/6945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37d0bd0a05098bcc1a81ec9f1ac33e74e98781e4 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: consolidate Raft thread pools
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6946 to look at the new patch set (#4). Change subject: consensus: consolidate Raft thread pools .. consensus: consolidate Raft thread pools This patch consolidates the two thread pools used in Raft consensus: the observer and "Raft" (for lack of a better name) pools. The former runs tasks for infrequent events such as term and config changes while the latter is used for periodic events such as processing heartbeat responses. In this patch, each per-replica pool is consolidated into a single per-server pool. Sequence tokens are used to ensure that tasks belonging to a single replica are executed serially and in order. For the "Raft" pool, a single sequence token is shared by the PeerManager and RaftConsensus; this mirrors the existing sharing behavior and is safe because token operations are thread-safe. The non-default max_threads may come as a surprise, but David showed me how tasks submitted to either pool may sometimes lead to an fsync (mostly via cmeta flushes). As such, it isn't safe to use the default num_cpus upper bound, as that may cause such IO-based tasks to block other CPU-only tasks (or worse, periodic tasks like heartbeat response processing). What per-replica threads are not addressed by this patch? - Failure detection thread: a threadpool isn't the right model for these. Something like a hash wheel timer would be ideal. - Prepare thread pool (max_threads=1): these could be consolidated too, but their metrics are per-replica, and sequence tokens don't (yet) support that. Meaning, if they were consolidated now, the metrics would also consolidate and would be less useful as a result. Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54 --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/server/server_base.cc M src/kudu/server/server_base.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/ts_tablet_manager.cc 19 files changed, 189 insertions(+), 117 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/6946/4 -- To view, visit http://gerrit.cloudera.org:8080/6946 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] server: move apply pool into ServerBase
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/6984 to review the following change. Change subject: server: move apply_pool into ServerBase .. server: move apply_pool into ServerBase The server-wide apply_pool was instantiated in two places: SysCatalogTable (for the master) and TsTabletManager (for the tserver). This commit moves it into ServerBase and unifies the instantiation. Note: the apply pool semantics have not changed. Some interesting side effects: 1. The master will now generate apply pool metrics. 2. The apply pool is shut down a little bit later in server shutdown than it was before, though I don't see any issues with this. Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed --- M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/sys_catalog.h M src/kudu/server/server_base.cc M src/kudu/server/server_base.h M src/kudu/tserver/tablet_server.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 8 files changed, 48 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/6984/1 -- To view, visit http://gerrit.cloudera.org:8080/6984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: consolidate Raft thread pools
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6946 to look at the new patch set (#3). Change subject: consensus: consolidate Raft thread pools .. consensus: consolidate Raft thread pools This patch consolidates the two thread pools used in Raft consensus: the observer and "Raft" (for lack of a better name) pools. The former runs tasks for infrequent events such as term and config changes while the latter is used for periodic events such as processing heartbeat responses. In this patch, each per-replica pool is consolidated into a single per-server pool. Sequence tokens are used to ensure that tasks belonging to a single replica are executed serially and in order. For the "Raft" pool, a single sequence token is shared by the PeerManager and RaftConsensus; this mirrors the existing sharing behavior and is safe because token operations are thread-safe. The non-default max_threads may come as a surprise, but David showed me how tasks submitted to either pool may sometimes lead to an fsync (mostly via cmeta flushes). As such, it isn't safe to use the default num_cpus upper bound, as that may cause such IO-based tasks to block other CPU-only tasks (or worse, periodic tasks like heartbeat response processing). What per-replica threads are not addressed by this patch? - Failure detection thread: a threadpool isn't the right model for these. Something like a hash wheel timer would be ideal. - Prepare thread pool (max_threads=1): these could be consolidated too, but their metrics are per-replica, and sequence tokens don't (yet) support that. Meaning, if they were consolidated now, the metrics would also consolidate and would be less useful as a result. Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54 --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/server/server_base.cc M src/kudu/server/server_base.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/ts_tablet_manager.cc 19 files changed, 189 insertions(+), 117 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/6946/3 -- To view, visit http://gerrit.cloudera.org:8080/6946 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Mike Percy has posted comments on this change. Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1034 client does not failover due to timeout
Mike Percy has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2021 retry master RPC if negotiation times out
Mike Percy has posted comments on this change. Change subject: KUDU-2021 retry master RPC if negotiation times out .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/6927/6//COMMIT_MSG Commit Message: Line 7: KUDU-2021 retry master RPC if negotiation times out How about: KUDU-2021 Add regression test for negotiation timeout on Master RPCs http://gerrit.cloudera.org:8080/#/c/6927/6/src/kudu/integration-tests/client_failover-itest.cc File src/kudu/integration-tests/client_failover-itest.cc: Line 484: static const int kTimeoutMs = 1 * 60 * 1000; nit: how about const MonoDelta kTimeout = MonoDelta::FromSeconds(60) Line 488: cluster_opts_.master_rpc_ports = { 37030, 37031, 37032 }; huh. do we have to pick these manually? Line 540: ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms", "500")); It's not obvious to me why a 500ms negotiation delay is sufficient to cause a timeout in this test Line 542: SleepFor(MonoDelta::FromSeconds(3)); What's the purpose of this sleep? Is it so this executes after ListTables() is executed? Is that required for this test to pass? It seems like we should be pausing the LeaderMaster then doing the ListTables immediately. Line 548: ScopedResumeExternalDaemon resumer(m); CHECK_OK(m->Resume()); ? -- To view, visit http://gerrit.cloudera.org:8080/6927 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Alexey Serbin has posted comments on this change. Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6926/7/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: Line 331: case NEGOTIATION_TIMED_OUT: // fall-through > Hmm it looks like it's only actually used by an internal runtime assertion. Yep, it seems so. As I understand it's fine with the change and also we have coverage for that by all our tests in DEBUG, ASAN and TSAN configurations. -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: consolidate Raft thread pools
Adar Dembo has posted comments on this change. Change subject: consensus: consolidate Raft thread pools .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/consensus_peers.h File src/kudu/consensus/consensus_peers.h: PS2, Line 130: ) > extra parenthesis? Done http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 191: shared_ptr token(new SequenceToken(raft_pool->AllocateTokenSlot())); > I think this should be Done http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/raft_consensus_quorum-test.cc File src/kudu/consensus/raft_consensus_quorum-test.cc: Line 168: new SequenceToken(raft_pool_->AllocateTokenSlot())); > likewise Done -- To view, visit http://gerrit.cloudera.org:8080/6946 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1034 client does not failover due to timeout
Alexey Serbin has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/6924/6/src/kudu/integration-tests/client_failover-itest.cc File src/kudu/integration-tests/client_failover-itest.cc: Line 33: #include "kudu/integration-tests/ts_itest-base.h" > no longer used? and the one below Right, this one is not used. But the line below is needed for tserver::ListTabletsResponsePB::StatusAndSchemaPB. I'll update it, thanks! Line 288: SleepFor(MonoDelta::FromMilliseconds(10 * (i + 1))); > warning: either cast from 'int' to 'int64_t' (aka 'long') is ineffective, o Done -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1034 client does not failover due to timeout
Hello Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6924 to look at the new patch set (#7). Change subject: KUDU-1034 client does not failover due to timeout .. KUDU-1034 client does not failover due to timeout This patch fixes the issue described by KUDU-1034: the client does not mark the failed tablet server as 'failed' in case of timeout and continues to use it over and over again to send further requests, even if other tablet replicas might be available. Besides the actual fix, this patch incorporates an integration test (ClientFailoverTServerTimeoutITest.FailoverOnLeaderTimeout) written by Mike. Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 --- M src/kudu/client/meta_cache.h M src/kudu/integration-tests/client_failover-itest.cc M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/rpc/retriable_rpc.h 4 files changed, 125 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6924/7 -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Hello Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6926 to look at the new patch set (#9). Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. KUDU-1580 retry tserver RPC if negotiation times out This patch addresses KUDU-1580, i.e. with this patch the Kudu C++ client retries an RPC with other tablet replica if the connection negotiation with current replica timed out. Added new integration test to cover the updated client's behavior. Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/client_failover-itest.cc M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/rpc/connection.cc M src/kudu/rpc/outbound_call.cc M src/kudu/rpc/outbound_call.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/rpc_controller.cc M src/kudu/rpc/rpc_controller.h M src/kudu/rpc/rpc_introspection.proto 12 files changed, 265 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/6926/9 -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Mike Percy has posted comments on this change. Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. Patch Set 8: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6926/7/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: Line 331: case NEGOTIATION_TIMED_OUT: // fall-through > What exactly do we want to assert? Hmm it looks like it's only actually used by an internal runtime assertion. I guess it's fine. -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] threadpool: token-based task sequencing
Adar Dembo has posted comments on this change. Change subject: threadpool: token-based task sequencing .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/6874/3/src/kudu/util/threadpool-test.cc File src/kudu/util/threadpool-test.cc: Line 419: TEST_F(ThreadPoolTest, TestTokenSubmitsProcessedSerially) { > I think this may fail on a single CPU system. I'm not seeing why; can you elaborate? http://gerrit.cloudera.org:8080/#/c/6874/3/src/kudu/util/threadpool.cc File src/kudu/util/threadpool.cc: PS3, Line 146: std::move > I think you want std::forward here. I can't say I understand the difference between move() and forward() (even after reading an article or two about it), but I've changed both of these moves to be forwards. http://gerrit.cloudera.org:8080/#/c/6874/3/src/kudu/util/threadpool.h File src/kudu/util/threadpool.h: Line 198: DISALLOW_COPY_AND_ASSIGN(SequenceToken); > I was curious about this, so I looked it up in Effective Modern C++ (page 1 I think I will since it adds information for people who aren't familiar with that aspect of move operations. PS3, Line 203: FIFO > nit: missing "queue" Done PS3, Line 220: The execution order will be A1, T1, T2, A2, A3. > this is a _possible_ execution order, right? Dan: A2 isn't allowed to run until A1 has finished. You're right though that if A1 finishes before A2 is submitted, A2 will run second. I'll clarify that. I should have clarified that this execution order presupposes that the tasks are long-running (i.e. that A3 will be submitted before any task finishes). -- To view, visit http://gerrit.cloudera.org:8080/6874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If46dc34212027b6ea5dbc2ead7c7af8be57f2c70 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1034 client does not failover due to timeout
Mike Percy has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 6: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/6924/6/src/kudu/integration-tests/client_failover-itest.cc File src/kudu/integration-tests/client_failover-itest.cc: Line 33: #include "kudu/integration-tests/ts_itest-base.h" no longer used? and the one below -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1034 client does not failover due to timeout
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6924 to look at the new patch set (#6). Change subject: KUDU-1034 client does not failover due to timeout .. KUDU-1034 client does not failover due to timeout This patch fixes the issue described by KUDU-1034: the client does not mark the failed tablet server as 'failed' in case of timeout and continues to use it over and over again to send further requests, even if other tablet replicas might be available. Besides the actual fix, this patch incorporates an integration test (ClientFailoverTServerTimeoutITest.FailoverOnLeaderTimeout) written by Mike. Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 --- M src/kudu/client/meta_cache.h M src/kudu/integration-tests/client_failover-itest.cc M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/rpc/retriable_rpc.h 4 files changed, 128 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6924/6 -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6926 to look at the new patch set (#8). Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. KUDU-1580 retry tserver RPC if negotiation times out This patch addresses KUDU-1580, i.e. with this patch the Kudu C++ client retries an RPC with other tablet replica if the connection negotiation with current replica timed out. Added new integration test to cover the updated client's behavior. Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/client_failover-itest.cc M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/rpc/connection.cc M src/kudu/rpc/outbound_call.cc M src/kudu/rpc/outbound_call.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/rpc_controller.cc M src/kudu/rpc/rpc_controller.h M src/kudu/rpc/rpc_introspection.proto 12 files changed, 264 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/6926/8 -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Dan Burkert has submitted this change and it was merged. Change subject: KUDU-1988: add support for advertised host:port info. .. KUDU-1988: add support for advertised host:port info. Prior to this change, it was difficult to deploy a Kudu cluster in containerized environments where each container has a unique, local subnetwork with an externally facing translation layer. In these scenarios the network address that the Kudu server binds to is not the externally-routable IP address. To overcome this, a set of new flags are introduced, '--rpc-advertised-addresses' and '--webserver-advertised-addresses' which allow administrators to override the set of addresses which the Kudu server gives to external servers and clients. Similar configurations are present in other distributed systems, for example, Kafka's advertised.host.name property. Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Reviewed-on: http://gerrit.cloudera.org:8080/6827 Reviewed-by: Adar DemboTested-by: Kudu Jenkins --- M src/kudu/master/master.cc M src/kudu/server/CMakeLists.txt A src/kudu/server/rpc_server-test.cc M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/webserver-test.cc M src/kudu/server/webserver.cc M src/kudu/server/webserver.h M src/kudu/server/webserver_options.cc M src/kudu/server/webserver_options.h M src/kudu/tserver/heartbeater.cc 11 files changed, 333 insertions(+), 13 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik Sundberg Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1034 client does not failover due to timeout
Alexey Serbin has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h File src/kudu/rpc/retriable_rpc.h: PS4, Line 195: MarkServerFailed > +1 thats ok https://issues.apache.org/jira/browse/KUDU-2024 -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Adar Dembo has posted comments on this change. Change subject: KUDU-1988: add support for advertised host:port info. .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Dan Burkert has posted comments on this change. Change subject: KUDU-1988: add support for advertised host:port info. .. Patch Set 12: (3 comments) http://gerrit.cloudera.org:8080/#/c/6827/12//COMMIT_MSG Commit Message: PS12, Line 10: - > Nit: don't need Done http://gerrit.cloudera.org:8080/#/c/6827/12/src/kudu/server/rpc_server-test.cc File src/kudu/server/rpc_server-test.cc: Line 99: GetAddresses(_addrs, _addrs); > Should wrap these calls in NO_FATALS. Done http://gerrit.cloudera.org:8080/#/c/6827/12/src/kudu/server/webserver-test.cc File src/kudu/server/webserver-test.cc: Line 299: GetAddresses(_addrs, _addrs); > Need to wrap these in NO_FATALS(). Done -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6827 to look at the new patch set (#13). Change subject: KUDU-1988: add support for advertised host:port info. .. KUDU-1988: add support for advertised host:port info. Prior to this change, it was difficult to deploy a Kudu cluster in containerized environments where each container has a unique, local subnetwork with an externally facing translation layer. In these scenarios the network address that the Kudu server binds to is not the externally-routable IP address. To overcome this, a set of new flags are introduced, '--rpc-advertised-addresses' and '--webserver-advertised-addresses' which allow administrators to override the set of addresses which the Kudu server gives to external servers and clients. Similar configurations are present in other distributed systems, for example, Kafka's advertised.host.name property. Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 --- M src/kudu/master/master.cc M src/kudu/server/CMakeLists.txt A src/kudu/server/rpc_server-test.cc M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/webserver-test.cc M src/kudu/server/webserver.cc M src/kudu/server/webserver.h M src/kudu/server/webserver_options.cc M src/kudu/server/webserver_options.h M src/kudu/tserver/heartbeater.cc 11 files changed, 333 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/13 -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Adar Dembo has posted comments on this change. Change subject: KUDU-1988: add support for advertised host:port info. .. Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/6827/12//COMMIT_MSG Commit Message: PS12, Line 10: Nit: don't need http://gerrit.cloudera.org:8080/#/c/6827/12/src/kudu/server/rpc_server-test.cc File src/kudu/server/rpc_server-test.cc: Line 99: TEST_F(BoundOnlyWebserverTest, OnlyBoundAddresses) { Should wrap these calls in NO_FATALS. http://gerrit.cloudera.org:8080/#/c/6827/12/src/kudu/server/webserver-test.cc File src/kudu/server/webserver-test.cc: Line 299: ASSERT_OK(server_->GetAdvertisedAddresses(_addrs)); Need to wrap these in NO_FATALS(). -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6827 to look at the new patch set (#12). Change subject: KUDU-1988: add support for advertised host:port info. .. KUDU-1988: add support for advertised host:port info. Prior to this change, it was difficult to deploy a Kudu cluster in containerized-environments where each container has a unique, local subnetwork with an externally facing translation layer. In these scenarios the network address that the Kudu server binds to is not the externally-routable IP address. To overcome this, a set of new flags are introduced, '--rpc-advertised-addresses' and '--webserver-advertised-addresses' which allow administrators to override the set of addresses which the Kudu server gives to external servers and clients. Similar configurations are present in other distributed systems, for example, Kafka's advertised.host.name property. Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 --- M src/kudu/master/master.cc M src/kudu/server/CMakeLists.txt A src/kudu/server/rpc_server-test.cc M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/webserver-test.cc M src/kudu/server/webserver.cc M src/kudu/server/webserver.h M src/kudu/server/webserver_options.cc M src/kudu/server/webserver_options.h M src/kudu/tserver/heartbeater.cc 11 files changed, 333 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/12 -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Alexey Serbin has posted comments on this change. Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6926/7/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: Line 331: case NEGOTIATION_TIMED_OUT: // fall-through > It is possible to assert on this in a test somewhere? What exactly do we want to assert? -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Mike Percy has posted comments on this change. Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/6926/7/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: Line 331: case NEGOTIATION_TIMED_OUT: // fall-through It is possible to assert on this in a test somewhere? http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/rpc_introspection.proto File src/kudu/rpc/rpc_introspection.proto: Line 43: NEGOTIATION_TIMED_OUT = 7; > I suspect the idea was to limit dependencies between components and have mo Hmm ok. I suspect there is no great reason TBH but we can defer on that -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config
Mike Percy has submitted this change and it was merged. Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but still in config .. KUDU-1860: ksck doesn't identify tablets that are evicted but still in config This patch enhances ksck to gather consensus info from every tablet. It compares this info with master and outputs the master's config and every conflicting config, if there are any conflicts. To do this efficiently it reimplements the GetAllConsensusState RPC so that it gathers info about every replica's consensus state. This will catch at least the two problems identified in KUDU-1860: 1. The leader has a pending config to remove a tablet, but it is not committed so the master does not see this config. This can hide an unhealthy tablet if, e.g., one pending config member is down and the pending-to-be-kicked-out member is up, so 1/2 replicas are alive in the leader's active config but the master thinks 2/3 are alive. 2. No replica is leader but the master believes there is a leader because its cache is old and hasn't been updated. Sample output showing #1: https://gist.github.com/wdberkeley/d2606698e4f2e8ca3ef70d4dcef7ba9a Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808 Reviewed-on: http://gerrit.cloudera.org:8080/6772 Tested-by: Kudu Jenkins Reviewed-by: Mike Percy--- M src/kudu/consensus/consensus.proto M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote.cc M src/kudu/tools/ksck_remote.h M src/kudu/tools/tool_action_cluster.cc M src/kudu/tserver/tablet_replica_lookup.h M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tablet_service.h M src/kudu/tserver/ts_tablet_manager.h 14 files changed, 521 insertions(+), 71 deletions(-) Approvals: Mike Percy: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1034 client does not failover due to timeout
Mike Percy has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h File src/kudu/rpc/retriable_rpc.h: PS4, Line 195: MarkServerFailed > I think we can address that separately -- my current priority to have the b +1 thats ok -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Dan Burkert has posted comments on this change. Change subject: KUDU-1988: add support for advertised host:port info. .. Patch Set 10: (35 comments) http://gerrit.cloudera.org:8080/#/c/6827/10//COMMIT_MSG Commit Message: Line 7: KUDU-1988: add support for advertised host:port info. > Can you add more detail info here? Done http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/master/master.cc File src/kudu/master/master.cc: PS10, Line 241: web_server()->GetAdvertisedAddresses > Does it make sense to check for return code in this case? Done http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/CMakeLists.txt File src/kudu/server/CMakeLists.txt: Line 110: ADD_KUDU_TEST(rpc_server-test) > Nit: resort Done http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server-test.cc File src/kudu/server/rpc_server-test.cc: Line 19: > nit: consider adding at least Done PS10, Line 30: RpcServerAdvertisedAddressesTest() { : } > Nit: can omit Done PS10, Line 38: string advertised = use_advertised_addresses(); > nit: consider moving this after that if (!bind.empty()) { ... } closure Done Line 42: opts.rpc_bind_addresses = "127.0.0.1:9898"; > Can we avoid hardcoding ports in the bound addresses? It's going to make th Done Line 48: gscoped_ptr metric_registry(new MetricRegistry()); > Nit: should use unique_ptr. Done PS10, Line 50: > nit: strange shift (supposed to be 4 spaces) Done Line 53: std::shared_ptr messenger; > Add "using std::shared_ptr" and omit the prefix. Done PS10, Line 55: > nit: strange shift (supposed to be 4 spaces) Done PS10, Line 54: builder.set_num_reactors(1) : .set_min_negotiation_threads(1) : .set_max_negotiation_threads(1) : .set_metric_entity(metric_entity) : .enable_inbound_tls(); > Are all of these non-default settings needed for the test? If not, can you Done Line 59: builder.Build(); > ASSERT_OK on this. Done PS10, Line 60: CHECK_OK(server_->Init(messenger)); : CHECK_OK(server_->Bind()); > It's OK to use ASSERT_OK() in SetUp, I believe. Done PS10, Line 66: const string > nit: what's use to return const by value? Maybe, turn this into 'const str I've dropped the const. Line 69: gscoped_ptr server_; > unique_ptr here too. Done PS10, Line 91: advertised_addrs.size(), 1 > here is elsewhere in this file: the expected value should come first, that Done PS10, Line 89: vector advertised_addrs; : ASSERT_OK(server_->GetAdvertisedAddresses(_addrs)); : ASSERT_EQ(advertised_addrs.size(), 1); : vector bound_addrs; : ASSERT_OK(server_->GetBoundAddresses(_addrs)); > Maybe add a test method that fetches the server's advertised and bound addr Done PS10, Line 94: ASSERT_EQ(advertised_addrs[0].host(), "127.0.0.1"); : ASSERT_EQ(advertised_addrs[0].port(), ); > nit for here and elsewhere in the test: I think for readability it would be This has been refactored a bit from Adar's suggestion, so I don't think I can do this anymore. PS10, Line 96: bound_addrs[0] > does it make sense to add an assert on the size of bound_addrs prior to acc Done http://gerrit.cloudera.org:8080/#/c/6827/8/src/kudu/server/rpc_server.cc File src/kudu/server/rpc_server.cc: Line 117: if (!options_.rpc_advertised_addresses.empty()) { > This should guard against port 0, as on line 111 above. Done Line 210: > Same comment as in the webserver; this can just use the copy ctor. Done http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server.cc File src/kudu/server/rpc_server.cc: Line 51: "Comma-separated list of addresses to advertise externally for RPC connections. " > Would be nice to include a sentence or two about how when to use this and n Done PS10, Line 45: DEFINE_string(rpc_bind_addresses, "0.0.0.0", : "Comma-separated list of addresses to bind to for RPC connections. " : "Currently, ephemeral ports (i.e. port 0) are not allowed."); : TAG_FLAG(rpc_bind_addresses, stable); : : DEFINE_string(rpc_advertised_addresses, "", : "Comma-separated list of addresses to advertise externally for RPC connections. " : "Currently, ephemeral ports (i.e. port 0) are not allowed."); : TAG_FLAG(rpc_advertised_addresses, advanced); > Should advertised addresses be a subset of bind addresses? If yes, conside No, most likely they will be completely separate. PS10, Line 124: LOG(FATAL) << "Advertising an ephemeral port is not supported (RPC advertised address " :<< "configured to " << addr.ToString() << ")"; > We could do this via gflag validation, no? Just look
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6827 to look at the new patch set (#11). Change subject: KUDU-1988: add support for advertised host:port info. .. KUDU-1988: add support for advertised host:port info. Prior to this change, it was difficult to deploy a Kudu cluster in containerized-environments where each container has a unique, local subnetwork with an externally facing translation layer. In these scenarios the network address that the Kudu server binds to is not the externally-routable IP address. To overcome this, a set of new flags are introduced, '--rpc-advertised-addresses' and '--webserver-advertised-addresses' which allow administrators to override the set of addresses which the Kudu server gives to external servers and clients. Similar configurations are present in other distributed systems, for example, Kafka's advertised.host.name property. Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 --- M src/kudu/master/master.cc M src/kudu/server/CMakeLists.txt A src/kudu/server/rpc_server-test.cc M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/webserver-test.cc M src/kudu/server/webserver.cc M src/kudu/server/webserver.h M src/kudu/server/webserver_options.cc M src/kudu/server/webserver_options.h M src/kudu/tserver/heartbeater.cc 11 files changed, 333 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/11 -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot
[kudu-CR] docs: clarify that max cell size applies to uncompressed size
David Ribeiro Alves has posted comments on this change. Change subject: docs: clarify that max cell size applies to uncompressed size .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1755 Part 2: Improve table on-disk size metric
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6968 to look at the new patch set (#3). Change subject: KUDU-1755 Part 2: Improve table on-disk size metric .. KUDU-1755 Part 2: Improve table on-disk size metric This adds WAL segment sizes to the tablet on-disk metric. This is a little bit trickier than previous improvements because this info is held by the TabletReplica, not the Tablet, so the on-disk metric itself had to be relocated and plugged in to TabletReplica::EstimateOnDiskSize. Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 --- M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h 3 files changed, 42 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/3 -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1034 client does not failover due to timeout
Alexey Serbin has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/integration-tests/client_failover-itest.cc File src/kudu/integration-tests/client_failover-itest.cc: PS4, Line 252: tserver::TabletServerIntegrationTestBase > does FindTabletLeader() do what you need? Excellent -- it looks like exactly what I need. Thanks! http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h File src/kudu/rpc/retriable_rpc.h: Line 167: // A note regarding ServerPicker::MarkServerFailed() calls in this method: > Isn't that just an abstract base class with no other implementations? Can w Probably, yes. However, I would prefer to do that in a separate changelist and involve more people in discussing that. My concern is that we might be missing something looking only at this context. PS4, Line 195: MarkServerFailed > Maybe the confusion we've had about this is that the abstraction here is to I think we can address that separately -- my current priority to have the bug fixed. By my experience, discussing such things usually leads to many iterations with lot of bike-shedding, unfortunately. What do you think if I file a separate JIRA item to track this and we leave this as-is in the current patchset? -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config
Hello Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6772 to look at the new patch set (#11). Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but still in config .. KUDU-1860: ksck doesn't identify tablets that are evicted but still in config This patch enhances ksck to gather consensus info from every tablet. It compares this info with master and outputs the master's config and every conflicting config, if there are any conflicts. To do this efficiently it reimplements the GetAllConsensusState RPC so that it gathers info about every replica's consensus state. This will catch at least the two problems identified in KUDU-1860: 1. The leader has a pending config to remove a tablet, but it is not committed so the master does not see this config. This can hide an unhealthy tablet if, e.g., one pending config member is down and the pending-to-be-kicked-out member is up, so 1/2 replicas are alive in the leader's active config but the master thinks 2/3 are alive. 2. No replica is leader but the master believes there is a leader because its cache is old and hasn't been updated. Sample output showing #1: https://gist.github.com/wdberkeley/d2606698e4f2e8ca3ef70d4dcef7ba9a Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808 --- M src/kudu/consensus/consensus.proto M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote.cc M src/kudu/tools/ksck_remote.h M src/kudu/tools/tool_action_cluster.cc M src/kudu/tserver/tablet_replica_lookup.h M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tablet_service.h M src/kudu/tserver/ts_tablet_manager.h 14 files changed, 521 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/6772/11 -- To view, visit http://gerrit.cloudera.org:8080/6772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config
Will Berkeley has posted comments on this change. Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but still in config .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/6772/10/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: PS10, Line 119: are match > extra word: are Done -- To view, visit http://gerrit.cloudera.org:8080/6772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1034 client does not failover due to timeout
Mike Percy has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h File src/kudu/rpc/retriable_rpc.h: Line 167: // A note regarding ServerPicker::MarkServerFailed() calls in this method: > The MarkServerFailed signature in ServerPicker assumes it can mark the whol Isn't that just an abstract base class with no other implementations? Can we just redefine the semantics? PS4, Line 195: MarkServerFailed > I'm not sure about this: from the semantics of the error code SERVER_NOT_AC Maybe the confusion we've had about this is that the abstraction here is too general for the problem domain. Maybe it's fine for the server picker to conceptually know about replicas. If you disagree then we can leave it as-is, just trying to think of ways to make it more understandable. -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1034 client does not failover due to timeout
Mike Percy has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/integration-tests/client_failover-itest.cc File src/kudu/integration-tests/client_failover-itest.cc: PS4, Line 252: tserver::TabletServerIntegrationTestBase > Because it's using GetLeaderReplicaWithRetries() which currently exists onl does FindTabletLeader() do what you need? -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1034 client does not failover due to timeout
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6924 to look at the new patch set (#5). Change subject: KUDU-1034 client does not failover due to timeout .. KUDU-1034 client does not failover due to timeout This patch fixes the issue described by KUDU-1034: the client does not mark the failed tablet server as 'failed' in case of timeout and continues to use it over and over again to send further requests, even if other tablet replicas might be available. Besides the actual fix, this patch incorporates an integration test (ClientFailoverTServerTimeoutITest.FailoverOnLeaderTimeout) written by Mike. Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 --- M src/kudu/client/meta_cache.h M src/kudu/integration-tests/client_failover-itest.cc M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/rpc/retriable_rpc.h 4 files changed, 77 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6924/5 -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6926 to look at the new patch set (#7). Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. KUDU-1580 retry tserver RPC if negotiation times out This patch addresses KUDU-1580, i.e. with this patch the Kudu C++ client retries an RPC with other tablet replica if the connection negotiation with current replica timed out. Added new integration test to cover the updated client's behavior. Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b --- M src/kudu/client/batcher.cc M src/kudu/integration-tests/client_failover-itest.cc M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/rpc/connection.cc M src/kudu/rpc/outbound_call.cc M src/kudu/rpc/outbound_call.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/rpc_controller.cc M src/kudu/rpc/rpc_controller.h M src/kudu/rpc/rpc_introspection.proto 12 files changed, 264 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/6926/7 -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1034 client does not failover due to timeout
Alexey Serbin has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/integration-tests/client_failover-itest.cc File src/kudu/integration-tests/client_failover-itest.cc: PS4, Line 252: tserver::TabletServerIntegrationTestBase > why not ExternalMiniClusterITestBase? Because it's using GetLeaderReplicaWithRetries() which currently exists only in TabletServerIntegrationTestBase Do you think it would be a good idea to move that GetLeaderReplicaWithRetries and all of its dependencies into cluster_itest_util.h or around? http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h File src/kudu/rpc/retriable_rpc.h: Line 167: // A note regarding ServerPicker::MarkServerFailed() calls in this method: > I know I mentioned adding a comment but I was thinking just a 2- or 3-line The MarkServerFailed signature in ServerPicker assumes it can mark the whole server as failed. It's just a detail of current implementation of that method in MetaCacheServerPicker. Do you mean we want to have this comment not here but in MetaCacheServerPicker? PS4, Line 195: MarkServerFailed > Should we rename this method to MarkReplicaFailed for clarity? I'm not sure about this: from the semantics of the error code SERVER_NOT_ACCESSIBLE you would assume it makes sense to mark the whole server failed. As I understand it, teplica is not relevant in this context. The same for the call of the same method in case of TimedOut status returned along with NON_RERTRIABLE_ERROR -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Alexey Serbin has posted comments on this change. Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. Patch Set 6: (4 comments) > (4 comments) > > I'm not sure what the purpose is of NEGOTIATION_TIMED_OUT in the > current patch. The new NEGOTIATION_TIMED_OUT state is to mark an outbound call which timed out on connection negotiation. It's different from timing out on sending data for RPC and awaiting for response. Basically, the idea is to distinguish between errors which happen during connection negotiation and errors that happen while performing a RPC over already established connection. http://gerrit.cloudera.org:8080/#/c/6926/4/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: Line 314: const MonoDelta timeout = controller_->timeout(); > did you mean const& ? not quite so: RpcController::timeout() returns by value, and MonoDelta reference would be the same size as the object itself, so no point to have a reference here. I added 'const' just for readability, to allow the reader to understand this is not going to change in the scope. http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: Line 330: return state_ == TIMED_OUT; > return state_ == TIMED_OUT || state_ == NEGOTIATION_TIMED_OUT ? I thought about that but was not sure it would work due to the way it's used in CallTransferCallbacks::NotifyTransferFinished I'll give it a second look. http://gerrit.cloudera.org:8080/#/c/6926/4/src/kudu/rpc/outbound_call.h File src/kudu/rpc/outbound_call.h: PS4, Line 124: remove > remote Done http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/rpc_introspection.proto File src/kudu/rpc/rpc_introspection.proto: Line 43: NEGOTIATION_TIMED_OUT = 7; > I don't really understand why these are defined both here and in outbound_c I suspect the idea was to limit dependencies between components and have more flexibility for states of OutboundCall. I just followed the existing practice here. -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config
Mike Percy has posted comments on this change. Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but still in config .. Patch Set 10: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/ksck-test.cc File src/kudu/tools/ksck-test.cc: Line 453: " config from master: A* B C Yes \n" > Hm you'd have to actually get the RPC to return duplicates from 1 or more t One way to test this is to put replicas on multiple TSes, shut the TSes down, replace one of the TS UUIDs, and restart the cluster. I see your point about the state of the test framework for KsckTest http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tools/ksck-test.cc File src/kudu/tools/ksck-test.cc: Line 396: TEST_F(KsckTest, TestConsensusConflictExtraPeer) { > This isn't an integration test. I went to go find the tool-itest I thought Oh, yeah. This test is a bit yuck. http://gerrit.cloudera.org:8080/#/c/6772/10/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: PS10, Line 119: are match extra word: are -- To view, visit http://gerrit.cloudera.org:8080/6772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1580 retry tserver RPC if negotiation times out
Mike Percy has posted comments on this change. Change subject: KUDU-1580 retry tserver RPC if negotiation times out .. Patch Set 6: (4 comments) I'm not sure what the purpose is of NEGOTIATION_TIMED_OUT in the current patch. http://gerrit.cloudera.org:8080/#/c/6926/4/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: Line 314: const MonoDelta timeout = controller_->timeout(); did you mean const& ? http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: Line 330: return state_ == TIMED_OUT; return state_ == TIMED_OUT || state_ == NEGOTIATION_TIMED_OUT ? http://gerrit.cloudera.org:8080/#/c/6926/4/src/kudu/rpc/outbound_call.h File src/kudu/rpc/outbound_call.h: PS4, Line 124: remove remote http://gerrit.cloudera.org:8080/#/c/6926/6/src/kudu/rpc/rpc_introspection.proto File src/kudu/rpc/rpc_introspection.proto: Line 43: NEGOTIATION_TIMED_OUT = 7; I don't really understand why these are defined both here and in outbound_call.h, can't we just use these ones everywhere? -- To view, visit http://gerrit.cloudera.org:8080/6926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icee8bf4978365a23d6627e7bc411b63f53540a3b Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Alexey Serbin has posted comments on this change. Change subject: KUDU-1988: add support for advertised host:port info. .. Patch Set 10: (10 comments) http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/master/master.cc File src/kudu/master/master.cc: PS10, Line 241: web_server()->GetAdvertisedAddresses Does it make sense to check for return code in this case? http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server-test.cc File src/kudu/server/rpc_server-test.cc: Line 19: nit: consider adding at least #include #include #include #include "kudu/util/net/sockaddr.h" PS10, Line 38: string advertised = use_advertised_addresses(); nit: consider moving this after that if (!bind.empty()) { ... } closure PS10, Line 50: nit: strange shift (supposed to be 4 spaces) PS10, Line 55: nit: strange shift (supposed to be 4 spaces) PS10, Line 66: const string nit: what's use to return const by value? Maybe, turn this into 'const string&' ? or 'const char*' ? PS10, Line 91: advertised_addrs.size(), 1 here is elsewhere in this file: the expected value should come first, that really helps when reading output when assertion fires. PS10, Line 94: ASSERT_EQ(advertised_addrs[0].host(), "127.0.0.1"); : ASSERT_EQ(advertised_addrs[0].port(), ); nit for here and elsewhere in the test: I think for readability it would be better to place those asserts close to the place where advertised_addrs are set, i.e. just after ASSERT_EQ(1, advertised_addrs.size()); PS10, Line 96: bound_addrs[0] does it make sense to add an assert on the size of bound_addrs prior to accessing its first element? http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server.cc File src/kudu/server/rpc_server.cc: PS10, Line 45: DEFINE_string(rpc_bind_addresses, "0.0.0.0", : "Comma-separated list of addresses to bind to for RPC connections. " : "Currently, ephemeral ports (i.e. port 0) are not allowed."); : TAG_FLAG(rpc_bind_addresses, stable); : : DEFINE_string(rpc_advertised_addresses, "", : "Comma-separated list of addresses to advertise externally for RPC connections. " : "Currently, ephemeral ports (i.e. port 0) are not allowed."); : TAG_FLAG(rpc_advertised_addresses, advanced); Should advertised addresses be a subset of bind addresses? If yes, consider adding a group validator on that. -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1860: ksck doesn't identify tablets that are evicted but still in config
Will Berkeley has posted comments on this change. Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but still in config .. Patch Set 7: (8 comments) Both of your comments about more testing are things that aren't done well by the current ksck tests. I think the takeaway is there should by a ksck- or tool-itest to cover these situations. It would also be useful as a place to see what situations ksck should detect. http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/ksck-test.cc File src/kudu/tools/ksck-test.cc: Line 453: > Sounds good. I'm just wondering if we can make a test case for that scenari Hm you'd have to actually get the RPC to return duplicates from 1 or more ts. Another reason to add a tool itest like I mentioned in another comment. http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tools/ksck-test.cc File src/kudu/tools/ksck-test.cc: Line 396: > Can we also add a test case with Committed == no? This isn't an integration test. I went to go find the tool-itest I thought would exist and there isn't one. Looks like a todo: add a tool-itest or a ksck-itest. http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: PS9, Line 604: string tabl > nit: how about ReplicaInfo since we have an important consensus class calle Done Line 612: boost::optional status_pb; > nit: Your code comments should all end with punctuation per https://google. Done Line 672: repl_leader_uuid, > not used? also missing ending ] Done http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: Line 123: } > When I see operator== I think that it's only going to return true for struc Done http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tserver/tablet_replica_lookup.h File src/kudu/tserver/tablet_replica_lookup.h: Line 54: > Add comment: Appends all non-tombstoned tablet replicas to the 'replicas' v Done http://gerrit.cloudera.org:8080/#/c/6772/9/src/kudu/tserver/tablet_service.h File src/kudu/tserver/tablet_service.h: Line 183: consensus::GetConsensusStateResponsePB* resp, > nit: Indentation should look like GetLastOpId() Fixed all the off indents in this section. -- To view, visit http://gerrit.cloudera.org:8080/6772 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1034 client does not failover due to timeout
Mike Percy has posted comments on this change. Change subject: KUDU-1034 client does not failover due to timeout .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/integration-tests/client_failover-itest.cc File src/kudu/integration-tests/client_failover-itest.cc: PS4, Line 252: tserver::TabletServerIntegrationTestBase why not ExternalMiniClusterITestBase? I'm trying to standardize on that one for ExternalMiniCluster tests http://gerrit.cloudera.org:8080/#/c/6924/4/src/kudu/rpc/retriable_rpc.h File src/kudu/rpc/retriable_rpc.h: Line 167: // A note regarding ServerPicker::MarkServerFailed() calls in this method: I know I mentioned adding a comment but I was thinking just a 2- or 3-line comment in the header file for the declaration of MarkServerFailed. PS4, Line 195: MarkServerFailed Should we rename this method to MarkReplicaFailed for clarity? -- To view, visit http://gerrit.cloudera.org:8080/6924 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates
Mike Percy has posted comments on this change. Change subject: Fix flaky test TestRestartWithOrphanedReplicates .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6976/1//COMMIT_MSG Commit Message: PS1, Line 12: 1 failure per 1000 > Yea, let me stash it on ve0518 and I can do a rerun. You shouldn't need to do that, you can see your historical runs at http://dist-test.cloudera.org -- To view, visit http://gerrit.cloudera.org:8080/6976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates
Edward Fancher has posted comments on this change. Change subject: Fix flaky test TestRestartWithOrphanedReplicates .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/6976/2/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: Line 97: // 1. Setup > I don't think this comment really adds any value to this test. I'll remove it. Line 108: cluster_->SetFlag(cluster_->tablet_server(0), > I think a better comment here would be: Ok. I'll replace it. Line 121: // 3. Validate > I think a better comment here would be: Ok, I'll replace it. -- To view, visit http://gerrit.cloudera.org:8080/6976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates
Edward Fancher has posted comments on this change. Change subject: Fix flaky test TestRestartWithOrphanedReplicates .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/6976/1//COMMIT_MSG Commit Message: PS1, Line 7: Fix flaky test > it would be nice to add name of the test here Ok, will do. PS1, Line 9: Looks like the test was setting a fault injection flag before the setup : which was causing the setup to fail > It would be nice to add the error message that the failed setup output into Ok, will do. PS1, Line 12: 1 failure per 1000 > Is it possible to add a link to one of those 1-in-1K dist-test failed runs Yea, let me stash it on ve0518 and I can do a rerun. -- To view, visit http://gerrit.cloudera.org:8080/6976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Adar Dembo has posted comments on this change. Change subject: KUDU-1988: add support for advertised host:port info. .. Patch Set 10: (19 comments) http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/CMakeLists.txt File src/kudu/server/CMakeLists.txt: Line 110: ADD_KUDU_TEST(rpc_server-test) Nit: resort http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server-test.cc File src/kudu/server/rpc_server-test.cc: PS10, Line 30: RpcServerAdvertisedAddressesTest() { : } Nit: can omit Line 42: opts.rpc_bind_addresses = "127.0.0.1:9898"; Can we avoid hardcoding ports in the bound addresses? It's going to make the test flaky. Can we test ephemeral ports only? Line 48: gscoped_ptr metric_registry(new MetricRegistry()); Nit: should use unique_ptr. Line 53: std::shared_ptr messenger; Add "using std::shared_ptr" and omit the prefix. PS10, Line 54: builder.set_num_reactors(1) : .set_min_negotiation_threads(1) : .set_max_negotiation_threads(1) : .set_metric_entity(metric_entity) : .enable_inbound_tls(); Are all of these non-default settings needed for the test? If not, can you remove them? If so, can you add a little comment for each explaining why? Line 59: builder.Build(); ASSERT_OK on this. PS10, Line 60: CHECK_OK(server_->Init(messenger)); : CHECK_OK(server_->Bind()); It's OK to use ASSERT_OK() in SetUp, I believe. Line 69: gscoped_ptr server_; unique_ptr here too. PS10, Line 89: vector advertised_addrs; : ASSERT_OK(server_->GetAdvertisedAddresses(_addrs)); : ASSERT_EQ(advertised_addrs.size(), 1); : vector bound_addrs; : ASSERT_OK(server_->GetBoundAddresses(_addrs)); Maybe add a test method that fetches the server's advertised and bound addresses in one go? http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/rpc_server.cc File src/kudu/server/rpc_server.cc: Line 51: "Comma-separated list of addresses to advertise externally for RPC connections. " Would be nice to include a sentence or two about how when to use this and not just rely on rpc_bind_addresses. PS10, Line 124: LOG(FATAL) << "Advertising an ephemeral port is not supported (RPC advertised address " :<< "configured to " << addr.ToString() << ")"; We could do this via gflag validation, no? Just look for a :0 at the end of an address? http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/webserver-test.cc File src/kudu/server/webserver-test.cc: Line 241: static_dir_ = GetTestPath("webserver-docroot"); This isn't used by any tests; can we just create it inline with opts.doc_root = ... ? Line 265: AddDefaultPathHandlers(server_.get()); Is this necessary? We're not actually accessing the webserver. Line 287: int32 use_webserver_port() const override { return ; } Again, concerned about the flakiness of hard-coding a port here. PS10, Line 298: vector advertised_addrs; : ASSERT_OK(server_->GetAdvertisedAddresses(_addrs)); : ASSERT_EQ(advertised_addrs.size(), 1); : vector bound_addrs; : ASSERT_OK(server_->GetBoundAddresses(_addrs)); Add a test method to fetch both sets of addresses? http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/webserver.cc File src/kudu/server/webserver.cc: PS10, Line 220: return Status::InvalidArgument("advertising an ephemeral webserver port is not supported", :addr.ToString()); Again, let's validate this with a gflag validator. http://gerrit.cloudera.org:8080/#/c/6827/10/src/kudu/server/webserver_options.cc File src/kudu/server/webserver_options.cc: Line 46: DEFINE_string(webserver_advertised_addresses, "", For consistency, shouldn't this be named webserver_advertised_interface? Line 47: "Addresses to advertise externally for web connections." Would be nice to add a sentence or two explaining when to use this in lieu of webserver_interface. -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates
Mike Percy has posted comments on this change. Change subject: Fix flaky test TestRestartWithOrphanedReplicates .. Patch Set 2: Code-Review+1 (3 comments) Change looks good, I have a few nitpicky comments about, well, comments. http://gerrit.cloudera.org:8080/#/c/6976/2/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: Line 97: // 1. Setup I don't think this comment really adds any value to this test. Line 108: cluster_->SetFlag(cluster_->tablet_server(0), I think a better comment here would be: // Crash when the WAL contains a replicate message but no corresponding commit. Line 121: // 3. Validate I think a better comment here would be: // Restart the server and check to make sure that the change is eventually applied. -- To view, visit http://gerrit.cloudera.org:8080/6976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Hao Hao has posted comments on this change. Change subject: KUDU-1988: add support for advertised host:port info. .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/6827/10//COMMIT_MSG Commit Message: Line 7: KUDU-1988: add support for advertised host:port info. Can you add more detail info here? -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] Fix flaky test TestRestartWithOrphanedReplicates
Edward Fancher has uploaded a new patch set (#2). Change subject: Fix flaky test TestRestartWithOrphanedReplicates .. Fix flaky test TestRestartWithOrphanedReplicates Looks like the test was setting a fault injection flag before the setup which was causing the setup to fail. Moving the flag setting to after the setup, but before Start seems to have done the trick. We were getting about 1 failure per 1000 runs before this change, 0 per 1000 after. Dist test job: http://dist-test.cloudera.org/job?job_id=efan.1495643022.19373 Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 --- M src/kudu/integration-tests/ts_recovery-itest.cc 1 file changed, 7 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/6976/2 -- To view, visit http://gerrit.cloudera.org:8080/6976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-1755 Part 2: Improve table on-disk size metric
Will Berkeley has posted comments on this change. Change subject: KUDU-1755 Part 2: Improve table on-disk size metric .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6968/1/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: Line 91: using consensus::Consensus; > warning: using decl 'Consensus' is unused [misc-unused-using-decls] Done -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1755 Part 2: Improve table on-disk size metric
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6968 to look at the new patch set (#2). Change subject: KUDU-1755 Part 2: Improve table on-disk size metric .. KUDU-1755 Part 2: Improve table on-disk size metric This adds WAL segment sizes to the tablet on-disk metric. This is a little bit trickier than previous improvements because this info is held by the TabletReplica, not the Tablet, so the on-disk metric itself had to be relocated and plugged in to TabletReplica::EstimateOnDiskSize. Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 --- M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h 3 files changed, 39 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/2 -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] Fix flaky test
Alexey Serbin has posted comments on this change. Change subject: Fix flaky test .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/6976/1//COMMIT_MSG Commit Message: PS1, Line 7: Fix flaky test it would be nice to add name of the test here PS1, Line 9: Looks like the test was setting a fault injection flag before the setup : which was causing the setup to fail It would be nice to add the error message that the failed setup output into the log. PS1, Line 12: 1 failure per 1000 Is it possible to add a link to one of those 1-in-1K dist-test failed runs as well? -- To view, visit http://gerrit.cloudera.org:8080/6976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] docs: clarify that max cell size applies to uncompressed size
Dan Burkert has posted comments on this change. Change subject: docs: clarify that max cell size applies to uncompressed size .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6977/2/docs/known_issues.adoc File docs/known_issues.adoc: PS2, Line 68: ncodingn > encoding Done -- To view, visit http://gerrit.cloudera.org:8080/6977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] docs: clarify that max cell size applies to uncompressed size
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6977 to look at the new patch set (#3). Change subject: docs: clarify that max cell size applies to uncompressed size .. docs: clarify that max cell size applies to uncompressed size Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532 --- M docs/known_issues.adoc M docs/schema_design.adoc 2 files changed, 8 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/6977/3 -- To view, visit http://gerrit.cloudera.org:8080/6977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] docs: clarify that max cell size applies to uncompressed size
Dan Burkert has posted comments on this change. Change subject: docs: clarify that max cell size applies to uncompressed size .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6977/2/docs/known_issues.adoc File docs/known_issues.adoc: PS2, Line 68: ncodingn encoding -- To view, visit http://gerrit.cloudera.org:8080/6977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] docs: clarify that max cell size applies to uncompressed size
Hello Ambreen Kazi, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/6977 to review the following change. Change subject: docs: clarify that max cell size applies to uncompressed size .. docs: clarify that max cell size applies to uncompressed size Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532 --- M docs/schema_design.adoc 1 file changed, 7 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/6977/1 -- To view, visit http://gerrit.cloudera.org:8080/6977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6522fb625138191b0ff8ec295245c27d83a5c532 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6827 to look at the new patch set (#10). Change subject: KUDU-1988: add support for advertised host:port info. .. KUDU-1988: add support for advertised host:port info. Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 --- M src/kudu/master/master.cc M src/kudu/server/CMakeLists.txt A src/kudu/server/rpc_server-test.cc M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/webserver-test.cc M src/kudu/server/webserver.cc M src/kudu/server/webserver.h M src/kudu/server/webserver_options.cc M src/kudu/server/webserver_options.h M src/kudu/tserver/heartbeater.cc 11 files changed, 323 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/10 -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot
[kudu-CR] Fix flaky test
Edward Fancher has restored this change. Change subject: Fix flaky test .. Restored -- To view, visit http://gerrit.cloudera.org:8080/6976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: restore Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-871 (part 1). Refactor to make cmeta a first class object
Alexey Serbin has posted comments on this change. Change subject: KUDU-871 (part 1). Refactor to make cmeta a first class object .. Patch Set 1: (6 comments) some nits while skimming through http://gerrit.cloudera.org:8080/#/c/6957/1//COMMIT_MSG Commit Message: PS1, Line 35: Remove unused Consensus::State enum. nit: maybe, it makes sense to publish this change as separate tiny clean-up changelist? PS1, Line 37: * bug fix: RaftConsensus must hold lock before snoozing FD nit: if it's a bug fix, would it make sense to separate it and make sure all current tests pass before introducing such a major refactoring? Or it becomes a bug only when combined with the new refactored code? http://gerrit.cloudera.org:8080/#/c/6957/1/src/kudu/tablet/metadata.proto File src/kudu/tablet/metadata.proto: PS1, Line 77: TABLET_DATA_FRESH Given the description I would expect this is a broader counterpart of TABLET_DATA_COPYING. Is that true or there are some additional nits regarding voting, etc.? http://gerrit.cloudera.org:8080/#/c/6957/1/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS1, Line 244: $0Unable typo? it seems space is missing PS1, Line 593: *tablet_replica = tmp_replica; Would it make sense to allow the 'tablet_replica' parameter be nullptr? It seems in a couple of places in this file the result tablet replica output by this method is not used at this method's call sites. PS1, Line 1064: $0Unable typo? -- To view, visit http://gerrit.cloudera.org:8080/6957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia27a091d27b3996d37009d5ec866e744f9608388 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6827 to look at the new patch set (#9). Change subject: KUDU-1988: add support for advertised host:port info. .. KUDU-1988: add support for advertised host:port info. Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 --- M src/kudu/master/master.cc M src/kudu/server/CMakeLists.txt A src/kudu/server/rpc_server-test.cc M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/webserver-test.cc M src/kudu/server/webserver.cc M src/kudu/server/webserver.h M src/kudu/server/webserver_options.cc M src/kudu/server/webserver_options.h M src/kudu/tserver/heartbeater.cc 11 files changed, 317 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6827/9 -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot
[kudu-CR] Fix flaky test
Edward Fancher has abandoned this change. Change subject: Fix flaky test .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/6976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Kudu Jenkins
[kudu-CR] Fix flaky test
Edward Fancher has uploaded a new change for review. http://gerrit.cloudera.org:8080/6976 Change subject: Fix flaky test .. Fix flaky test Looks like the test was setting a fault injection flag before the setup which was causing the setup to fail. Moving the flag setting to after the setup, but before Start seems to have done the trick. We were getting about 1 failure per 1000 runs before this change, 0 per 1000 after. Dist test job: http://dist-test.cloudera.org/job?job_id=efan.1495643022.19373 Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 --- M src/kudu/integration-tests/ts_recovery-itest.cc 1 file changed, 7 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/6976/1 -- To view, visit http://gerrit.cloudera.org:8080/6976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward Fancher
[kudu-CR] KUDU-1988: add support for advertised host:port info.
Dan Burkert has posted comments on this change. Change subject: KUDU-1988: add support for advertised host:port info. .. Patch Set 8: Patrik indicated he's tied up today, and I want to make sure this lands in time for 1.4, so I'm going to push the fixups today. -- To view, visit http://gerrit.cloudera.org:8080/6827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6735ca5630fc4c426bf72d0b21d6ef452173a890 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Patrik SundbergGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Patrik Sundberg Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1755 Part 1: Improve tablet on disk size metric
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6967 to look at the new patch set (#2). Change subject: KUDU-1755 Part 1: Improve tablet on disk size metric .. KUDU-1755 Part 1: Improve tablet on disk size metric This adds bloomfile, ad hoc index, and superblock sizes to the on-disk size metric for tablets. A follow up will address log segments and cmeta. Change-Id: I32dce598bbb8e18325210a49fc436fd0f7ac68fd --- M src/kudu/cfile/bloomfile-test.cc M src/kudu/cfile/bloomfile.h M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/rowset_info.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_history_gc-test.cc M src/kudu/tablet/tablet_metadata-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica.h M src/kudu/util/metrics.h 14 files changed, 87 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/6967/2 -- To view, visit http://gerrit.cloudera.org:8080/6967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32dce598bbb8e18325210a49fc436fd0f7ac68fd Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Alexey Serbin has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement .. Patch Set 35: (28 comments) http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/cfile/bloomfile-test-base.h File src/kudu/cfile/bloomfile-test-base.h: PS35, Line 74: fs::CreateBlockOptions() nit: would just '{}' be enough? http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: PS35, Line 218: CreateBlockOptions() nit here and below: would '{}' fit in here? http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-stress-test.cc File src/kudu/fs/block_manager-stress-test.cc: PS35, Line 260: this-> is 'this' necessary here? http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS35, Line 137: Status nit: consider adding 'static' specifier -- as I see this method does not depend on the state of the object. PS35, Line 155: _count nit: why not to pass 'num_files' itself as an argument? http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs-test.cc File src/kudu/fs/data_dirs-test.cc: Line 17: nit: please include at least and since they are used in the code. Line 18: #include "kudu/fs/fs.pb.h" nit: please include and PS35, Line 55: CHECK_OK nit: why not just ASSERT_OK() ? PS35, Line 56: CHECK_OK ditto PS35, Line 69: string test_tablet_name_; nit: I don't see it's changing during the tests, so consider adding 'const'. PS35, Line 70: CreateBlockOptions test_block_opts_ const? PS35, Line 72: DataDirGroupPB pb_; nit: is it really needed as a member? Would local variable be enough where needed? PS35, Line 80: ASSERT_TRUE(s.IsNotFound()); Here and everywhere in the tests consider adding s.ToString() in case if it fails, that helps a lot other people during troubleshooting if this ever fails: ASSERT_TRUE(s.IsNotFound()) << s.ToString(); PS35, Line 96: for (int i = 0; i < pb.uuids().size(); i++) { Would it make sense to compare pb.uuids.size() and pb_.uuids.size() prior to comparing elements in this for() loop? PS35, Line 102: dd Does it make sense to check for invariants on 'dd' after this call? What are the expected side-effects after successful call of GetNextDataDir() for 'dd'? PS35, Line 120: DataDir* dd; : ASSERT_OK(dd_manager_->GetNextDataDir(test_block_opts_, )); : dd_manager_->DeleteDataDirGroup(test_tablet_name_); : Status s = dd_manager_->GetNextDataDir(test_block_opts_, ); Is there anything specific for 'dd' in the course of there calls? If yes, consider adding corresponding assertions. PS35, Line 129: FLAGS_fs_data_dirs_full_disk_cache_seconds = 0; If changing this flag on-the-fly, does it make sense to add the 'runtime' tag for the flag? PS35, Line 235: FindOrDie nit: since this is a test, it's possible to use something like ... x = FindOrNull(...); ASSERT_NE(nullptr, x); http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: PS35, Line 449: FLAGS_fs_target_data_dirs_per_tablet A paranoid nit: what if FLAGS_fs_target_data_dirs_per_tablet is set to 2^31 ? Consider either adding a validator for the flag or using unsigned int for group_target_size or enforcing consistency by some other means. http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: PS35, Line 87: uuid_indices nit: consider adding 'const' specified for this method. PS35, Line 225: void nit: would the caller ever be interested to know if any group were effectively deleted or not? http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/diskrowset.cc File src/kudu/tablet/diskrowset.cc: PS35, Line 93: string tablet_id const string& tablet_id ? Line 112: string tablet_id = rowset_metadata_->tablet_metadata()->tablet_id(); ditto PS35, Line 127: string tablet_id ditto http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/multi_column_writer.cc File src/kudu/tablet/multi_column_writer.cc: Line 52: CreateBlockOptions block_opts({ tablet_id_ }); nit: const? http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/multi_column_writer.h File src/kudu/tablet/multi_column_writer.h: PS35, Line 20: #include nit: move this after the std headers. PS35, Line 90: std::string tablet_id_; nit: const? http://gerrit.cloudera.org:8080/#/c/6636/35/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: PS35, Line 629: GetDataDirGroupPBForTablet Is it worth to check for the return value of GetDataDirGroupPBForTablet()? -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 35 Gerrit-Project: kudu Gerrit-Branch: master
[kudu-CR] KUDU-1125 (part 1) catalog manager: try to avoid unnecessarily rewriting tablet info
Adar Dembo has posted comments on this change. Change subject: KUDU-1125 (part 1) catalog_manager: try to avoid unnecessarily rewriting tablet info .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/6916/3/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: Line 114: return md.Compare(prev_pb, new_pb); I presume this is equivalent to prev_pb == new_pb? But you're going through the MessageDifferencer for the diff_str!= null case? Bearing in mind that the diff_str!=null case is very rare (VLOG only), would it be faster to use equality? Or are the two approaches equivalent? Line 692: VLOG(2) << "Adding tablet " << tablet->tablet_id() << " in catalog: " Any particular reason you want this VLOG but not one for ReqAddTable()? http://gerrit.cloudera.org:8080/#/c/6916/3/src/kudu/master/sys_catalog.h File src/kudu/master/sys_catalog.h: Line 164: const scoped_refptr& tablet_replica() const { Since this is only being used in a test, an alternative would be to keep it private and add FRIEND_TEST() as needed. -- To view, visit http://gerrit.cloudera.org:8080/6916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0de7189b8f1dbeea55172929396b73fd92fd62ba Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes