[kudu-CR] [kudu-jepsen] added more info on troubleshooting
Alexey Serbin has submitted this change and it was merged. Change subject: [kudu-jepsen] added more info on troubleshooting .. [kudu-jepsen] added more info on troubleshooting Added more information on distinguishing 'errors' from 'failures' in the kudu-jepsen test output. Change-Id: I9b97b744d969b73ede2fcb7a3509915b130c655b Reviewed-on: http://gerrit.cloudera.org:8080/6774 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves--- M java/kudu-jepsen/README.adoc 1 file changed, 30 insertions(+), 7 deletions(-) Approvals: David Ribeiro Alves: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9b97b744d969b73ede2fcb7a3509915b130c655b Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [kudu-jepsen] added more info on troubleshooting
David Ribeiro Alves has posted comments on this change. Change subject: [kudu-jepsen] added more info on troubleshooting .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b97b744d969b73ede2fcb7a3509915b130c655b Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-1949. Maintenance Manager should trigger flushes earlier
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1949. Maintenance Manager should trigger flushes earlier .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1970: node density integration test
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1970: node density integration test .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6662/9/src/kudu/integration-tests/dense_node-itest.cc File src/kudu/integration-tests/dense_node-itest.cc: PS9, Line 71: DenseNodeTest > I don't think a new convention is necessary, and FWIW I see enough overlap right, hence the "most". the point is: do you agree that it would be nice to stress only or bench only tests have a bench suffix or not? fwiw IMO it would be handy to be able to easily identify the binaries for benchmarks among the others when looking for ways to measure something. if not I'm ok with that too, just don't want to avoid making things better because there is no precedent. -- To view, visit http://gerrit.cloudera.org:8080/6662 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9b5d01557eb41d386ce92f576ed01ec658e8e7d Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Fix bug in incorrect response rebuilding on tablet bootstrap
David Ribeiro Alves has posted comments on this change. Change subject: Fix bug in incorrect response rebuilding on tablet bootstrap .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/5489/5/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: PS5, Line 264: const TxResultPB _result, : TxResultPB *new_result, : WriteResponsePB *response, : bool *all_skipped); > nit: the *s and should go with the type, not the var Done PS5, Line 1259: const TxResultPB _result, : TxResultPB *new_result, : WriteResponsePB *response, : bool *all_skipped) > style: same as above Done -- To view, visit http://gerrit.cloudera.org:8080/5489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1219ed5f7835e93cd7f3b128cedd650fc3384482 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Fix bug in incorrect response rebuilding on tablet bootstrap
David Ribeiro Alves has uploaded a new patch set (#6). Change subject: Fix bug in incorrect response rebuilding on tablet bootstrap .. Fix bug in incorrect response rebuilding on tablet bootstrap This fixes the bug described in: https://gerrit.cloudera.org/#/c/5417/ ... and enables the test disabled in that patch. Along the way this also performs some cleanup of tablet bootstrap. I ran raft_consensus-itest's TestInsertDuplicateKeysWithCrashyNodes on dist-test for 5000 loops with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 5000 -- bin/raft_consensus-itest \ --gtest_filter=*DuplicateKeysWithCrashyNodes Prior to this patch the test failed 62/5000: http://dist-test.cloudera.org//job?job_id=david.alves.1493915326.763 After this patch the test passes 5000/5000: http://dist-test.cloudera.org//job?job_id=david.alves.1493914745.27867 Change-Id: I1219ed5f7835e93cd7f3b128cedd650fc3384482 --- M src/kudu/tablet/row_op.cc M src/kudu/tablet/row_op.h M src/kudu/tablet/tablet.proto M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tserver/tablet_server-test.cc 5 files changed, 122 insertions(+), 114 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/5489/6 -- To view, visit http://gerrit.cloudera.org:8080/5489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1219ed5f7835e93cd7f3b128cedd650fc3384482 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-1.3.x) KUDU-1993: fixed validation of 'grouped' gflags
Hello Adar Dembo, Kudu Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/6806 to review the following change. Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. KUDU-1993: fixed validation of 'grouped' gflags Added generic implementation for grouped gflags validators. Use CUSTOM_FLAG_VALIDATOR() macro to register late-phase validator function for command-line flags. The validation is performed upon call of HandleCommonFlags() invoked by the top-level ParseCommandLineFlags() function. Added unit test to cover the new functionality. Updated validators for security-related RPC and embedded webserver flags. As a workaround for LSAN reports on the leaks in case of ASAN build, added two scoped leak check disablers into CreateAndStartTimeoutThread. That does not seem harmful or hiding any potential leaks since it affects only the timeout thread itself. Opened separate JIRA to track the (false positive?) leak warning: KUDU-1995. Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Reviewed-on: http://gerrit.cloudera.org:8080/6795 Reviewed-by: Adar DemboTested-by: Kudu Jenkins (cherry picked from commit e7334c2e6607ac0fa778499eda2df3f4cfcd3fe3) --- M src/kudu/rpc/messenger.cc M src/kudu/server/webserver_options.cc M src/kudu/util/CMakeLists.txt A src/kudu/util/flag_validators-test.cc A src/kudu/util/flag_validators.cc A src/kudu/util/flag_validators.h M src/kudu/util/flags.cc M src/kudu/util/test_main.cc 8 files changed, 415 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/6806/1 -- To view, visit http://gerrit.cloudera.org:8080/6806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Alexey Serbin has submitted this change and it was merged. Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. KUDU-1993: fixed validation of 'grouped' gflags Added generic implementation for grouped gflags validators. Use CUSTOM_FLAG_VALIDATOR() macro to register late-phase validator function for command-line flags. The validation is performed upon call of HandleCommonFlags() invoked by the top-level ParseCommandLineFlags() function. Added unit test to cover the new functionality. Updated validators for security-related RPC and embedded webserver flags. As a workaround for LSAN reports on the leaks in case of ASAN build, added two scoped leak check disablers into CreateAndStartTimeoutThread. That does not seem harmful or hiding any potential leaks since it affects only the timeout thread itself. Opened separate JIRA to track the (false positive?) leak warning: KUDU-1995. Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Reviewed-on: http://gerrit.cloudera.org:8080/6795 Reviewed-by: Adar DemboTested-by: Kudu Jenkins --- M src/kudu/rpc/messenger.cc M src/kudu/server/webserver_options.cc M src/kudu/util/CMakeLists.txt A src/kudu/util/flag_validators-test.cc A src/kudu/util/flag_validators.cc A src/kudu/util/flag_validators.h M src/kudu/util/flags.cc M src/kudu/util/test_main.cc 8 files changed, 415 insertions(+), 36 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1927 [master] always send IPKI CA certificate
Alexey Serbin has posted comments on this change. Change subject: KUDU-1927 [master] always send IPKI CA certificate .. Patch Set 6: > Looking back over old JIRAs. This is still relevant, right Alexey? > Just waiting on review? I think it's still relevant, but since we've got ERROR_UNAVAILABLE error code, we could use it as an option here. -- To view, visit http://gerrit.cloudera.org:8080/6356 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a20e6da6531091fcc597af58a7bbf01a8847f5 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](branch-1.3.x) KUDU-1981 Kudu should run at hosts len(FQDN) > 64
Alexey Serbin has submitted this change and it was merged. Change subject: KUDU-1981 Kudu should run at hosts len(FQDN) > 64 .. KUDU-1981 Kudu should run at hosts len(FQDN) > 64 This is a fix for KUDU-1981: with security enabled, Kudu servers cannot start at machines with len(FQDN) > 64. Prior to this fix, the host FQDN was put into the CSR's CN (common name) field while generating self-signed certificate for server RPC messenger. Per RFC5280, the CN field cannot contain strings longer than 64 characters long, and it seems OpenSSL enforces that limit as required. The idea is to put FQDNs into the SAN X509v3 extension field as 'DNS' fields. That makes it possible to have names in the SAN which are even longer than 255 characters. This patch returns back a part of the SAN-related functionality which had been implemented initially in cert_management.cc and then removed since it was not used back then. This patch also adds a couple of unit tests to cover the new functionality and to make sure it's possible to set CN field of CSR to 64-chars length value and have corresponding X509 certificate generated with no issues. Change-Id: Ie142e76e9b2dcef3e07dd33d82b6758c746ced19 Reviewed-on: http://gerrit.cloudera.org:8080/6734 Tested-by: Kudu Jenkins Reviewed-by: Dan Burkert(cherry picked from commit eccafbcfbd41324164f7df10219a2b9c3d161269) Reviewed-on: http://gerrit.cloudera.org:8080/6805 Reviewed-by: Jean-Daniel Cryans --- M src/kudu/integration-tests/master_cert_authority-itest.cc M src/kudu/security/ca/cert_management-test.cc M src/kudu/security/ca/cert_management.cc M src/kudu/security/ca/cert_management.h M src/kudu/security/cert-test.cc M src/kudu/security/cert.cc M src/kudu/security/cert.h M src/kudu/security/test/test_certs.cc M src/kudu/security/test/test_certs.h M src/kudu/security/tls_context.cc 10 files changed, 232 insertions(+), 74 deletions(-) Approvals: Jean-Daniel Cryans: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6805 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie142e76e9b2dcef3e07dd33d82b6758c746ced19 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Adar Dembo has posted comments on this change. Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Alexey Serbin has posted comments on this change. Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6795/5/build-support/lsan-suppressions.txt File build-support/lsan-suppressions.txt: Line 30 > So either one works? Why use a suppression then? At least for TSAN, suppres It appeared to me that adding a suppression would be a cleaner way, not touching the source code. But since it seems to be temporary workaround, I think it does not matter that much. All right, let me put it back into the code (given that for TSAN supressions are more expensive that scoped leak disablers). -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6795 to look at the new patch set (#7). Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. KUDU-1993: fixed validation of 'grouped' gflags Added generic implementation for grouped gflags validators. Use CUSTOM_FLAG_VALIDATOR() macro to register late-phase validator function for command-line flags. The validation is performed upon call of HandleCommonFlags() invoked by the top-level ParseCommandLineFlags() function. Added unit test to cover the new functionality. Updated validators for security-related RPC and embedded webserver flags. As a workaround for LSAN reports on the leaks in case of ASAN build, added two scoped leak check disablers into CreateAndStartTimeoutThread. That does not seem harmful or hiding any potential leaks since it affects only the timeout thread itself. Opened separate JIRA to track the (false positive?) leak warning: KUDU-1995. Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 --- M src/kudu/rpc/messenger.cc M src/kudu/server/webserver_options.cc M src/kudu/util/CMakeLists.txt A src/kudu/util/flag_validators-test.cc A src/kudu/util/flag_validators.cc A src/kudu/util/flag_validators.h M src/kudu/util/flags.cc M src/kudu/util/test_main.cc 8 files changed, 415 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/6795/7 -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-1.3.x) KUDU-1981 Kudu should run at hosts len(FQDN) > 64
Jean-Daniel Cryans has posted comments on this change. Change subject: KUDU-1981 Kudu should run at hosts len(FQDN) > 64 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6805 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie142e76e9b2dcef3e07dd33d82b6758c746ced19 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Adar Dembo has posted comments on this change. Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/6795/5/build-support/lsan-suppressions.txt File build-support/lsan-suppressions.txt: PS5, Line 26: Adding this as a workaround since : # right now it's not clear what going on exactly > https://issues.apache.org/jira/browse/KUDU-1995 Could you add the link to lsan-suppressions.txt though? Line 30: leak:CreateAndStartTimeoutThread > It works. Updated. So either one works? Why use a suppression then? At least for TSAN, suppressions are more expensive to run than scoped disablers. -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](branch-1.3.x) KUDU-1981 Kudu should run at hosts len(FQDN) > 64
Hello Dan Burkert, Kudu Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/6805 to review the following change. Change subject: KUDU-1981 Kudu should run at hosts len(FQDN) > 64 .. KUDU-1981 Kudu should run at hosts len(FQDN) > 64 This is a fix for KUDU-1981: with security enabled, Kudu servers cannot start at machines with len(FQDN) > 64. Prior to this fix, the host FQDN was put into the CSR's CN (common name) field while generating self-signed certificate for server RPC messenger. Per RFC5280, the CN field cannot contain strings longer than 64 characters long, and it seems OpenSSL enforces that limit as required. The idea is to put FQDNs into the SAN X509v3 extension field as 'DNS' fields. That makes it possible to have names in the SAN which are even longer than 255 characters. This patch returns back a part of the SAN-related functionality which had been implemented initially in cert_management.cc and then removed since it was not used back then. This patch also adds a couple of unit tests to cover the new functionality and to make sure it's possible to set CN field of CSR to 64-chars length value and have corresponding X509 certificate generated with no issues. Change-Id: Ie142e76e9b2dcef3e07dd33d82b6758c746ced19 Reviewed-on: http://gerrit.cloudera.org:8080/6734 Tested-by: Kudu Jenkins Reviewed-by: Dan Burkert(cherry picked from commit eccafbcfbd41324164f7df10219a2b9c3d161269) --- M src/kudu/integration-tests/master_cert_authority-itest.cc M src/kudu/security/ca/cert_management-test.cc M src/kudu/security/ca/cert_management.cc M src/kudu/security/ca/cert_management.h M src/kudu/security/cert-test.cc M src/kudu/security/cert.cc M src/kudu/security/cert.h M src/kudu/security/test/test_certs.cc M src/kudu/security/test/test_certs.h M src/kudu/security/tls_context.cc 10 files changed, 232 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/6805/1 -- To view, visit http://gerrit.cloudera.org:8080/6805 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie142e76e9b2dcef3e07dd33d82b6758c746ced19 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6795 to look at the new patch set (#6). Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. KUDU-1993: fixed validation of 'grouped' gflags Added generic implementation for grouped gflags validators. Use CUSTOM_FLAG_VALIDATOR() macro to register late-phase validator function for command-line flags. The validation is performed upon call of HandleCommonFlags() invoked by the top-level ParseCommandLineFlags() function. Added unit test to cover the new functionality. Updated validators for security-related RPC and embedded webserver flags. As a workaround for LSAN reports on the leaks in case of ASAN build, added suppression for CreateAndStartTimeoutThread(). That does not seem harmful or hiding any potential leaks since it affects only the timeout thread itself. Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 --- M build-support/lsan-suppressions.txt M src/kudu/rpc/messenger.cc M src/kudu/server/webserver_options.cc M src/kudu/util/CMakeLists.txt A src/kudu/util/flag_validators-test.cc A src/kudu/util/flag_validators.cc A src/kudu/util/flag_validators.h M src/kudu/util/flags.cc M src/kudu/util/test_main.cc 9 files changed, 411 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/6795/6 -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Alexey Serbin has posted comments on this change. Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. Patch Set 5: (12 comments) http://gerrit.cloudera.org:8080/#/c/6795/5/build-support/lsan-suppressions.txt File build-support/lsan-suppressions.txt: PS5, Line 26: Adding this as a workaround since : # right now it's not clear what going on exactly > If this warrants further investigation, could you file a JIRA and link to i https://issues.apache.org/jira/browse/KUDU-1995 PS5, Line 28: tests > Got 'tests' twice in here. Done Line 30: leak:CreateAndStartTimeoutThread() > Did the ScopedLeakCheckDisablers not work? It works. Updated. http://gerrit.cloudera.org:8080/#/c/6795/5/src/kudu/rpc/messenger.cc File src/kudu/rpc/messenger.cc: Line 157: Status s = ParseTriState("--rpc_authentication", FLAGS_rpc_authentication, ); > Can we CHECK_OK() here and below? The group validators are assumed to run a Correct. We can put CHECK_OK http://gerrit.cloudera.org:8080/#/c/6795/5/src/kudu/util/flag_validators.cc File src/kudu/util/flag_validators.cc: PS5, Line 30: custom > group Done Line 37: void Register(const string& name, FlagValidator func) { > warning: the parameter 'func' is copied for each invocation but only used a Done http://gerrit.cloudera.org:8080/#/c/6795/5/src/kudu/util/flag_validators.h File src/kudu/util/flag_validators.h: PS5, Line 42: starndard > standard Done PS5, Line 55: indivigual > individual Done PS5, Line 58: this > Nit: 'the' (to be consistent with "Use the DEFINE_validator() macro...") Done Line 75: // GROUP_FLAG_VALIDATOR(groped_flags_validator, ValidateGroupedFlags); > grouped_flags_validator Done. With std::function it's not necessary to take the address, AFAIK. PS5, Line 88: an > a Done PS5, Line 88: validator > validators (or "registers a group validator") Done -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1970: node density integration test
Hello Dan Burkert, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6662 to look at the new patch set (#11). Change subject: KUDU-1970: node density integration test .. KUDU-1970: node density integration test This patch introduces a new itest that simulates a storage-dense Kudu deployment. The idea is simple: rather than actually generating and storing lots of data (which is both time intensive and developer unfriendly), let's run a workload that produces a lot of metadata with a minimal amount of data. This is cheaper, and the metadata can proxy for data in areas we care about (such as start up time, thread count, memory usage, etc.). The test itself isn't that interesting; most of the challenge was in running it repeatedly to determine which flag values yielded the most metadata. In a run of the test on a 48 core el6.6 machine (max_blocks_per_container=8, num_tablets=1000, num_seconds=240), I produced ~110K blocks across ~21k LBM containers, which yielded a subsequent ~100s LBM startup time. I made the following modifications elsewhere to make this work: - TestWorkload now supports arbitrary schemas. - EMC-based tests can configure the amount of time they wait on each daemon process to start as the server info file isn't dumped until after FS startup is complete (maybe that should be changed?) - The benchmarks.sh script runs the test with some customized parameters. I also snuck in changes to remove an unused variable from random.h and to switch TestWorkload from kudu::Thread to std::thread. Change-Id: Ie9b5d01557eb41d386ce92f576ed01ec658e8e7d --- M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/dense_node-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.h M src/kudu/integration-tests/external_mini_cluster.cc M src/kudu/integration-tests/external_mini_cluster.h M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/scripts/benchmarks.sh M src/kudu/tools/data_gen_util.cc M src/kudu/tools/data_gen_util.h M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/util/random.h 13 files changed, 355 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/6662/11 -- To view, visit http://gerrit.cloudera.org:8080/6662 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9b5d01557eb41d386ce92f576ed01ec658e8e7d Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] cfile set: avoid Status allocation for row not found
Todd Lipcon has posted comments on this change. Change subject: cfile_set: avoid Status allocation for row not found .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6804/2/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: PS2, Line 238: s.IsNotFound() > If this is also a common case, it should be converted in the same way. not that common, since the bloom filter above already cuts out 99.99% of cases or somesuch http://gerrit.cloudera.org:8080/#/c/6804/2/src/kudu/tablet/diskrowset.cc File src/kudu/tablet/diskrowset.cc: Line 615: Status DiskRowSet::MutateRow(Timestamp timestamp, > Likewise with the various NotFound() exits from here. yea, that's a little trickier though, since we don't have a pre-existing out-param handy. -- To view, visit http://gerrit.cloudera.org:8080/6804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3056aaaf0eec6b7deebcd1f206cb551b75996927 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1970: node density integration test
Adar Dembo has posted comments on this change. Change subject: KUDU-1970: node density integration test .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/6662/9/src/kudu/integration-tests/dense_node-itest.cc File src/kudu/integration-tests/dense_node-itest.cc: PS9, Line 71: > right, most are benchmarks among other tests or actual tests that test for I don't think a new convention is necessary, and FWIW I see enough overlap with existing tests called from benchmarks.sh. tablet_server-stress-test and full_stack-insert-scan-test, to name two. http://gerrit.cloudera.org:8080/#/c/6662/10/src/kudu/integration-tests/test_workload.cc File src/kudu/integration-tests/test_workload.cc: PS10, Line 233: // Do some sanity checks on the schema. They reflect how the rest of : // TestWorkload is going to use the schema. : CHECK_GT(schema_.num_columns(), 0) << "Schema should have at least one column"; : vector key_indexes; : schema_.GetPrimaryKeyColumnIndexes(_indexes); : CHECK_EQ(1, key_indexes.size()) << "Schema should have just one key column"; : CHECK_EQ(0, key_indexes[0]) << "Schema's key column should be index 0"; : KuduColumnSchema key = schema_.Column(0); : CHECK_EQ("key", key.name()) << "Schema column should be named 'key'"; : CHECK_EQ(KuduColumnSchema::INT32, key.type()) << "Schema key column should be of type INT32"; > why not do this when the schema is set? we know that the simple schema abid Done -- To view, visit http://gerrit.cloudera.org:8080/6662 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9b5d01557eb41d386ce92f576ed01ec658e8e7d Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] cfile set: avoid Status allocation for row not found
Adar Dembo has posted comments on this change. Change subject: cfile_set: avoid Status allocation for row not found .. Patch Set 2: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6804/2/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: PS2, Line 238: s.IsNotFound() If this is also a common case, it should be converted in the same way. http://gerrit.cloudera.org:8080/#/c/6804/2/src/kudu/tablet/diskrowset.cc File src/kudu/tablet/diskrowset.cc: Line 615: Status DiskRowSet::MutateRow(Timestamp timestamp, Likewise with the various NotFound() exits from here. -- To view, visit http://gerrit.cloudera.org:8080/6804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3056aaaf0eec6b7deebcd1f206cb551b75996927 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1949. Maintenance Manager should trigger flushes earlier
Adar Dembo has posted comments on this change. Change subject: KUDU-1949. Maintenance Manager should trigger flushes earlier .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Adar Dembo has posted comments on this change. Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. Patch Set 5: (11 comments) http://gerrit.cloudera.org:8080/#/c/6795/5/build-support/lsan-suppressions.txt File build-support/lsan-suppressions.txt: PS5, Line 26: Adding this as a workaround since : # right now it's not clear what going on exactly If this warrants further investigation, could you file a JIRA and link to it here? PS5, Line 28: tests Got 'tests' twice in here. Line 30: leak:CreateAndStartTimeoutThread() Did the ScopedLeakCheckDisablers not work? http://gerrit.cloudera.org:8080/#/c/6795/5/src/kudu/rpc/messenger.cc File src/kudu/rpc/messenger.cc: Line 157: Status s = ParseTriState("--rpc_authentication", FLAGS_rpc_authentication, ); Can we CHECK_OK() here and below? The group validators are assumed to run after the regular gflag validators, and so if we are running this validator it should be safe to assume that --rpc_authentication and --rpc_encryption are valid, right? http://gerrit.cloudera.org:8080/#/c/6795/5/src/kudu/util/flag_validators.cc File src/kudu/util/flag_validators.cc: PS5, Line 30: custom group http://gerrit.cloudera.org:8080/#/c/6795/5/src/kudu/util/flag_validators.h File src/kudu/util/flag_validators.h: PS5, Line 42: starndard standard PS5, Line 55: indivigual individual PS5, Line 58: this Nit: 'the' (to be consistent with "Use the DEFINE_validator() macro...") Line 75: // GROUP_FLAG_VALIDATOR(groped_flags_validator, ValidateGroupedFlags); grouped_flags_validator Also, do you need to take the address of ValidateGroupedFlags (i.e. "")? PS5, Line 88: an a PS5, Line 88: validator validators (or "registers a group validator") -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1927 [master] always send IPKI CA certificate
Todd Lipcon has posted comments on this change. Change subject: KUDU-1927 [master] always send IPKI CA certificate .. Patch Set 6: Looking back over old JIRAs. This is still relevant, right Alexey? Just waiting on review? -- To view, visit http://gerrit.cloudera.org:8080/6356 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a20e6da6531091fcc597af58a7bbf01a8847f5 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Fix bug in incorrect response rebuilding on tablet bootstrap
Todd Lipcon has posted comments on this change. Change subject: Fix bug in incorrect response rebuilding on tablet bootstrap .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/5489/5/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: PS5, Line 264: const TxResultPB _result, : TxResultPB *new_result, : WriteResponsePB *response, : bool *all_skipped); nit: the *s and should go with the type, not the var PS5, Line 1259: const TxResultPB _result, : TxResultPB *new_result, : WriteResponsePB *response, : bool *all_skipped) style: same as above -- To view, visit http://gerrit.cloudera.org:8080/5489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1219ed5f7835e93cd7f3b128cedd650fc3384482 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] cfile set: avoid Status allocation for row not found
Hello David Ribeiro Alves, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6804 to look at the new patch set (#2). Change subject: cfile_set: avoid Status allocation for row not found .. cfile_set: avoid Status allocation for row not found We expect Bloom Filter lookups to typically return "row not found". The code currently indicates this using a Status::NotFound(). However, constructing such a status involves an alloc/free pair, which is relatively expensive to do millions of times per second for this common case. In a profile of tcph_real_world, the tserver spends a good chunk of its CPU in these code paths, especially now that our alloc/free contain heavier instrumentation for memory accounting. This changes to using a boost::optional out-parameter for the common case instead. Benchmarked with tpch_real_world SF300 as in change ID I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1: Before After Mem rejections 62,790 75,318 (1.20x more) Wall time 2490s2185s(1.14x faster) Server CPU 72,873s 61,879s (1.18x faster) The increased mem rejections makes sense, since we're now able to insert a bit faster while maintenance operations stayed the same speed. The CPU improvements are due to this optimization, and the wall improvement is a result of the CPU improvement. Change-Id: I3056aaaf0eec6b7deebcd1f206cb551b75996927 --- M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h M src/kudu/tablet/diskrowset.cc 3 files changed, 33 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/6804/2 -- To view, visit http://gerrit.cloudera.org:8080/6804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3056aaaf0eec6b7deebcd1f206cb551b75996927 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: cfile set: avoid Status allocation for row not found
Todd Lipcon has posted comments on this change. Change subject: WIP: cfile_set: avoid Status allocation for row not found .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6804/1/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: Line 221: } else if (!s.ok()) { > warning: don't use else after return [readability-else-after-return] Done http://gerrit.cloudera.org:8080/#/c/6804/1/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: Line 111: Status NewKeyIterator(CFileIterator **iter) const; > warning: function 'kudu::tablet::CFileSet::NewKeyIterator' has a definition Done -- To view, visit http://gerrit.cloudera.org:8080/6804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3056aaaf0eec6b7deebcd1f206cb551b75996927 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1949. Maintenance Manager should trigger flushes earlier
Hello David Ribeiro Alves, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6802 to look at the new patch set (#2). Change subject: KUDU-1949. Maintenance Manager should trigger flushes earlier .. KUDU-1949. Maintenance Manager should trigger flushes earlier This changes the maintenance manager to start triggering flushes at the 60% heap usage threshold, and adjusts the memory limiting to only start rejecting writes at the 80% threshold. This is based on analysis in [1] ("After tweaking memory backpressure rejection") in which we found that earlier flushing could keep the system from making too many rejections. I ran tpch_real_world with SF=300 as described in the doc below before and after the patch. Some key metrics: Before After Mem rejections 1,162,52762,790(18.5x fewer) Wall time 3156s2490s (1.26x faster) Server CPU 37,348s 72,873s (1.95x less efficient) This matches the results from the original experiment. Because the writes come in faster, we don't get enough time to run compactions, and so each write ends up taking more CPU. Some future optimizations are still pending which will reduce the CPU time per bloom lookup. Regardless, this new behavior is configurable such that, if overall CPU efficiency is more important than wall time, we could bump the "memory pressure threshold" up higher than the "memory rejection threshold". [1] https://docs.google.com/document/d/1U1IXS1XD2erZyq8_qG81A1gZaCeHcq2i0unea_eEf5c/edit#heading=h.gw87heq4fq6l Change-Id: I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1 --- M src/kudu/util/maintenance_manager.cc M src/kudu/util/process_memory.cc M src/kudu/util/process_memory.h 3 files changed, 31 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/6802/2 -- To view, visit http://gerrit.cloudera.org:8080/6802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: cfile set: avoid Status allocation for row not found
Hello David Ribeiro Alves, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/6804 to review the following change. Change subject: WIP: cfile_set: avoid Status allocation for row not found .. WIP: cfile_set: avoid Status allocation for row not found We expect Bloom Filter lookups to typically return "row not found". The code currently indicates this using a Status::NotFound(). However, constructing such a status involves an alloc/free pair, which is relatively expensive to do millions of times per second for this common case. In a profile of tcph_real_world, the tserver spends a good chunk of its CPU in these code paths, especially now that our alloc/free contain heavier instrumentation for memory accounting. This changes to using a boost::optional out-parameter for the common case instead. WIP: need to add benchmark results Change-Id: I3056aaaf0eec6b7deebcd1f206cb551b75996927 --- M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h M src/kudu/tablet/diskrowset.cc 3 files changed, 30 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/6804/1 -- To view, visit http://gerrit.cloudera.org:8080/6804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3056aaaf0eec6b7deebcd1f206cb551b75996927 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: David Ribeiro Alves
[kudu-CR] WIP: KUDU-1949. Maintenance Manager should trigger flushes earlier
Todd Lipcon has posted comments on this change. Change subject: WIP: KUDU-1949. Maintenance Manager should trigger flushes earlier .. Patch Set 1: > If you're already going to run some benchmarks, could you roll into this > change the fix to flush_threshold_mb calculation? hm, would rather not confuse the two changes, but I will do that next (already had kicked off the benchmark before seeing this) -- To view, visit http://gerrit.cloudera.org:8080/6802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1843. Client UUIDs should be cryptographically random
Dan Burkert has posted comments on this change. Change subject: KUDU-1843. Client UUIDs should be cryptographically random .. Patch Set 2: Code-Review-1 I still think this is a bad idea. At the very least, this patch needs to go through every place a client ID is currently logged and redact it. If the client ID is exposed in client API, it needs to be documented that is should be considered a secret. We need to be systematically more careful with client IDs if we treat them as secrets. -- To view, visit http://gerrit.cloudera.org:8080/6347 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0ea6868773cf046944f70c32c647184e4b48c772 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [kudu-jepsen] added more info on troubleshooting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6774 to look at the new patch set (#5). Change subject: [kudu-jepsen] added more info on troubleshooting .. [kudu-jepsen] added more info on troubleshooting Added more information on distinguishing 'errors' from 'failures' in the kudu-jepsen test output. Change-Id: I9b97b744d969b73ede2fcb7a3509915b130c655b --- M java/kudu-jepsen/README.adoc 1 file changed, 30 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/6774/5 -- To view, visit http://gerrit.cloudera.org:8080/6774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b97b744d969b73ede2fcb7a3509915b130c655b Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [kudu-jepsen] added more info on troubleshooting
Alexey Serbin has posted comments on this change. Change subject: [kudu-jepsen] added more info on troubleshooting .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6774/4/java/kudu-jepsen/README.adoc File java/kudu-jepsen/README.adoc: PS4, Line 124: , > or Done -- To view, visit http://gerrit.cloudera.org:8080/6774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b97b744d969b73ede2fcb7a3509915b130c655b Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] Fix bug in incorrect response rebuilding on tablet bootstrap
David Ribeiro Alves has posted comments on this change. Change subject: Fix bug in incorrect response rebuilding on tablet bootstrap .. Patch Set 5: Verified+1 unrelated failure in ClientTest.TestScannerKeepAlive -- To view, visit http://gerrit.cloudera.org:8080/5489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1219ed5f7835e93cd7f3b128cedd650fc3384482 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1843. Client UUIDs should be cryptographically random
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1843. Client UUIDs should be cryptographically random .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6347 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0ea6868773cf046944f70c32c647184e4b48c772 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [catalog manager] categorization of rw operation failures
David Ribeiro Alves has posted comments on this change. Change subject: [catalog_manager] categorization of rw operation failures .. Patch Set 26: (25 comments) http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/integration-tests/catalog_manager_tsk-itest.cc File src/kudu/integration-tests/catalog_manager_tsk-itest.cc: PS26, Line 82: We want this to happen as often as possible, but : // due to truncation issues (double --> int64_t) two TSK keys might be : // generated with interval less than 1 second, which is not desirable. not sure what's the intent of this sentence? do you do something to mitigate the "undesirability"? PS26, Line 88: master_tsk_propagated_by_non_leaders this flag is a bit cryptic, at first I thought this was referring to tablet servers. not sure it's defined in this patch but master_non_leader_masters_propagate_tsk would be clearer, IMO PS26, Line 114: builder : .default_admin_operation_timeout(timeout) no need to wrap, in fact could you do all of this in the decl line? Line 126: // signed by the key which hasn't yet reached the tserver. This can with a key which hasn't yet reached the tserver or non-leader master (or somesuch) PS26, Line 132: IPKI can you TODO yourself? PS26, Line 134: Ideally, (hb_interval_ms_ * 2) would be enough, but due : // to network latency and multi-process OS, the factor should be greater. how about: We allow for extra slop beyond the minimum (hb_interval_ms_ * 2) to account for network latency and thread preemption. PS26, Line 136: SleepFor(MonoDelta::FromMilliseconds(hb_interval_ms_ * 10)); these timeouts always come back to bite us in the form of flakyness. could we sleep the minimum and keep retrying for a while (always making sure that we get the expected error and none other) PS26, Line 137: //std::min(run_time_seconds_ * 1000 / 2, hb_interval_ms_ * 50))); did you mean to leave this? PS26, Line 185: //SleepFor(MonoDelta::FromMilliseconds(hb_interval_ms_)); left over garbage? http://gerrit.cloudera.org:8080/#/c/6170/26/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: PS26, Line 471: const we never declare status const. I know you can here, but I haven't seen that anywhere, so I'd prefer to follow the same style PS26, Line 484: const same PS26, Line 492: LOG(WARNING) << err_msg << "will try next time"; this is not very informative PS26, Line 494: has left where did it go? :) PS26, Line 762: const same PS26, Line 777: leadership change in the middle: : // if the master server loses its leadership role, then there will be an : // error if trying to write into the system table. If that happens, we do : // not activate the newly generated CA information since it is no longer : // relevant. If it was leadership change indeed, the system table should : // contain the CA certificate information when this master is re-elected : // next. numbered bullets or some better way to lay this out. it's hard to parse as it is PS26, Line 794: a PS26, Line 857: LOG(INFO) << "Generated new certificate authority record"; can you provide more info? it'd be good to print a seq no or some identifier PS26, Line 914: using std::bind; using declarations go on the top of the file PS26, Line 919: auto wrapper = [this](std::functionfunc, : const Consensus& consensus, : int64_t start_term, : const char* op_description) { : leader_lock_.AssertAcquiredForWriting(); : const Status s = func(); not really sure what's happening here. can you simplify? PS26, Line 961: static const char* const kLoadMetaOpDescription = : "Loading table and tablet metadata into memory"; can you declare this const on the top of the file? PS26, Line 3231: LOG_WITH_PREFIX(INFO) : << "Latest config for has opid_index of " << latest_index : << " while this task has opid_index of " : << cstate_.config().opid_index() << ". Aborting task."; spurious reformat PS26, Line 3285: LOG_WITH_PREFIX(WARNING) : << "ChangeConfig() failed with leader " : << target_ts_desc_->ToString() : << " due to CAS failure. No further retry: " : << status.ToString(); : MarkFailed(); : break; : default: : LOG_WITH_PREFIX(INFO) : << "ChangeConfig() failed with leader " : <<
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6795 to look at the new patch set (#5). Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. KUDU-1993: fixed validation of 'grouped' gflags Added generic implementation for grouped gflags validators. Use CUSTOM_FLAG_VALIDATOR() macro to register late-phase validator function for command-line flags. The validation is performed upon call of HandleCommonFlags() invoked by the top-level ParseCommandLineFlags() function. Added unit test to cover the new functionality. Updated validators for security-related RPC and embedded webserver flags. As a workaround for LSAN reports on the leaks in case of ASAN build, added suppression for CreateAndStartTimeoutThread(). That does not seem harmful or hiding any potential leaks since it affects only the timeout thread itself. Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 --- M build-support/lsan-suppressions.txt M src/kudu/rpc/messenger.cc M src/kudu/server/webserver_options.cc M src/kudu/util/CMakeLists.txt A src/kudu/util/flag_validators-test.cc A src/kudu/util/flag_validators.cc A src/kudu/util/flag_validators.h M src/kudu/util/flags.cc M src/kudu/util/test_main.cc 9 files changed, 414 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/6795/5 -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Alexey Serbin has posted comments on this change. Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/6795/3//COMMIT_MSG Commit Message: Line 19: This patch addresses the following JIRA item: > Any particular reason your first line isn't just "KUDU-1993: introduced cus Nothing in particular. Probably, that's because of some self-confusion because I thought mostly about custom validators functionality, and also fixed KUDU-1993 in the same patch. http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/rpc/messenger.cc File src/kudu/rpc/messenger.cc: Line 142: DEFINE_validator(rpc_encryption, ); > Could we also add a DEFINE_validator() for rpc_authentication? I know it's That's a good observation. I agree it would be cleaner that way. http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/flag_validators-test.cc File src/kudu/util/flag_validators-test.cc: Line 71: class FlagsValidatorsDeathTest : public KuduTest { > Can you explain why ASSERT_DEATH didn't work for this use case? As I understand, ASSERT_DEATH() asserts that the process crashes due to the call of the 'unexpected' handler of the C++ run-time (e.g., it's is called on non-handled exception getting to the very top-level). In this case, the process calls exit() explicitly, so ASSERT_EXIT/EXPECT_EXIT seem more appropriate. https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#how-to-write-a-death-test http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/flag_validators.cc File src/kudu/util/flag_validators.cc: PS3, Line 31: check > What does "check" mean in this context? it was meant 'consider using'. I think this comment is mis-placed: the appropriate place for this is in the comment for the GROUP_VALIDATOR_MACRO(). I'll remove this. Line 40: CHECK(validators_.emplace(name, func).second); > Can we use InsertOrDie() here instead? Done http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/flag_validators.h File src/kudu/util/flag_validators.h: Line 30: > Perhaps we should refer to these as "group" validators instead of "custom" ok, GROUP_FLAG_VALIDATOR() it will be :) Line 31: #define CUSTOM_FLAG_VALIDATOR(name, func) \ > Add a comment with a sample usage. Done Line 45: class Registrator { > Doc class and constructor briefly. Done http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/flags.cc File src/kudu/util/flags.cc: Line 415: if (!e.second()) { > I think we should allow all custom validators to run before exiting. We nee Yep, that's a better approach. http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/test_main.cc File src/kudu/util/test_main.cc: Line 45: // The scope leak check disablers are to supress LSAN warnings in death tests. > I'd like to better understand this; can you add something to the commit des This is just a work-around for LSAN leaks. Frankly, I'm not happy about this, but that was the easier way to get it passing the ASAN build if running those death tests. It would be nice to clean this up. I'll add a note into the commit message. -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1993: fixed validation of 'grouped' gflags
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6795 to look at the new patch set (#4). Change subject: KUDU-1993: fixed validation of 'grouped' gflags .. KUDU-1993: fixed validation of 'grouped' gflags Added generic implementation for grouped gflags validators. Use CUSTOM_FLAG_VALIDATOR() macro to register late-phase validator function for command-line flags. The validation is performed upon call of HandleCommonFlags() invoked by the top-level ParseCommandLineFlags() function. Added unit test to cover the new functionality. Updated validators for security-related RPC and embedded webserver flags. As a workaround for LSAN reports on the leaks in case of ASAN build, added ScopedLeakCheckDisabler into CreateAndStartTimeoutThread(). These do not seem to be harmful or hiding any potential leaks since they are scoped and affect only the timeout thread itself. Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 --- M src/kudu/rpc/messenger.cc M src/kudu/server/webserver_options.cc M src/kudu/util/CMakeLists.txt A src/kudu/util/flag_validators-test.cc A src/kudu/util/flag_validators.cc A src/kudu/util/flag_validators.h M src/kudu/util/flags.cc M src/kudu/util/test_main.cc 8 files changed, 413 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/6795/4 -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Rename TabletPeer to TabletReplica
Mike Percy has submitted this change and it was merged. Change subject: Rename TabletPeer to TabletReplica .. Rename TabletPeer to TabletReplica This class was confusingly named. This changes the class names as well as all usages from TabletPeer to TabletReplica. This was initially done with a Perl script, then the indentation was corrected, the variable naming was updated, and the documentation was modified by hand. Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 Reviewed-on: http://gerrit.cloudera.org:8080/6794 Tested-by: Kudu Jenkins Reviewed-by: Will Berkeley--- M docs/design-docs/rpc-retry-and-failover.md M src/kudu/client/client-test.cc M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus.h M src/kudu/consensus/log.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/consistency-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/full_stack-insert-scan-test.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/tablet_history_gc-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/sys_catalog.cc M src/kudu/master/sys_catalog.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/local_tablet_writer.h M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc R src/kudu/tablet/tablet_replica-test.cc R src/kudu/tablet/tablet_replica.cc R src/kudu/tablet/tablet_replica.h R src/kudu/tablet/tablet_replica_mm_ops.cc R src/kudu/tablet/tablet_replica_mm_ops.h M src/kudu/tablet/transactions/alter_schema_transaction.cc M src/kudu/tablet/transactions/alter_schema_transaction.h M src/kudu/tablet/transactions/transaction.cc M src/kudu/tablet/transactions/transaction.h M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_tracker.cc M src/kudu/tablet/transactions/transaction_tracker.h M src/kudu/tablet/transactions/write_transaction.cc M src/kudu/tablet/transactions/write_transaction.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/scanners-test.cc M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_copy-test-base.h M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_service-test.cc M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_service.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_copy_source_session.h R src/kudu/tserver/tablet_replica_lookup.h M src/kudu/tserver/tablet_server-test-base.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tablet_service.h M src/kudu/tserver/ts_tablet_manager-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc 61 files changed, 930 insertions(+), 927 deletions(-) Approvals: Will Berkeley: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] Rename TabletPeer to TabletReplica
Will Berkeley has posted comments on this change. Change subject: Rename TabletPeer to TabletReplica .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] tablet: revert batch-check-presence optimization for UPDATE/DELETE
David Ribeiro Alves has posted comments on this change. Change subject: tablet: revert batch-check-presence optimization for UPDATE/DELETE .. Patch Set 1: (1 comment) still want this reviewed? (had an old unposted comment that I'll post now too) http://gerrit.cloudera.org:8080/#/c/6577/1//COMMIT_MSG Commit Message: PS1, Line 37: for each op: : if we noted a rowset above, apply there : check bloom : find the row index : otherwise check MRS this will still regress UPSERTS that transform to MUTATES right? (I guess we should monitor perf for those better) -- To view, visit http://gerrit.cloudera.org:8080/6577 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I78e1e0a0293d737074e1309d2ef450c5439db2c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] [kudu-jepsen] added more info on troubleshooting
David Ribeiro Alves has posted comments on this change. Change subject: [kudu-jepsen] added more info on troubleshooting .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6774/4/java/kudu-jepsen/README.adoc File java/kudu-jepsen/README.adoc: PS4, Line 124: , or -- To view, visit http://gerrit.cloudera.org:8080/6774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b97b744d969b73ede2fcb7a3509915b130c655b Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] WIP KUDU-1826 add propagated timestamp to sync client
David Ribeiro Alves has posted comments on this change. Change subject: WIP KUDU-1826 add propagated timestamp to sync client .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6798/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java: PS1, Line 62: public long getLastPropagatedTimestamp() { docs -- To view, visit http://gerrit.cloudera.org:8080/6798 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbfeb9526423f5acd0ef6482869d36cdd146e921 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-1949. Maintenance Manager should trigger flushes earlier
David Ribeiro Alves has posted comments on this change. Change subject: WIP: KUDU-1949. Maintenance Manager should trigger flushes earlier .. Patch Set 1: looks good, waiting for those results -- To view, visit http://gerrit.cloudera.org:8080/6802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6636 to look at the new patch set (#17). Change subject: KUDU-1952 Remove round-robin for block placement .. KUDU-1952 Remove round-robin for block placement This is the first of a multi-patch patchset to mitigate the single-disk failure. Throughout the code, the term "DataDir" refers to a data directory, which is often mounted on a distinct disk. Thus, "disks" and "data directories" will be used interchangeably. This patch adds a mapping from tablet to a set of disks and uses it to replace the existing round-robin placement of blocks. Tablets are mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks are placed randomly in directories within each tablet's DataDirGroup. Tablet-to-group mappings are generated and stored as metadata upon tablet creation, or upon tablet replacement during a tablet copy. During group creation, disks are added to groups by randomly selecting two available directories and selecting the one with fewer tablets on it. This avoids pigeonholing new tablets to disks with relatively few tablets, while still trending towards filling underloaded disks. Groups are maintained when restarting the server, as they are flushed with metadata, and are deleted upon tablet deletion. When loading tablet data from a previous version of Kudu, the tablet's superblock will not have a DataDirGroup. One will be generated containing all data directories, as the tablet's data may already be spread across any number of disks. As this patch only addresses block placement, it does not itself mitigate single-disk failure. Given this, and given the tradeoff between I/O and failure disk-failure tolerance, the default behavior will be to spread tablet data across all disks. Testing is done at the block manager level in block_manager-test and log_block_manager-test, as well as in the new data_dirs-test. A design doc can be found here: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b --- M src/kudu/cfile/bloomfile-test-base.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/fs/CMakeLists.txt M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h A src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/delta_compaction-test.cc M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 37 files changed, 1,002 insertions(+), 175 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/17 -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Fix bug in incorrect response rebuilding on tablet bootstrap
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5489 to look at the new patch set (#5). Change subject: Fix bug in incorrect response rebuilding on tablet bootstrap .. Fix bug in incorrect response rebuilding on tablet bootstrap This fixes the bug described in: https://gerrit.cloudera.org/#/c/5417/ ... and enables the test disabled in that patch. Along the way this also performs some cleanup of tablet bootstrap. I ran raft_consensus-itest's TestInsertDuplicateKeysWithCrashyNodes on dist-test for 5000 loops with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 5000 -- bin/raft_consensus-itest \ --gtest_filter=*DuplicateKeysWithCrashyNodes Prior to this patch the test failed 62/5000: http://dist-test.cloudera.org//job?job_id=david.alves.1493915326.763 After this patch the test passes 5000/5000: http://dist-test.cloudera.org//job?job_id=david.alves.1493914745.27867 Change-Id: I1219ed5f7835e93cd7f3b128cedd650fc3384482 --- M src/kudu/tablet/row_op.cc M src/kudu/tablet/row_op.h M src/kudu/tablet/tablet.proto M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tserver/tablet_server-test.cc 5 files changed, 122 insertions(+), 114 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/5489/5 -- To view, visit http://gerrit.cloudera.org:8080/5489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1219ed5f7835e93cd7f3b128cedd650fc3384482 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Fix bug in incorrect response rebuilding on tablet bootstrap
David Ribeiro Alves has posted comments on this change. Change subject: Fix bug in incorrect response rebuilding on tablet bootstrap .. Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/5489/4/src/kudu/tablet/row_op.cc File src/kudu/tablet/row_op.cc: Line 56: DCHECK(!this->result) << this->result->DebugString(); > think we should leave this with SecureDebugString yeah this patch predates the secure stuff. Done http://gerrit.cloudera.org:8080/#/c/5489/4/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: Line 251: // Plays operations, skipping those that have already been flushed. > can you clarify how it knows what is already flushed here? Not sure what you mean. I added a comment forwarding the reader to ApplyOperations, where that decision is actually made. PS4, Line 256: result > update Done Line 259: TxResultPB* new_result, > update docs to explain what it's filling in in new_result Done PS4, Line 1440: ShortDebugString > should use Secure form Done PS4, Line 1459: const OperationResultPB& new_op_result = new_result->ops(curr_op_idx); > this is only used down below inside the if block on L1491, right? maybe mov Done PS4, Line 1491: flushed > should we rename this PB field to 'skip_on_replay'? I think it would be mor Done PS4, Line 1519: already_flushed > also should maybe rename this field to "skip_replay"? Done -- To view, visit http://gerrit.cloudera.org:8080/5489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1219ed5f7835e93cd7f3b128cedd650fc3384482 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Rename TabletPeer to TabletReplica
Hello Will Berkeley, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6794 to look at the new patch set (#3). Change subject: Rename TabletPeer to TabletReplica .. Rename TabletPeer to TabletReplica This class was confusingly named. This changes the class names as well as all usages from TabletPeer to TabletReplica. This was initially done with a Perl script, then the indentation was corrected, the variable naming was updated, and the documentation was modified by hand. Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 --- M docs/design-docs/rpc-retry-and-failover.md M src/kudu/client/client-test.cc M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus.h M src/kudu/consensus/log.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/consistency-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/full_stack-insert-scan-test.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/tablet_history_gc-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/sys_catalog.cc M src/kudu/master/sys_catalog.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/local_tablet_writer.h M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc R src/kudu/tablet/tablet_replica-test.cc R src/kudu/tablet/tablet_replica.cc R src/kudu/tablet/tablet_replica.h R src/kudu/tablet/tablet_replica_mm_ops.cc R src/kudu/tablet/tablet_replica_mm_ops.h M src/kudu/tablet/transactions/alter_schema_transaction.cc M src/kudu/tablet/transactions/alter_schema_transaction.h M src/kudu/tablet/transactions/transaction.cc M src/kudu/tablet/transactions/transaction.h M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_tracker.cc M src/kudu/tablet/transactions/transaction_tracker.h M src/kudu/tablet/transactions/write_transaction.cc M src/kudu/tablet/transactions/write_transaction.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/scanners-test.cc M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_copy-test-base.h M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_service-test.cc M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_service.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_copy_source_session.h R src/kudu/tserver/tablet_replica_lookup.h M src/kudu/tserver/tablet_server-test-base.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tablet_service.h M src/kudu/tserver/ts_tablet_manager-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc 61 files changed, 930 insertions(+), 927 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/6794/3 -- To view, visit http://gerrit.cloudera.org:8080/6794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6630 to look at the new patch set (#14). Change subject: WIP: KUDU-463. Add checksumming to cfile .. WIP: KUDU-463. Add checksumming to cfile Adds optional checksumming and validation to cfile blocks. Introduces 2 flags to control cfile checksumming: - cfile_write_checksums (default false) - cfile_verify_checksums (default true) cfile_write_checksums is used in the CFileWriter to enable computing and appending Crc32 checksums to the end of each cfile block, header and footer cfile_write_checksums is defaulted to false to ensure upgrades don't immediately result in performance degredation and incompatible data on downgrade. It can and likely should be defaulted to true in a later release. When cfile_write_checksums is set to true, the existing incompatible_features bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear error message when older versions of Kudu try to read the file. If checksums are not written the incompatible_features "checksum" bit is not set. cfile_verify_checksums is used in the CFileReader to enable validating the data against the written checksum. cfile_verify_checksums is defaulted to true since validation only occurs if checksums are written. Any data that was written before checksumming was an option or when cfile_write_checksums was false will not be verified. Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 --- M docs/design-docs/cfile.md M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/util/crc-test.cc M src/kudu/util/crc.cc M src/kudu/util/crc.h 9 files changed, 324 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/14 -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Rename TabletPeer to TabletReplica
Will Berkeley has posted comments on this change. Change subject: Rename TabletPeer to TabletReplica .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6636 to look at the new patch set (#16). Change subject: KUDU-1952 Remove round-robin for block placement .. KUDU-1952 Remove round-robin for block placement This is the first of a multi-patch patchset to mitigate the single-disk failure. Throughout the code, the term "DataDir" refers to a data directory, which is often mounted on a distinct disk. Thus, "disks" and "data directories" will be used interchangeably. This patch adds a mapping from tablet to a set of disks and uses it to replace the existing round-robin placement of blocks. Tablets are mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks are placed randomly in directories within each tablet's DataDirGroup. Tablet-to-group mappings are generated and stored as metadata upon tablet creation, or upon tablet replacement during a tablet copy. During group creation, disks are added to groups by randomly selecting two available directories and selecting the one with fewer tablets on it. This avoids pigeonholing new tablets to disks with relatively few tablets, while still trending towards filling underloaded disks. Groups are maintained when restarting the server, as they are flushed with metadata, and are deleted upon tablet deletion. When loading tablet data from a previous version of Kudu, the tablet's superblock will not have a DataDirGroup. One will be generated containing all data directories, as the tablet's data may already be spread across any number of disks. As this patch only addresses block placement, it does not itself mitigate single-disk failure. Given this, and given the tradeoff between I/O and failure disk-failure tolerance, the default behavior will be to spread tablet data across all disks. Testing is done at the block manager level in block_manager-test and log_block_manager-test, as well as in the new data_dirs-test. A design doc can be found here: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b --- M src/kudu/cfile/bloomfile-test-base.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/fs/CMakeLists.txt M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h A src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/delta_compaction-test.cc M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 37 files changed, 1,023 insertions(+), 175 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/16 -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 16 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Rename TabletPeer to TabletReplica
Hello Will Berkeley, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6794 to look at the new patch set (#2). Change subject: Rename TabletPeer to TabletReplica .. Rename TabletPeer to TabletReplica This class was confusingly named. This changes the class names as well as all usages from TabletPeer to TabletReplica. This was initially done with a Perl script, then the indentation was corrected, the variable naming was updated, and the documentation was modified by hand. Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 --- M docs/design-docs/rpc-retry-and-failover.md M src/kudu/client/client-test.cc M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus.h M src/kudu/consensus/log.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/consistency-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/full_stack-insert-scan-test.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/tablet_history_gc-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/sys_catalog.cc M src/kudu/master/sys_catalog.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/local_tablet_writer.h M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_bootstrap.cc R src/kudu/tablet/tablet_replica-test.cc R src/kudu/tablet/tablet_replica.cc R src/kudu/tablet/tablet_replica.h R src/kudu/tablet/tablet_replica_mm_ops.cc R src/kudu/tablet/tablet_replica_mm_ops.h M src/kudu/tablet/transactions/alter_schema_transaction.cc M src/kudu/tablet/transactions/alter_schema_transaction.h M src/kudu/tablet/transactions/transaction.cc M src/kudu/tablet/transactions/transaction.h M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_tracker.cc M src/kudu/tablet/transactions/transaction_tracker.h M src/kudu/tablet/transactions/write_transaction.cc M src/kudu/tablet/transactions/write_transaction.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/scanners-test.cc M src/kudu/tserver/scanners.cc M src/kudu/tserver/scanners.h M src/kudu/tserver/tablet_copy-test-base.h M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_service-test.cc M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_service.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_copy_source_session.h R src/kudu/tserver/tablet_replica_lookup.h M src/kudu/tserver/tablet_server-test-base.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tablet_service.h M src/kudu/tserver/ts_tablet_manager-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc 61 files changed, 975 insertions(+), 923 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/6794/2 -- To view, visit http://gerrit.cloudera.org:8080/6794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] Rename TabletPeer to TabletReplica
Mike Percy has posted comments on this change. Change subject: Rename TabletPeer to TabletReplica .. Patch Set 1: Yeah these kinds of patches have a short shelf life! Rebasing manually. -- To view, visit http://gerrit.cloudera.org:8080/6794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] WIP: KUDU-463. Add checksumming to cfile
Grant Henke has posted comments on this change. Change subject: WIP: KUDU-463. Add checksumming to cfile .. Patch Set 13: (4 comments) http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 67: TAG_FLAG(cfile_write_checksums, experimental); > No, my reading skills aren't as good as they should be. Sorry about that. Done Line 186: RETURN_NOT_OK_PREPEND(WriteRawData(Slice(checksum_buf, sizeof(checksum_buf))), > Rather than two separate calls to WriteRawData(), could we append the check Done Line 494: RETURN_NOT_OK(WriteRawData(data)); > As I was reading through the comments I started getting a similar feeling t Done Line 504: RETURN_NOT_OK(WriteRawData(Slice(crc_buf, sizeof(crc_buf; > Let's also combine the writes here. It's a little trickier since we need to Done -- To view, visit http://gerrit.cloudera.org:8080/6630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [util] introduced custom gflags validators
Dan Burkert has posted comments on this change. Change subject: [util] introduced custom gflags validators .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/rpc/messenger.cc File src/kudu/rpc/messenger.cc: Line 142: DEFINE_validator(rpc_encryption, ); > Could we also add a DEFINE_validator() for rpc_authentication? I know it's The two validators could actually be implemented as a single function, e.g. 'ValidateTriState', and applied to each flag independently. -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](branch-1.3.x) KUDU-1964. security: avoid calling ERR clear error() defensively
Jean-Daniel Cryans has submitted this change and it was merged. Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively .. KUDU-1964. security: avoid calling ERR_clear_error() defensively This changes our various code which wraps OpenSSL to avoid calling ERR_clear_error() defensively. Instead, we now make sure to call ERR_clear_error() after any case where we receive an error return value from an OpenSSL library call. For cases where we use the existing wrapper macros like OPENSSL_RET_NOT_OK we are already ensured of this since the GetOpenSSLErrors() call clears the error queue. This provides a performance boost, since ERR_clear_error() ends up taking several library-wide mutexes in OpenSSL 1.0.x (apparently improved in OpenSSL 1.1, but that's not available on current OSes). To ensure that we don't accidentally leave an OpenSSL error lying around after any functions, I added a scoped helper which is active in debug builds. This performs a DCHECK that the error queue is empty on scope entry and exit. To benchmark, I tested rpc-bench with encryption enabled using OpenSSL 1.0.1e (RHEl6): Master, SSL off: I0404 12:49:04.811158 7250 rpc-bench.cc:84] Mode:Async I0404 12:49:04.811172 7250 rpc-bench.cc:88] Client reactors: 16 I0404 12:49:04.811175 7250 rpc-bench.cc:89] Call concurrency: 60 I0404 12:49:04.811178 7250 rpc-bench.cc:92] Worker threads: 1 I0404 12:49:04.811182 7250 rpc-bench.cc:93] Server reactors: 4 I0404 12:49:04.811183 7250 rpc-bench.cc:94] -- I0404 12:49:04.811187 7250 rpc-bench.cc:95] Reqs/sec: 446998 I0404 12:49:04.811193 7250 rpc-bench.cc:96] User CPU per req: 9.97575us I0404 12:49:04.811197 7250 rpc-bench.cc:97] Sys CPU per req: 12.2342us I0404 12:49:04.811202 7250 rpc-bench.cc:98] Ctx Sw. per req: 0.604032 Master, SSL on: -- I0404 12:57:10.241720 9949 rpc-bench.cc:86] Mode:Async I0404 12:57:10.241736 9949 rpc-bench.cc:90] Client reactors: 16 I0404 12:57:10.241739 9949 rpc-bench.cc:91] Call concurrency: 60 I0404 12:57:10.241742 9949 rpc-bench.cc:94] Worker threads: 1 I0404 12:57:10.241745 9949 rpc-bench.cc:95] Server reactors: 4 I0404 12:57:10.241747 9949 rpc-bench.cc:96] Encryption: 1 I0404 12:57:10.241760 9949 rpc-bench.cc:97] -- I0404 12:57:10.241762 9949 rpc-bench.cc:98] Reqs/sec: 56761.3 I0404 12:57:10.241778 9949 rpc-bench.cc:99] User CPU per req: 39.7792us I0404 12:57:10.241783 9949 rpc-bench.cc:100] Sys CPU per req: 106.916us I0404 12:57:10.241786 9949 rpc-bench.cc:101] Ctx Sw. per req: 5.98631 Note the high number of context switches and system CPU. I gathered stack traces of context switches using "perf record -g -e cs": Overhead SamplesCommand Shared Object Symbol . . . 100.00%180979 rpc-bench [kernel.kallsyms] [k] perf_event_task_sched_out | --- perf_event_task_sched_out schedule | |--83.17%-- futex_wait_queue_me | futex_wait | do_futex | sys_futex | system_call_fastpath | | | |--83.11%-- __lll_lock_wait | | | | | |--42.33%-- int_thread_get | | | | | |--29.36%-- CRYPTO_add_lock | | | | | |--28.28%-- int_thread_get_item | | --0.03%-- [...] | | | |--16.89%-- pthread_cond_wait@@GLIBC_2.3.2 | | | | | |--100.00%-- kudu::rpc::ServicePool::RunThread() | | | kudu::Thread::SuperviseThread(void*) | | | start_thread | | --0.00%-- [...] | --0.01%-- [...] | |--16.80%-- schedule_hrtimeout_range | | | |--100.00%-- ep_poll | | sys_epoll_wait | | system_call_fastpath | | epoll_wait | | | | | --100.00%-- ev_run | | kudu::rpc::ReactorThread::RunThread() | | kudu::Thread::SuperviseThread(void*) | | start_thread |
[kudu-CR] Rename TabletPeer to TabletReplica
Todd Lipcon has posted comments on this change. Change subject: Rename TabletPeer to TabletReplica .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] docs: Shorten top-level headings in Troubleshooting Guide
Mike Percy has submitted this change and it was merged. Change subject: docs: Shorten top-level headings in Troubleshooting Guide .. docs: Shorten top-level headings in Troubleshooting Guide The top level headings in the Troubleshooting Guide wrap in the table of contents on the web site because they are too long. Let's shorten those headings so they are easier to read in the TOC. Change-Id: Iccd0daf7954f79760a20f1fd281c3d167114a063 Reviewed-on: http://gerrit.cloudera.org:8080/6681 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon--- M docs/troubleshooting.adoc 1 file changed, 5 insertions(+), 3 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Iccd0daf7954f79760a20f1fd281c3d167114a063 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: KUDU-1949. Maintenance Manager should trigger flushes earlier
Adar Dembo has posted comments on this change. Change subject: WIP: KUDU-1949. Maintenance Manager should trigger flushes earlier .. Patch Set 1: If you're already going to run some benchmarks, could you roll into this change the fix to flush_threshold_mb calculation? -- To view, visit http://gerrit.cloudera.org:8080/6802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] docs: Add breakpad documentation to user guide
Todd Lipcon has submitted this change and it was merged. Change subject: docs: Add breakpad documentation to user guide .. docs: Add breakpad documentation to user guide Change-Id: I05275aab0196dbf8fc37da320a0bf34662606a14 Reviewed-on: http://gerrit.cloudera.org:8080/6504 Tested-by: Kudu Jenkins Reviewed-by: Alexey SerbinReviewed-by: Todd Lipcon --- M docs/troubleshooting.adoc 1 file changed, 50 insertions(+), 0 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Alexey Serbin: Looks good to me, but someone else must approve Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6504 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I05275aab0196dbf8fc37da320a0bf34662606a14 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] docs: Add breakpad documentation to user guide
Todd Lipcon has posted comments on this change. Change subject: docs: Add breakpad documentation to user guide .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6504 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05275aab0196dbf8fc37da320a0bf34662606a14 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] WIP: KUDU-1949. Maintenance Manager should trigger flushes earlier
Hello David Ribeiro Alves, Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/6802 to review the following change. Change subject: WIP: KUDU-1949. Maintenance Manager should trigger flushes earlier .. WIP: KUDU-1949. Maintenance Manager should trigger flushes earlier This changes the maintenance manager to start triggering flushes at the 60% heap usage threshold, and adjusts the memory limiting to only start rejecting writes at the 80% threshold. This is based on analysis in [1] ("After tweaking memory backpressure rejection") in which we found that earlier flushing could keep the system from making too many rejections. WIP: want to run before/after test again with this latest iter [1] https://docs.google.com/document/d/1U1IXS1XD2erZyq8_qG81A1gZaCeHcq2i0unea_eEf5c/edit#heading=h.gw87heq4fq6l Change-Id: I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1 --- M src/kudu/util/maintenance_manager.cc M src/kudu/util/process_memory.cc M src/kudu/util/process_memory.h 3 files changed, 21 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/6802/1 -- To view, visit http://gerrit.cloudera.org:8080/6802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8b7e703c82ac14fbce3a699bdf6a2f0fb4ed93a1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves
[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure
Mike Percy has submitted this change and it was merged. Change subject: tablet copy: Ensure no data loss on tablet copy failure .. tablet copy: Ensure no data loss on tablet copy failure This patch adds an integration test to ensure that we don't see data loss on tablet copy failure. Tablet copy failure is triggered a couple of different ways at the source server side. This also implements the ThreadSafeRandom::NextDoubleFraction() method, which existed in the Random class but was missing from the thread-safe version. Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7 Reviewed-on: http://gerrit.cloudera.org:8080/6732 Reviewed-by: Adar DemboTested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_service.h M src/kudu/util/random.h 13 files changed, 299 insertions(+), 26 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6732 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1853 (redux). tablet copy: Don't orphan blocks on failure
Mike Percy has submitted this change and it was merged. Change subject: KUDU-1853 (redux). tablet copy: Don't orphan blocks on failure .. KUDU-1853 (redux). tablet copy: Don't orphan blocks on failure Previously, if a tablet copy failed we would orphan data blocks. This patch makes it so that a failed tablet copy operation that does not involve a process crash does not orphan data blocks. This also refactors some deletion logic out of TSTabletManager so that TabletCopyClient will tombstone partially-copied tablets when the copy operation fails. This version of the patch addresses and tests for a data loss issue (KUDU-1968) that was merged along with a previous version of this patch, and released with Kudu 1.3.0, before it was reverted in Kudu 1.3.1. Additional changes in the revised patch: * Don't check that block ids do not overlap between source and destination in tablet_copy_client-test since that's not guaranteed. * Attempt to detect data loss in tablet_copy_client-test and fail the test if detected. In the case of the log block manager, this check is currently not reliable due to KUDU-1980. * Rename old_superblock_ to remote_superblock_ in TabletCopyClientSession. Originally committed as 72541b47eb55b2df4eab5d6050f517476ed6d370 before being reverted due to KUDU-1968. Change-Id: Ic999ccd27859ace0d635255e86729dff8e1d8349 Reviewed-on: http://gerrit.cloudera.org:8080/6733 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo--- M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/tserver/tablet_copy-test-base.h M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 8 files changed, 374 insertions(+), 204 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic999ccd27859ace0d635255e86729dff8e1d8349 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [flaky tests] Address LSAN false positives
David Ribeiro Alves has submitted this change and it was merged. Change subject: [flaky tests] Address LSAN false positives .. [flaky tests] Address LSAN false positives Some tests, like external_mini_cluster-itest, are now only failing in ASAN due to leaks. Upon inspection I could only find false positives among these leaks around thread local variables. This seems related to this issue: https://github.com/google/sanitizers/issues/757 The fix is to wrap those specific instances of false positives with ScopedLeakDisabler so that they don't get reported. After many tries I was unable to come up with the right incantation to cause this in dist-tests. Running the highest failing test in ASAN with stress was not enough to cause this. Somehow this is more likely in jenkins where these failures are pretty common. Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 Reviewed-on: http://gerrit.cloudera.org:8080/6320 Reviewed-by: David Ribeiro AlvesTested-by: Kudu Jenkins --- M src/kudu/util/kernel_stack_watchdog.cc M src/kudu/util/thread_restrictions.cc 2 files changed, 8 insertions(+), 0 deletions(-) Approvals: David Ribeiro Alves: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [flaky tests] Address LSAN false positives
David Ribeiro Alves has posted comments on this change. Change subject: [flaky tests] Address LSAN false positives .. Patch Set 4: Code-Review+2 just rebased, keeping the +2 -- To view, visit http://gerrit.cloudera.org:8080/6320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [flaky tests] Address LSAN false positives
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6320 to look at the new patch set (#4). Change subject: [flaky tests] Address LSAN false positives .. [flaky tests] Address LSAN false positives Some tests, like external_mini_cluster-itest, are now only failing in ASAN due to leaks. Upon inspection I could only find false positives among these leaks around thread local variables. This seems related to this issue: https://github.com/google/sanitizers/issues/757 The fix is to wrap those specific instances of false positives with ScopedLeakDisabler so that they don't get reported. After many tries I was unable to come up with the right incantation to cause this in dist-tests. Running the highest failing test in ASAN with stress was not enough to cause this. Somehow this is more likely in jenkins where these failures are pretty common. Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 --- M src/kudu/util/kernel_stack_watchdog.cc M src/kudu/util/thread_restrictions.cc 2 files changed, 8 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/6320/4 -- To view, visit http://gerrit.cloudera.org:8080/6320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [flaky tests] Address LSAN false positives
David Ribeiro Alves has posted comments on this change. Change subject: [flaky tests] Address LSAN false positives .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6320/3/src/kudu/util/striped64.cc File src/kudu/util/striped64.cc: Line 144: INIT_STATIC_THREAD_LOCAL(HashCode, hashcode_); > I just removed this yesterday, looks like you'll need a rebase Done -- To view, visit http://gerrit.cloudera.org:8080/6320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] env: add WriteV() API
Adar Dembo has submitted this change and it was merged. Change subject: env: add WriteV() API .. env: add WriteV() API Adds WriteV() methods to RWFile and WritableFile that allows writing multiple data Slices in one call. The implementation leverages the pwritev system call when possible and simulates it with pwrite calls when unavailable. Additionally adds WriteV()/AppendV() methods to the block manager abstraction. These methods will be used in KUDU-463 to support writing checksums and block data in a single call. Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Reviewed-on: http://gerrit.cloudera.org:8080/6800 Reviewed-by: Adar DemboTested-by: Adar Dembo --- M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/consensus/log_util.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager.cc M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/file_cache.cc 10 files changed, 200 insertions(+), 133 deletions(-) Approvals: Adar Dembo: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/6800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Tidy Bot
[kudu-CR] Silence clang -Waddress-of-packed-member warning
David Ribeiro Alves has submitted this change and it was merged. Change subject: Silence clang -Waddress-of-packed-member warning .. Silence clang -Waddress-of-packed-member warning macOS/clang 4.0 builds have become polluted with warnings like: In file included from ../../src/kudu/tablet/deltamemstore.cc:22: In file included from ../../src/kudu/tablet/deltafile.h:34: In file included from ../../src/kudu/tablet/deltamemstore.h:33: ../../src/kudu/tablet/concurrent_btree.h:386:33: warning: taking address of packed member 'version_' of class or structure 'kudu::tablet::btree::NodeBase' may result in an unaligned pointer value [-Waddress-of-packed-member] VersionField::SetInserting(_); ^~~~ ../../src/kudu/tablet/concurrent_btree.h:721:11: note: in instantiation of member function 'kudu::tablet::btree::NodeBase::SetInserting' requested here this->SetInserting(); ^ ../../src/kudu/tablet/concurrent_btree.h:707:12: note: in instantiation of member function 'kudu::tablet::btree::LeafNode::InsertNew' requested here return InsertNew(mut->idx(), mut->key(), val, mut->arena()); ^ ../../src/kudu/tablet/concurrent_btree.h:1316:19: note: in instantiation of member function 'kudu::tablet::btree::LeafNode::Insert' requested here switch (node->Insert(mutation, val)) { ^ ../../src/kudu/tablet/concurrent_btree.h:869:19: note: in instantiation of member function 'kudu::tablet::btree::CBTree::Insert' requested here return tree_->Insert(this, val); ^ ../../src/kudu/tablet/deltamemstore.cc:103:31: note: in instantiation of member function 'kudu::tablet::btree::PreparedMutation::Insert' requested here if (PREDICT_FALSE(!mutation.Insert(update.slice( { This is because of clang's -Waddress-of-packed-member warning which has been committed and reverted and committed again in the clang codebase. Chromium also silenced this (multiple times), see: https://bugs.chromium.org/p/chromium/issues/detail?id=637306 This patch makes clang ignore that warning, but if we just left it at that older versions of clang would then complain about the unknown warning, so this also makes clang ignore unknown warnings. The alternative would be have clang ignore the warning only for certain versions, but then we'd have to do that separately for macOS and linux which seems more code to add and maintain. Note that this warning can't be ignored locally, with pragmas, because then GCC would spew warnings about unknown pragmas. Moreover the GCC warnings themselves can't be ignored because of a long standing bug. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53431 Change-Id: I03518bcca43ea6c891a7f622972465c1ea87ce63 Reviewed-on: http://gerrit.cloudera.org:8080/6693 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon--- M CMakeLists.txt 1 file changed, 7 insertions(+), 0 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I03518bcca43ea6c891a7f622972465c1ea87ce63 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Andrew Wong has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/6636/15/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 132: // hierarchy, ignoring '.', '..', and the file specified by 'instance_name'. will update this comment -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1853 (redux). tablet copy: Don't orphan blocks on failure
Adar Dembo has posted comments on this change. Change subject: KUDU-1853 (redux). tablet copy: Don't orphan blocks on failure .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic999ccd27859ace0d635255e86729dff8e1d8349 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] env: add WriteV() API
Adar Dembo has posted comments on this change. Change subject: env: add WriteV() API .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] Fix bug in incorrect response rebuilding on tablet bootstrap
Todd Lipcon has posted comments on this change. Change subject: Fix bug in incorrect response rebuilding on tablet bootstrap .. Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/5489/4/src/kudu/tablet/row_op.cc File src/kudu/tablet/row_op.cc: Line 56: DCHECK(!this->result) << this->result->DebugString(); think we should leave this with SecureDebugString http://gerrit.cloudera.org:8080/#/c/5489/4/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: Line 251: // Plays operations, skipping those that have already been flushed. can you clarify how it knows what is already flushed here? PS4, Line 256: result update Line 259: TxResultPB* new_result, update docs to explain what it's filling in in new_result PS4, Line 1440: ShortDebugString should use Secure form PS4, Line 1459: const OperationResultPB& new_op_result = new_result->ops(curr_op_idx); this is only used down below inside the if block on L1491, right? maybe move the decl there? PS4, Line 1491: flushed should we rename this PB field to 'skip_on_replay'? I think it would be more accurate as to its purpose PS4, Line 1519: already_flushed also should maybe rename this field to "skip_replay"? One thought (not sure if it would work): given that if previously_failed is true, then this would also always be true, could we combine the two booleans into an enum of NEEDS_REPLAY, SKIP_PREVIOUSLY_FAILED, and SKIP_PREVIOUSLY_FLUSHED or something? -- To view, visit http://gerrit.cloudera.org:8080/5489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1219ed5f7835e93cd7f3b128cedd650fc3384482 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] env: add WriteV() API
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6800 to look at the new patch set (#4). Change subject: env: add WriteV() API .. env: add WriteV() API Adds WriteV() methods to RWFile and WritableFile that allows writing multiple data Slices in one call. The implementation leverages the pwritev system call when possible and simulates it with pwrite calls when unavailable. Additionally adds WriteV()/AppendV() methods to the block manager abstraction. These methods will be used in KUDU-463 to support writing checksums and block data in a single call. Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 --- M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/consensus/log_util.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager.cc M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/file_cache.cc 10 files changed, 200 insertions(+), 133 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/6800/4 -- To view, visit http://gerrit.cloudera.org:8080/6800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Andrew Wong has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement .. Patch Set 14: (27 comments) I moved the definition of DataDirGroup back to the .h since it's used by one of the private members. Also added data_dirs-test.cc http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/block_manager-stress-test.cc File src/kudu/fs/block_manager-stress-test.cc: PS14, Line 433: "test_tablet" > Define this once in the test fixture and use it in both places. Done http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 118: CHECK_OK(bm_->dd_manager()->LoadDataDirGroupFromPB("test_tablet", test_group_pb_)); > Why not RETURN_NOT_OK here too? Done Line 126: void RunBlockDistributionTest(const vector& paths); > Maybe add some coverage for DeleteDataDirGroup too? Done Line 141: ASSERT_STR_CONTAINS(s.ToString(), "DataDirGroup not found for tablet"); > Create a separate test for this; don't overload setUp() with tests. Done Line 155: int count_files(Env* env, const string& path, const string& instance_name) { > Should be CountFiles(). Also, perhaps make it a member of BlockManagerTest Done PS14, Line 171: instance_name > Isn't this always "block_manager_instance"? If so, can you hardcode it? Done PS14, Line 190: CHECK_OK > ASSERT_OK Done PS14, Line 228: CHECK_OK > ASSERT_OK Done Line 284: CHECK_OK(bm_->dd_manager()->CreateDataDirGroup("multipath_test")); > ASSERT_OK Done PS14, Line 313: CHECK_OK > ASSERT_OK Done PS14, Line 385: CreateBlockOptions({ "test_tablet" }) > Maybe you can define this once as a BlockManagerTest member and refer to it Done http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: PS14, Line 160: A group is represented by a list of indices within the list of : // all UUIDs found in the PathSetPB. > Maybe "A group is represented by a list of 2-byte indices, which index into Done Line 478: tablets_by_uuid_idx_map_[uuid_idx].insert(tablet_id); > Should this use InsertOrDie(), to emphasize that there's no reason the set Done Line 528: for (int16_t uuid_index : group->uuid_index_list()) { > Should be uint16_t Done Line 529: tablets_by_uuid_idx_map_[uuid_index].erase(tablet_id); > Should we use FindOrDie() to get the set of tablets out, since it'd be an e Done Line 573: candidate = data_dir_by_uuid_idx_[dir_uuids[iter1]]; > FindOrDie(). L580 too. Done Line 590: tablets_by_uuid_idx_map_[uuid1].size() > tablets_by_uuid_idx_map_[uuid2].size()) { > Use FindOrDie() here too. Done http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: Line 27: #include "kudu/fs/fs.pb.h" > Do you still need this? Maybe you can get away with forward-declaring DataD I can and Done PS14, Line 162: Adds data directories to a specific tablet's dir group, until the limit : // specified by fs_target_data_dirs_per_tablet, or until there is no more space. > Yes, but we should also emphasize that this also creates a dir group for th Done PS14, Line 170: tablet > Nit: tablet's dir group. Done PS14, Line 171: tablet will be untracked. > Nit: "tablet's dir group will be deleted" (to emphasize the symmetry with D Done PS14, Line 174: tracking > Nit: since this method is no longer UntrackTablet, I think it'd be clearer Done Line 221: bool GetDirForGroupUnlocked(const std::vector& group, uint16_t* uuid_idx); > Maybe the parameters would be more clear as 'uuid_idxes' and 'new_uuid_idx' Hrm, I changed it to 'group_indices' and 'new_uuid_index', hopefully that's clearer. http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/fs.proto File src/kudu/fs/fs.proto: PS14, Line 117: indexes > Nit: indices or indexes? Pick one and apply it consistently here (and where Done. http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: PS14, Line 67: "test_tablet" > Define just once. Done PS14, Line 247: CreateBlockOptions({ "test_tablet" } > Maybe define just once and use repeatedly? Done http://gerrit.cloudera.org:8080/#/c/6636/14/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: Line 96: fs_manager->dd_manager()->DeleteDataDirGroup(tablet_id); > So a convention we like to use in these situations is to wrap the cleanup t That is _neat_ Done -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6636 to look at the new patch set (#15). Change subject: KUDU-1952 Remove round-robin for block placement .. KUDU-1952 Remove round-robin for block placement This is the first of a multi-patch patchset to mitigate the single-disk failure. Throughout the code, the term "DataDir" refers to a data directory, which is often mounted on a distinct disk. Thus, "disks" and "data directories" will be used interchangeably. This patch adds a mapping from tablet to a set of disks and uses it to replace the existing round-robin placement of blocks. Tablets are mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks are placed randomly in directories within each tablet's DataDirGroup. Tablet-to-group mappings are generated and stored as metadata upon tablet creation, or upon tablet replacement during a tablet copy. During group creation, disks are added to groups by randomly selecting two available directories and selecting the one with fewer tablets on it. This avoids pigeonholing new tablets to disks with relatively few tablets, while still trending towards filling underloaded disks. Groups are maintained when restarting the server, as they are flushed with metadata, and are deleted upon tablet deletion. When loading tablet data from a previous version of Kudu, the tablet's superblock will not have a DataDirGroup. One will be generated containing all data directories, as the tablet's data may already be spread across any number of disks. As this patch only addresses block placement, it does not itself mitigate single-disk failure. Given this, and given the tradeoff between I/O and failure disk-failure tolerance, the default behavior will be to spread tablet data across all disks. Testing is done at the block manager level in block_manager-test and log_block_manager-test, as well as in the new data_dirs-test. A design doc can be found here: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b --- M src/kudu/cfile/bloomfile-test-base.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/fs/CMakeLists.txt M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h A src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/delta_compaction-test.cc M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 37 files changed, 1,000 insertions(+), 175 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/15 -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
Todd Lipcon has posted comments on this change. Change subject: Test for bug in exactly-once during tablet bootstrap .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5417 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Test for bug in exactly-once during tablet bootstrap
Todd Lipcon has submitted this change and it was merged. Change subject: Test for bug in exactly-once during tablet bootstrap .. Test for bug in exactly-once during tablet bootstrap Here's a regression test for the bug which is causing raft_consensus-itest to occasionally think it has inserted 23 rows when in fact it has only inserted 20. The issue is in the rewriting of logs during bootstrap. If we do a write which gets a duplicate key error, the first time the COMMIT message is written, it includes the error. When the server restarts, it writes the COMMIT message again with only 'flushed: true' in the commit message. This is enough for bootstrap to know not to bother to replay it on subsequent restarts, but it has lost the error messages themselves. If the server restarts again, at this point it doesn't rebuild a proper response, but instead puts an errorless response into the ResultTracker. So, if an operation hits an error, and then the tablet server restarts twice while the client is still retrying, the client will falsely think that its operation has succeeded. This includes a disabled regression test which shows the bug. Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a Reviewed-on: http://gerrit.cloudera.org:8080/5417 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon--- M src/kudu/tserver/tablet_server-test-base.h M src/kudu/tserver/tablet_server-test.cc 2 files changed, 64 insertions(+), 2 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/5417 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I60b3b30b0705b4f9063b0d505cb9ab1ca24e470a Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Rename TabletPeer to TabletReplica
Mike Percy has posted comments on this change. Change subject: Rename TabletPeer to TabletReplica .. Patch Set 1: I am not planning on fixing these tidy nits because this is mostly an automated patch and these are all existing tidiness complaints. -- To view, visit http://gerrit.cloudera.org:8080/6794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icafecd47268c395a6c0ce0aed75d154f4ace7192 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] Silence clang -Waddress-of-packed-member warning
Todd Lipcon has posted comments on this change. Change subject: Silence clang -Waddress-of-packed-member warning .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03518bcca43ea6c891a7f622972465c1ea87ce63 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [flaky tests] Address LSAN false positives
Todd Lipcon has posted comments on this change. Change subject: [flaky tests] Address LSAN false positives .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6320/3/src/kudu/util/striped64.cc File src/kudu/util/striped64.cc: Line 144: INIT_STATIC_THREAD_LOCAL(HashCode, hashcode_); I just removed this yesterday, looks like you'll need a rebase -- To view, visit http://gerrit.cloudera.org:8080/6320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1970: node density integration test
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1970: node density integration test .. Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/6662/9/src/kudu/integration-tests/dense_node-itest.cc File src/kudu/integration-tests/dense_node-itest.cc: PS9, Line 71: > I'll add some documentation. right, most are benchmarks among other tests or actual tests that test for correctness, which this one doesn't do. maybe we should establish a new precendent? it'd be nice to have clear test and file names for these kinds of tests. PS9, Line 100: Substitute("--log_container_max > The goal isn't to mirror a real deployment; it's to get the MM to flush: sounds reasonable PS9, Line 131: // With the amount of data we're going to write, we need to make sure the : // tserver has enough time to start back up (startup is only considered to be : / > But the goal of the test is to maximize metadata, which means smaller colum k, thats fair http://gerrit.cloudera.org:8080/#/c/6662/10/src/kudu/integration-tests/test_workload.cc File src/kudu/integration-tests/test_workload.cc: PS10, Line 233: // Do some sanity checks on the schema. They reflect how the rest of : // TestWorkload is going to use the schema. : CHECK_GT(schema_.num_columns(), 0) << "Schema should have at least one column"; : vector key_indexes; : schema_.GetPrimaryKeyColumnIndexes(_indexes); : CHECK_EQ(1, key_indexes.size()) << "Schema should have just one key column"; : CHECK_EQ(0, key_indexes[0]) << "Schema's key column should be index 0"; : KuduColumnSchema key = schema_.Column(0); : CHECK_EQ("key", key.name()) << "Schema column should be named 'key'"; : CHECK_EQ(KuduColumnSchema::INT32, key.type()) << "Schema key column should be of type INT32"; why not do this when the schema is set? we know that the simple schema abides by these rules. -- To view, visit http://gerrit.cloudera.org:8080/6662 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9b5d01557eb41d386ce92f576ed01ec658e8e7d Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] env: add WriteV() API
Adar Dembo has posted comments on this change. Change subject: env: add WriteV() API .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6800/3/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: Line 120: virtual Status AppendV(const vector& data) = 0; This is in a header file, so you should keep the std:: prefix. -- To view, visit http://gerrit.cloudera.org:8080/6800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] env: add WriteV() API
Grant Henke has posted comments on this change. Change subject: env: add WriteV() API .. Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 478: } > Any particular reason to preserve this? It's a private function; if it's no Done Line 480: Status CFileWriter::WriteRawDataV(const vector ) { > vector& Done http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/cfile/cfile_writer.h File src/kudu/cfile/cfile_writer.h: Line 189: Status WriteRawDataV(const vector ); > Should be vector& (& next to the type). Done http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1063: virtual Status AppendV(const std::vector& data) OVERRIDE; > Don't need std:: prefix here. Done http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/util/env-test.cc File src/kudu/util/env-test.cc: Line 163: // Force short reads to half the slice length > You mean writes? Done http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/util/env.h File src/kudu/util/env.h: Line 448: virtual Status AppendV(const std::vector ) = 0; > std::vector& Done Line 543: virtual Status WriteV(uint64_t offset, const vector ) = 0; > Need std:: prefix here. Also vector& Done http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: Line 408: // Convert the results into the iovec vector to request > Nit: terminate comments that are English sentences with a period. Done Line 571: Status s = DoWriteV(fd_, filename_, filesize_, data); > Hmm, this seems weird; shouldn't we RETURN_NOT_OK() and not touch filesize_ Done -- To view, visit http://gerrit.cloudera.org:8080/6800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] env: add WriteV() API
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6800 to look at the new patch set (#3). Change subject: env: add WriteV() API .. env: add WriteV() API Adds WriteV() methods to RWFile and WritableFile that allows writing multiple data Slices in one call. The implementation leverages the pwritev system call when possible and simulates it with pwrite calls when unavailable. Additionally adds WriteV()/AppendV() methods to the block manager abstraction. These methods will be used in KUDU-463 to support writing checksums and block data in a single call. Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 --- M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/consensus/log_util.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager.cc M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/file_cache.cc 10 files changed, 200 insertions(+), 133 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/6800/3 -- To view, visit http://gerrit.cloudera.org:8080/6800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR](branch-1.3.x) KUDU-1964. security: avoid calling ERR clear error() defensively
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/6801 Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively .. KUDU-1964. security: avoid calling ERR_clear_error() defensively This changes our various code which wraps OpenSSL to avoid calling ERR_clear_error() defensively. Instead, we now make sure to call ERR_clear_error() after any case where we receive an error return value from an OpenSSL library call. For cases where we use the existing wrapper macros like OPENSSL_RET_NOT_OK we are already ensured of this since the GetOpenSSLErrors() call clears the error queue. This provides a performance boost, since ERR_clear_error() ends up taking several library-wide mutexes in OpenSSL 1.0.x (apparently improved in OpenSSL 1.1, but that's not available on current OSes). To ensure that we don't accidentally leave an OpenSSL error lying around after any functions, I added a scoped helper which is active in debug builds. This performs a DCHECK that the error queue is empty on scope entry and exit. To benchmark, I tested rpc-bench with encryption enabled using OpenSSL 1.0.1e (RHEl6): Master, SSL off: I0404 12:49:04.811158 7250 rpc-bench.cc:84] Mode:Async I0404 12:49:04.811172 7250 rpc-bench.cc:88] Client reactors: 16 I0404 12:49:04.811175 7250 rpc-bench.cc:89] Call concurrency: 60 I0404 12:49:04.811178 7250 rpc-bench.cc:92] Worker threads: 1 I0404 12:49:04.811182 7250 rpc-bench.cc:93] Server reactors: 4 I0404 12:49:04.811183 7250 rpc-bench.cc:94] -- I0404 12:49:04.811187 7250 rpc-bench.cc:95] Reqs/sec: 446998 I0404 12:49:04.811193 7250 rpc-bench.cc:96] User CPU per req: 9.97575us I0404 12:49:04.811197 7250 rpc-bench.cc:97] Sys CPU per req: 12.2342us I0404 12:49:04.811202 7250 rpc-bench.cc:98] Ctx Sw. per req: 0.604032 Master, SSL on: -- I0404 12:57:10.241720 9949 rpc-bench.cc:86] Mode:Async I0404 12:57:10.241736 9949 rpc-bench.cc:90] Client reactors: 16 I0404 12:57:10.241739 9949 rpc-bench.cc:91] Call concurrency: 60 I0404 12:57:10.241742 9949 rpc-bench.cc:94] Worker threads: 1 I0404 12:57:10.241745 9949 rpc-bench.cc:95] Server reactors: 4 I0404 12:57:10.241747 9949 rpc-bench.cc:96] Encryption: 1 I0404 12:57:10.241760 9949 rpc-bench.cc:97] -- I0404 12:57:10.241762 9949 rpc-bench.cc:98] Reqs/sec: 56761.3 I0404 12:57:10.241778 9949 rpc-bench.cc:99] User CPU per req: 39.7792us I0404 12:57:10.241783 9949 rpc-bench.cc:100] Sys CPU per req: 106.916us I0404 12:57:10.241786 9949 rpc-bench.cc:101] Ctx Sw. per req: 5.98631 Note the high number of context switches and system CPU. I gathered stack traces of context switches using "perf record -g -e cs": Overhead SamplesCommand Shared Object Symbol . . . 100.00%180979 rpc-bench [kernel.kallsyms] [k] perf_event_task_sched_out | --- perf_event_task_sched_out schedule | |--83.17%-- futex_wait_queue_me | futex_wait | do_futex | sys_futex | system_call_fastpath | | | |--83.11%-- __lll_lock_wait | | | | | |--42.33%-- int_thread_get | | | | | |--29.36%-- CRYPTO_add_lock | | | | | |--28.28%-- int_thread_get_item | | --0.03%-- [...] | | | |--16.89%-- pthread_cond_wait@@GLIBC_2.3.2 | | | | | |--100.00%-- kudu::rpc::ServicePool::RunThread() | | | kudu::Thread::SuperviseThread(void*) | | | start_thread | | --0.00%-- [...] | --0.01%-- [...] | |--16.80%-- schedule_hrtimeout_range | | | |--100.00%-- ep_poll | | sys_epoll_wait | | system_call_fastpath | | epoll_wait | | | | | --100.00%-- ev_run | | kudu::rpc::ReactorThread::RunThread() | | kudu::Thread::SuperviseThread(void*) | |
[kudu-CR] env: add WriteV() API
Adar Dembo has posted comments on this change. Change subject: env: add WriteV() API .. Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: PS1, Line 476: Status CFileWriter::WriteRawData(const Slice& data) { : return WriteRawDataV({ data }); : } Any particular reason to preserve this? It's a private function; if it's not being used anymore, let's remove it. PS1, Line 480: vector & vector& http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/cfile/cfile_writer.h File src/kudu/cfile/cfile_writer.h: PS1, Line 189: vector & Should be vector& (& next to the type). http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: PS1, Line 229: std:: Don't need. http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS1, Line 1063: std:: Don't need std:: prefix here. http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/util/env-test.cc File src/kudu/util/env-test.cc: PS1, Line 163: reads You mean writes? http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/util/env.h File src/kudu/util/env.h: PS1, Line 448: & std::vector& PS1, Line 543: vector Need std:: prefix here. Also vector& http://gerrit.cloudera.org:8080/#/c/6800/1/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: Line 408: // Convert the results into the iovec vector to request Nit: terminate comments that are English sentences with a period. Line 571: Status s = DoWriteV(fd_, filename_, filesize_, data); Hmm, this seems weird; shouldn't we RETURN_NOT_OK() and not touch filesize_ or pending_sync_ on failure? -- To view, visit http://gerrit.cloudera.org:8080/6800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] env: add WriteV() API
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6800 to look at the new patch set (#2). Change subject: env: add WriteV() API .. env: add WriteV() API Adds WriteV() methods to RWFile and WritableFile that allows writing multiple data Slices in one call. The implementation leverages the pwritev system call when possible and simulates it with pwrite calls when unavailable. Additionally adds WriteV()/AppendV() methods to the block manager abstraction. These methods will be used in KUDU-463 to support writing checksums and block data in a single call. Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 --- M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/consensus/log_util.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager.cc M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/file_cache.cc 10 files changed, 202 insertions(+), 130 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/6800/2 -- To view, visit http://gerrit.cloudera.org:8080/6800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [util] introduced custom gflags validators
Adar Dembo has posted comments on this change. Change subject: [util] introduced custom gflags validators .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/6795/3//COMMIT_MSG Commit Message: Line 19: This patch addresses the following JIRA item: Any particular reason your first line isn't just "KUDU-1993: introduced custom gflag validators" instead of including this additional paragraph? http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/rpc/messenger.cc File src/kudu/rpc/messenger.cc: Line 142: DEFINE_validator(rpc_encryption, ); Could we also add a DEFINE_validator() for rpc_authentication? I know it's semi-redundant with the CUSTOM_FLAG_VALIDATOR below, but I think it would be more readable if the validation were logically separate. That is: 1. Two regular validators, each enforcing that either rpc_authentication or rpc_encryption contain a legal value 2. A group validator that assumes #1 is good and checks the combination of the values. http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/flag_validators-test.cc File src/kudu/util/flag_validators-test.cc: Line 71: class FlagsValidatorsDeathTest : public KuduTest { Can you explain why ASSERT_DEATH didn't work for this use case? http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/flag_validators.cc File src/kudu/util/flag_validators.cc: PS3, Line 31: check What does "check" mean in this context? Line 40: CHECK(validators_.emplace(name, func).second); Can we use InsertOrDie() here instead? http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/flag_validators.h File src/kudu/util/flag_validators.h: Line 30: Perhaps we should refer to these as "group" validators instead of "custom" validators? "Custom" is somewhat nebulous (especially since you can construe a regular validator as being "custom" given that it runs a custom function); "group" emphasizes that it's useful for validating groups of gflags. Line 31: #define CUSTOM_FLAG_VALIDATOR(name, func) \ Add a comment with a sample usage. Line 45: class Registrator { Doc class and constructor briefly. http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/flags.cc File src/kudu/util/flags.cc: Line 415: if (!e.second()) { I think we should allow all custom validators to run before exiting. We needn't necessarily log which validators failed; it's expected that individual validators LOG something before returning false. http://gerrit.cloudera.org:8080/#/c/6795/3/src/kudu/util/test_main.cc File src/kudu/util/test_main.cc: Line 45: // The scope leak check disablers are to supress LSAN warnings in death tests. I'd like to better understand this; can you add something to the commit description explaining this? -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] env: add WriteV() API
Grant Henke has uploaded a new change for review. http://gerrit.cloudera.org:8080/6800 Change subject: env: add WriteV() API .. env: add WriteV() API Adds WriteV() methods to RWFile and WritableFile that allows writing multiple data Slices in one call. The implementation leverages the pwritev system call when possible and simulates it with pwrite calls when unavailable. Additionally adds WriteV()/AppendV() methods to the block manager abstraction. These methods will be used in KUDU-463 to support writing checksums and block data in a single call. Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 --- M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/consensus/log_util.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager.cc M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/file_cache.cc 10 files changed, 202 insertions(+), 130 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/6800/1 -- To view, visit http://gerrit.cloudera.org:8080/6800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I30acfa2e4918ef945c55646647913b36a07daaa4 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke
[kudu-CR] spark: add support for fault tolerant scanner
Hao Hao has posted comments on this change. Change subject: spark: add support for fault tolerant scanner .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/6782/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala: PS1, Line 100: .getOrElse(KUDU > I think doing something like "Try(parameters.getOrElse(FAULT_TOLERANT_SCANN Done http://gerrit.cloudera.org:8080/#/c/6782/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: Line 43: class KuduContext(kuduMaster: String) extends Serializable { > I think setting this on the context is too restrictive, it means you'd have Done -- To view, visit http://gerrit.cloudera.org:8080/6782 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3f3192025ca5d74197600480fd3d040d70b4bbc2 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] spark: add support for fault tolerant scanner
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6782 to look at the new patch set (#5). Change subject: spark: add support for fault tolerant scanner .. spark: add support for fault tolerant scanner This adds support to use fault tolerant scanner for spark job. By default non fault tolerant scanner is used. To turn on fault tolerant scanner, use job config: 'kudu.fault.tolerant.scan'. Change-Id: I3f3192025ca5d74197600480fd3d040d70b4bbc2 --- M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduContextTest.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala 5 files changed, 26 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6782/5 -- To view, visit http://gerrit.cloudera.org:8080/6782 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3f3192025ca5d74197600480fd3d040d70b4bbc2 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [flaky tests] Address LSAN false positives
David Ribeiro Alves has posted comments on this change. Change subject: [flaky tests] Address LSAN false positives .. Patch Set 3: As stated in the commit message I couldn't repro this on dist-test after many tries. Still think that this is worth merging since this failure is still pretty common -- To view, visit http://gerrit.cloudera.org:8080/6320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [flaky tests] Address LSAN false positives
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6320 to look at the new patch set (#3). Change subject: [flaky tests] Address LSAN false positives .. [flaky tests] Address LSAN false positives Some tests, like external_mini_cluster-itest, are now only failing in ASAN due to leaks. Upon inspection I could only find false positives among these leaks around thread local variables. This seems related to this issue: https://github.com/google/sanitizers/issues/757 The fix is to wrap those specific instances of false positives with ScopedLeakDisabler so that they don't get reported. After many tries I was unable to come up with the right incantation to cause this in dist-tests. Running the highest failing test in ASAN with stress was not enough to cause this. Somehow this is more likely in jenkins where these failures are pretty common. Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 --- M src/kudu/util/kernel_stack_watchdog.cc M src/kudu/util/striped64.cc M src/kudu/util/thread_restrictions.cc 3 files changed, 15 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/6320/3 -- To view, visit http://gerrit.cloudera.org:8080/6320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c8d9ff83c0cfbc11cab213a25cbd5daa3b25869 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [util] introduced custom gflags validators
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6795 to look at the new patch set (#3). Change subject: [util] introduced custom gflags validators .. [util] introduced custom gflags validators Use CUSTOM_FLAG_VALIDATOR() macro to register late-phase validator function for flags. The validation is performed upon call of HandleCommonFlags() invoked by the top-level ParseCommandLineFlags() function. Added unit test to cover the new functionality. Updated validators for security-related RPC and embedded webserver flags. This patch addresses the following JIRA item: KUDU-1993: The validation of 'grouped' flags works incorrectly if flags re-ordered Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 --- M src/kudu/rpc/messenger.cc M src/kudu/server/webserver_options.cc M src/kudu/util/CMakeLists.txt A src/kudu/util/flag_validators-test.cc A src/kudu/util/flag_validators.cc A src/kudu/util/flag_validators.h M src/kudu/util/flags.cc M src/kudu/util/test_main.cc 8 files changed, 353 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/6795/3 -- To view, visit http://gerrit.cloudera.org:8080/6795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3755d62590cdc63a9d501ba69d980cb15f8069a9 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] env: unify ReadV and Read code paths
Adar Dembo has submitted this change and it was merged. Change subject: env: unify ReadV and Read code paths .. env: unify ReadV and Read code paths Reduces similar and duplicate code by directing read calls to use the ReadV implimentation with a single element vector. Change-Id: I08538225c7db0eba0e83827b4c2dd5b507491580 Reviewed-on: http://gerrit.cloudera.org:8080/6799 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo--- M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs-test-util.h M src/kudu/fs/log_block_manager.cc M src/kudu/util/env_posix.cc 4 files changed, 10 insertions(+), 76 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6799 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I08538225c7db0eba0e83827b4c2dd5b507491580 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] env: unify ReadV and Read code paths
Adar Dembo has posted comments on this change. Change subject: env: unify ReadV and Read code paths .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6799 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08538225c7db0eba0e83827b4c2dd5b507491580 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No