[kudu-CR] consensus: Remove Consensus interface
David Ribeiro Alves has posted comments on this change. Change subject: consensus: Remove Consensus interface .. Patch Set 2: Code-Review+1 (2 comments) lgtm, just found a couple of minor nits. Feel free to upgrade +2 when you fix, no need to re-review (not marking as such to make sure you see my comments) http://gerrit.cloudera.org:8080/#/c/7040/2/src/kudu/consensus/leader_election.h File src/kudu/consensus/leader_election.h: PS2, Line 26: #include "kudu/consensus/raft_consensus.h" include ordering http://gerrit.cloudera.org:8080/#/c/7040/2/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: PS2, Line 27: #include "kudu/consensus/raft_consensus.h" include ordering -- To view, visit http://gerrit.cloudera.org:8080/7040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21b976cb92619083e1f1766b13634b841b554c8c Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add a workaround for LSAN bug #757
David Ribeiro Alves has posted comments on this change. Change subject: Add a workaround for LSAN bug #757 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7052/1/src/kudu/server/generic_service.cc File src/kudu/server/generic_service.cc: PS1, Line 136: has_leaks = __lsan_do_recoverable_leak_check(); LOG(WARN) when a leak is found but we're retrying? could we print the leak itself? -- To view, visit http://gerrit.cloudera.org:8080/7052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id79e4ed7fa6311bcabdb55b8d4fff9f9a4f242bc Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-2027 retry scan RPC if negotiation times out .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/7037/2/src/kudu/client/scanner-internal.cc File src/kudu/client/scanner-internal.cc: PS2, Line 216: if (overall_deadline == deadline && overall_deadline <= MonoTime::Now()) { so basically this is making sure that even in the case where the overall_deadline is the deadline but we haven't exceeded it we still return a (retriable) ScanRpcStatus::RPC_DEADLINE_EXCEEDED error, right? I think we need to at least document that behavior reasonably well. PS2, Line 407: now + KuduClient::Data::ComputeExponentialBackoff(attempt)) - now not sure I'm following this. isn't now + KuduClient::Data::ComputeExponentialBackoff(attempt)) - now just equivalent to KuduClient::Data::ComputeExponentialBackoff(attempt))? http://gerrit.cloudera.org:8080/#/c/7037/2/src/kudu/integration-tests/client-negotiation-failover-itest.cc File src/kudu/integration-tests/client-negotiation-failover-itest.cc: PS2, Line 40: using kudu::client::ScanToStrings; ordering PS2, Line 113: other nit: not your fault but "another" -- To view, visit http://gerrit.cloudera.org:8080/7037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: Remove Consensus interface
Mike Percy has posted comments on this change. Change subject: consensus: Remove Consensus interface .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7040/2/src/kudu/consensus/leader_election.h File src/kudu/consensus/leader_election.h: PS2, Line 26: #include "kudu/consensus/raft_consensus.h" > include ordering Done http://gerrit.cloudera.org:8080/#/c/7040/2/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: PS2, Line 27: #include "kudu/consensus/raft_consensus.h" > include ordering Done -- To view, visit http://gerrit.cloudera.org:8080/7040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21b976cb92619083e1f1766b13634b841b554c8c Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: Remove Consensus interface
Hello David Ribeiro Alves, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7040 to look at the new patch set (#3). Change subject: consensus: Remove Consensus interface .. consensus: Remove Consensus interface We only have one Consensus implementation now, and have no plans for additional implementations in the future. So we can remove this interface. Change-Id: I21b976cb92619083e1f1766b13634b841b554c8c --- M src/kudu/consensus/CMakeLists.txt M src/kudu/consensus/consensus-test-util.h D src/kudu/consensus/consensus.cc D src/kudu/consensus/consensus.h M src/kudu/consensus/leader_election.h M src/kudu/consensus/pending_rounds.cc M src/kudu/consensus/pending_rounds.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/transactions/transaction.h M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_driver.h M src/kudu/tablet/transactions/write_transaction.h M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/tserver-path-handlers.cc 23 files changed, 446 insertions(+), 619 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/7040/3 -- To view, visit http://gerrit.cloudera.org:8080/7040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21b976cb92619083e1f1766b13634b841b554c8c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Add a workaround for LSAN bug #757
Todd Lipcon has posted comments on this change. Change subject: Add a workaround for LSAN bug #757 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7052/1/src/kudu/server/generic_service.cc File src/kudu/server/generic_service.cc: PS1, Line 136: has_leaks = __lsan_do_recoverable_leak_check(); > LOG(WARN) when a leak is found but we're retrying? could we print the leak lsan_do_recoverable_leak_check already does print the leak to stderr. I suppose we could log the retries but is that useful on top of also logging the leaks? -- To view, visit http://gerrit.cloudera.org:8080/7052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id79e4ed7fa6311bcabdb55b8d4fff9f9a4f242bc Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](branch-1.3.x) KUDU-2020: tserver failure causes multiple tablet copy operations per under-replicated tablet
Hello Jean-Daniel Cryans, Mike Percy, Todd Lipcon, Kudu Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7057 to review the following change. Change subject: KUDU-2020: tserver failure causes multiple tablet copy operations per under-replicated tablet .. KUDU-2020: tserver failure causes multiple tablet copy operations per under-replicated tablet The 'active ingredient' in this patch is the change to TsTabletManager::StartTabletCopy that causes an ALREADY_INPROGRESS response to be returned if the tablet is currently being copied and the tablet copy thread pool is full. Previously an ALREADY_INPROGRESS response would only occur if the tablet was currently being copied, and the threadpool was not full. The effect of the failure to return ALREADY_INPROGRESS was that a leader would be much more likely consider a tablet server failed and to subsequently drop the replica from the Raft config. As a result, on a highly loaded cluster, a tablet copy could be started at the same time, 300 seconds apart, on many tablet servers. The remaining changes are to return more specific errors out of the tablet copy service, which aids with testing specific codepaths. One of the existing tablet_copy-itest cases has been beefed up to cover the tablet copy threadpool full path. Without the changes outlined before it fails with: ../../src/kudu/integration-tests/tablet_copy-itest.cc:961: Failure Expected: (num_inprogress) > (0), actual: 0 vs 0 which is exactly what we would expect; the tablet server is failing to return INPROGRESS errors. Anecdotally, this patch has improved TTR times 5-10x on highly loaded clusters. It's still possible for tablets to be bounced around during re-replication if the copying tablet server has a full RPC queue, or it's unable to start the tablet copy for 300 seconds, but both of these conditions indicate that it's probably best to drop that tserver and retry on a (hopefully) less stressed server. Change-Id: Iffa1f0fec4e882beabfee6e0f2672096caccdf75 Reviewed-on: http://gerrit.cloudera.org:8080/6925 Reviewed-by: Todd Lipcon Tested-by: Kudu Jenkins Reviewed-by: Mike Percy --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc 8 files changed, 157 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/7057/1 -- To view, visit http://gerrit.cloudera.org:8080/7057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iffa1f0fec4e882beabfee6e0f2672096caccdf75 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-1.3.x) KUDU-2020: tserver failure causes multiple tablet copy operations per under-replicated tablet
Dan Burkert has posted comments on this change. Change subject: KUDU-2020: tserver failure causes multiple tablet copy operations per under-replicated tablet .. Patch Set 1: Failed due to the CI protobuf upgrade issues. Retriggered. -- To view, visit http://gerrit.cloudera.org:8080/7057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iffa1f0fec4e882beabfee6e0f2672096caccdf75 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out
Alexey Serbin has posted comments on this change. Change subject: KUDU-2027 retry scan RPC if negotiation times out .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/7037/2/src/kudu/client/scanner-internal.cc File src/kudu/client/scanner-internal.cc: PS2, Line 216: if (overall_deadline == deadline && overall_deadline <= MonoTime::Now()) { > so basically this is making sure that even in the case where the overall_de Right. The issue with the original code was that even if the timeout error happened prior to that overall deadline, this code would assume the whole timeout interval has passed. I was also thinking of employing the new functionality to distinguish between negotiation timeout and the RPC timeout itself, but in that case it would be the same criteria plus that RpcController::negotiation_failed() anyway, so I decided to not involve the new RpcController::negotiation_failed(). What is the best place to document this new behavior? PS2, Line 407: now + KuduClient::Data::ComputeExponentialBackoff(attempt)) - now > not sure I'm following this. isn't now + KuduClient::Data::ComputeExponenti That's all about parentheses (i.e. braces) :) This code does the following: sleep = Earliest(deadline, now + backoff) - now; http://gerrit.cloudera.org:8080/#/c/7037/2/src/kudu/integration-tests/client-negotiation-failover-itest.cc File src/kudu/integration-tests/client-negotiation-failover-itest.cc: PS2, Line 40: using kudu::client::ScanToStrings; > ordering Done PS2, Line 113: other > nit: not your fault but "another" The original version was written by me as well :) Done. -- To view, visit http://gerrit.cloudera.org:8080/7037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7037 to look at the new patch set (#3). Change subject: KUDU-2027 retry scan RPC if negotiation times out .. KUDU-2027 retry scan RPC if negotiation times out This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client retries a scan RPC with other tablet replica if the connection negotiation with current replica timed out. Added new integration test to cover the updated client's behavior. This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321, since KUDU-2027 is a scan-related counterpart of KUDU-1580. Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 --- M src/kudu/client/scanner-internal.cc M src/kudu/integration-tests/client-negotiation-failover-itest.cc 2 files changed, 99 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/3 -- To view, visit http://gerrit.cloudera.org:8080/7037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] external minicluster: expand EMC dir usage
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6845 to look at the new patch set (#17). Change subject: external minicluster: expand EMC dir usage .. external minicluster: expand EMC dir usage In order to test different disk configurations, it is becoming increasingly important to have end-to-end testing with nodes backed by multiple directories. External miniclusters by default use a single directory for each daemon's data (i.e. wals and data dirs fall under a single /cluster/daemon/data directory). This patch adds multi-directory support via a new 'num_data_dirs' parameter to ExternalMiniClusterOptions. Additionally, a 'wal_dir' parameter is added to ExternalDaemonOptions to separate the wal location from the data directories. If 'num_data_dirs' is greater than 1, each daemon will generate multiple paths, appending each with a numeric suffix, up to the number specified. E.g. EMCs that would have used the path /cluster/data, if specifying 'num_data_dirs' as 3, will now generate multiple data directories /cluster/data-0, /cluster/data-1, /cluster/data-2. The wal will be added to /cluster/wal. The new test multidir-cluster-itest demonstrates this. This test has been run via dist-test 1000 with no flakes. Results here: http://dist-test.cloudera.org/job?job_id=awong.1496426200.3585 Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 --- M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/disk_reservation-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.h M src/kudu/integration-tests/external_mini_cluster.cc M src/kudu/integration-tests/external_mini_cluster.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_migration-itest.cc A src/kudu/integration-tests/multidir-cluster-itest.cc M src/kudu/integration-tests/open-readonly-fs-itest.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_recovery-itest.cc 15 files changed, 242 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/6845/17 -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] external minicluster: expand EMC dir usage
Andrew Wong has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 17: (9 comments) http://gerrit.cloudera.org:8080/#/c/6845/16/src/kudu/fs/fs_manager.h File src/kudu/fs/fs_manager.h: Line 105: ~FsManager(); > why have this instead of just passing in an FsManagerOpts with the appropri Yeah good point, this is only used in one place right now. Done http://gerrit.cloudera.org:8080/#/c/6845/16/src/kudu/integration-tests/external_mini_cluster.h File src/kudu/integration-tests/external_mini_cluster.h: Line 426: > worth DCHECKing here that data_dirs_.size() == 1? seems like it might be ea Done http://gerrit.cloudera.org:8080/#/c/6845/16/src/kudu/integration-tests/multidir-cluster-itest.cc File src/kudu/integration-tests/multidir-cluster-itest.cc: Line 25: #include "kudu/gutil/map-util.h" > nit: alpha order Done PS16, Line 40: > Would you mind using ExternalMiniClusterITestBase for this test? It's what Yeah, Alexey made this point too, I was originally planning on extending this test for disk failure scenarios. Having added more disk failure tests, it probably makes sense to keep disk failure separate. Done Line 45: "--flush_threshold_mb=1", > maybe TestWorkload could do this just as well? Done Line 83: } > You can use TestWorkload.Setup() to easily create a table Done Line 103 > You should be able to get the same effect using TestWorkload.Start() and St Done Line 104 > This is potentially flaky. See below for a suggestion. Done PS16, Line 107: : : : : : : : : : : : : : > You can wrap this in ASSERT_EVENTUALLY() and avoid the potentially-flaky Sl Done -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Todd Lipcon has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement .. Patch Set 44: (4 comments) http://gerrit.cloudera.org:8080/#/c/6636/44/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: PS44, Line 181: // If 'delete_type' is TABLET_DATA_COPYING, a new DataDirGroup will be generated : // for the tablet. this isn't accurate anymore, is it? http://gerrit.cloudera.org:8080/#/c/6636/44/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: Line 247: fs_manager_->dd_manager()->CreateDataDirGroup(tablet_id_); should we be able to CHECK_OK this? Line 265: fs_manager_->dd_manager()->GetDataDirGroupPB(tablet_id_, superblock_->mutable_data_dir_group()); same, we expect this to always succeed so could CHECK right? http://gerrit.cloudera.org:8080/#/c/6636/44/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: PS44, Line 186: // If 'delete_type' is TABLET_DATA_COPYING, this call also creates a new data : // dir group for the existing tablet. : not true anymore? -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 44 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] external minicluster: expand EMC dir usage
Todd Lipcon has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 17: Code-Review+1 (1 comment) just one minor nit http://gerrit.cloudera.org:8080/#/c/6845/17/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: PS17, Line 328: opts.read_only = false; : opts.metric_entity = nullptr; aren't these defaults? -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: Remove Consensus interface
Mike Percy has submitted this change and it was merged. Change subject: consensus: Remove Consensus interface .. consensus: Remove Consensus interface We only have one Consensus implementation now, and have no plans for additional implementations in the future. So we can remove this interface. Change-Id: I21b976cb92619083e1f1766b13634b841b554c8c Reviewed-on: http://gerrit.cloudera.org:8080/7040 Tested-by: Kudu Jenkins Reviewed-by: Mike Percy --- M src/kudu/consensus/CMakeLists.txt M src/kudu/consensus/consensus-test-util.h D src/kudu/consensus/consensus.cc D src/kudu/consensus/consensus.h M src/kudu/consensus/leader_election.h M src/kudu/consensus/pending_rounds.cc M src/kudu/consensus/pending_rounds.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/transactions/transaction.h M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_driver.h M src/kudu/tablet/transactions/write_transaction.h M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/tserver-path-handlers.cc 23 files changed, 446 insertions(+), 619 deletions(-) Approvals: Mike Percy: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I21b976cb92619083e1f1766b13634b841b554c8c Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Remove Consensus interface
Mike Percy has posted comments on this change. Change subject: consensus: Remove Consensus interface .. Patch Set 3: Code-Review+2 Rolling forward +1 from Todd and +2 from David after include-ordering fixes. -- To view, visit http://gerrit.cloudera.org:8080/7040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21b976cb92619083e1f1766b13634b841b554c8c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out
Todd Lipcon has posted comments on this change. Change subject: KUDU-2027 retry scan RPC if negotiation times out .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/client/scanner-internal.cc File src/kudu/client/scanner-internal.cc: PS3, Line 423: ScanRpcStatus scan_status = SendScanRpc(deadline, configuration_.is_fault_tolerant()); on the 'Open()' RPC, we should be able to fail-over the scan even if it's not fault-tolerant. ie if we haven't yet returned any rows from the tablet, then it's safe to keep trying to open the scanner on other replicas. http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/integration-tests/client-negotiation-failover-itest.cc File src/kudu/integration-tests/client-negotiation-failover-itest.cc: PS3, Line 152: hopefully hrm, I think you are being pretty optimistic here. If you assume hashing produces a random distribution, then I think there is only a 22% chance that they all end up in different partitions (3!/3^3 = 6/27 = 2/9). Does that affect the correctness of the test? Line 191: ASSERT_OK(ScanToStrings(&scanner, &row_strings)); shouldn't this still be assertable like the above, given we set READ_AT_SNAPSHOT? The replica should block until that snapshot is safe, if it isn't safe yet. Line 196: SCOPED_TRACE(Substitute("snapshot read at closest replica, iteration $0", i)); this should say "read-latest" not "snapshot" right? -- To view, visit http://gerrit.cloudera.org:8080/7037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6636 to look at the new patch set (#45). Change subject: KUDU-1952 Remove round-robin for block placement .. KUDU-1952 Remove round-robin for block placement This is the first of a multi-patch patchset to mitigate the effects of single-disk failure. Throughout the code, the term "DataDir" refers to a data directory, which is often mounted on a distinct disk. Thus, "disks" and "data directories" will be used interchangeably. This patch adds a mapping from tablet to a set of disks and uses it to replace the existing round-robin placement of blocks. Tablets are mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks are placed randomly in directories within each tablet's DataDirGroup. Tablet-to-group mappings are generated and stored as metadata upon tablet creation, or upon tablet replacement during a tablet copy. During group creation, disks are added to groups by randomly selecting two available directories and selecting the one with fewer tablets on it ("Power of Two Random Choices"). This avoids pigeonholing new tablets to disks with relatively few tablets, while still trending towards filling underloaded disks. Groups are maintained when restarting the server, as they are flushed with metadata, and are deleted upon tablet deletion. When loading tablet data from a previous version of Kudu, the tablet's metadata will not have a DataDirGroup. One will be generated containing all data directories, as the tablet's data may already be spread across any number of disks. As this patch only addresses block placement, it does not itself mitigate the effects of single-disk failure. Given this, and given the tradeoff between I/O and disk-failure tolerance, the default behavior will be to spread tablet data across all disks. Testing is done at the block manager level in block_manager-test and log_block_manager-test, as well as in the new data_dirs-test. The implementation of block placement is compared against a python implementation found here: https://gist.github.com/andrwng/7c24e8e26aec68c50741f92eb6f2e48d Sweeping over a few parameters shows nearly identical stddev values of the distribution of tablets across directories between implementations. https://github.com/andrwng/kudu/blob/po2c/docs/images/10_20_3_10k_cpp.png https://github.com/andrwng/kudu/blob/po2c/docs/images/10_20_3_10k_py.png https://github.com/andrwng/kudu/blob/po2c/docs/images/30_10_5_5k_cpp.png https://github.com/andrwng/kudu/blob/po2c/docs/images/30_10_5_5k_py.png https://github.com/andrwng/kudu/blob/po2c/docs/images/30_200_5_5k_cpp.png https://github.com/andrwng/kudu/blob/po2c/docs/images/30_200_5_5k_py.png A design doc can be found here: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b --- M src/kudu/cfile/bloomfile-test-base.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/fs/CMakeLists.txt M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h A src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/delta_compaction-test.cc M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/util/pb_util.proto 37 files changed, 1,051 insertions(+), 200 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/45 -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 45 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: To
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Andrew Wong has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement .. Patch Set 45: (4 comments) http://gerrit.cloudera.org:8080/#/c/6636/44/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: PS44, Line 181: // 'last_logged_opid' should be set to the last opid in the log, if any is known. : // If 'last_logged > this isn't accurate anymore, is it? Done http://gerrit.cloudera.org:8080/#/c/6636/44/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: Line 247: CHECK_OK(fs_manager_->dd_manager()->CreateDataDirGroup(tablet_id_)); > should we be able to CHECK_OK this? Right DeleteTabletData should ensure this in terms of whether the tablet already exists. Could also fail if there's a disk error or if all dirs are full. This will probably change with disk failure-handling support. Line 265: CHECK(fs_manager_->dd_manager()->GetDataDirGroupPB(tablet_id_, > same, we expect this to always succeed so could CHECK right? Same Done http://gerrit.cloudera.org:8080/#/c/6636/44/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: PS44, Line 186: tablet::TabletDataState delete_type, : const boost::optional& last_logged_opid); : > not true anymore? Done -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 45 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] external minicluster: expand EMC dir usage
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6845 to look at the new patch set (#18). Change subject: external minicluster: expand EMC dir usage .. external minicluster: expand EMC dir usage In order to test different disk configurations, it is becoming increasingly important to have end-to-end testing with nodes backed by multiple directories. External miniclusters by default use a single directory for each daemon's data (i.e. wals and data dirs fall under a single /cluster/daemon/data directory). This patch adds multi-directory support via a new 'num_data_dirs' parameter to ExternalMiniClusterOptions. Additionally, a 'wal_dir' parameter is added to ExternalDaemonOptions to separate the wal location from the data directories. If 'num_data_dirs' is greater than 1, each daemon will generate multiple paths, appending each with a numeric suffix, up to the number specified. E.g. EMCs that would have used the path /cluster/data, if specifying 'num_data_dirs' as 3, will now generate multiple data directories /cluster/data-0, /cluster/data-1, /cluster/data-2. The wal will be added to /cluster/wal. The new test multidir-cluster-itest demonstrates this. This test has been run via dist-test 1000 with no flakes. Results here: http://dist-test.cloudera.org/job?job_id=awong.1496426200.3585 Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 --- M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/disk_reservation-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.h M src/kudu/integration-tests/external_mini_cluster.cc M src/kudu/integration-tests/external_mini_cluster.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_migration-itest.cc A src/kudu/integration-tests/multidir-cluster-itest.cc M src/kudu/integration-tests/open-readonly-fs-itest.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_recovery-itest.cc 15 files changed, 239 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/6845/18 -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] external minicluster: expand EMC dir usage
Andrew Wong has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/6845/17/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: PS17, Line 328: opts.read_only = false; : opts.metric_entity = nullptr; > aren't these defaults? Done -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] external minicluster: expand EMC dir usage
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6845 to look at the new patch set (#19). Change subject: external minicluster: expand EMC dir usage .. external minicluster: expand EMC dir usage In order to test different disk configurations, it is becoming increasingly important to have end-to-end testing with nodes backed by multiple directories. External miniclusters by default use a single directory for each daemon's data (i.e. wals and data dirs fall under a single /cluster/daemon/data directory). This patch adds multi-directory support via a new 'num_data_dirs' parameter to ExternalMiniClusterOptions. Additionally, a 'wal_dir' parameter is added to ExternalDaemonOptions to separate the wal location from the data directories. If 'num_data_dirs' is greater than 1, each daemon will generate multiple paths, appending each with a numeric suffix, up to the number specified. E.g. EMCs that would have used the path /cluster/data, if specifying 'num_data_dirs' as 3, will now generate multiple data directories /cluster/data-0, /cluster/data-1, /cluster/data-2. The wal will be added to /cluster/wal. The new test multidir-cluster-itest demonstrates this. This test has been run via dist-test 1000 with no flakes. Results here: http://dist-test.cloudera.org/job?job_id=awong.1496426200.3585 Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 --- M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/disk_reservation-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.h M src/kudu/integration-tests/external_mini_cluster.cc M src/kudu/integration-tests/external_mini_cluster.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_migration-itest.cc A src/kudu/integration-tests/multidir-cluster-itest.cc M src/kudu/integration-tests/open-readonly-fs-itest.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_recovery-itest.cc 15 files changed, 237 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/6845/19 -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] external minicluster: expand EMC dir usage
Todd Lipcon has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 19: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] external minicluster: expand EMC dir usage
Mike Percy has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 19: (5 comments) I don't think the test should be flaky anymore but did you try a few dist-test loops? http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/disk_reservation-itest.cc File src/kudu/integration-tests/disk_reservation-itest.cc: Line 68: NO_FATALS(StartClusterWithOpts(opts)); Looks like you can use this now: NO_FATALS(StartCluster(ts_flags, {}, /* num_tablet_servers= */ 1, /* num_data_dirs= */ 2)); http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/external_mini_cluster.h File src/kudu/integration-tests/external_mini_cluster.h: Line 422: // Delete files specified by 'wal_dir_' and 'data_dirs_'. add: WARN_UNUSED_RESULT http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/master_failover-itest.cc File src/kudu/integration-tests/master_failover-itest.cc: Line 366: failed_master->DeleteFromDisk(); ASSERT_OK() http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/multidir-cluster-itest.cc File src/kudu/integration-tests/multidir-cluster-itest.cc: Line 49: StartCluster(ts_flags, {}, /* num_tablet_servers */ 1, kNumDataDirs); NO_FATALS() PS19, Line 88: // Check that WALs are being placed in the expected wal directory. : vector wal_files; : ASSERT_OK(inspect_->ListFilesInDir(JoinPathSegments(ts->wal_dir(), "wals"), &wal_files)); : ASSERT_FALSE(wal_files.empty()); Come to think of it, it wouldn't hurt to put this inside the ASSERT_EVENTUALLY() as well. -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add a workaround for LSAN bug #757
Mike Percy has posted comments on this change. Change subject: Add a workaround for LSAN bug #757 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id79e4ed7fa6311bcabdb55b8d4fff9f9a4f242bc Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Add a workaround for LSAN bug #757
Mike Percy has posted comments on this change. Change subject: Add a workaround for LSAN bug #757 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7052/1/src/kudu/server/generic_service.cc File src/kudu/server/generic_service.cc: PS1, Line 136: has_leaks = __lsan_do_recoverable_leak_check(); > lsan_do_recoverable_leak_check already does print the leak to stderr. I sup I would agree w/ David if __lsan_do_recoverable_leak_check() didn't already log (I didn't know that either) but since it does then I think it would add unnecessary spam to the log. This seems fine. -- To view, visit http://gerrit.cloudera.org:8080/7052 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id79e4ed7fa6311bcabdb55b8d4fff9f9a4f242bc Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1952 Remove round-robin for block placement
Adar Dembo has posted comments on this change. Change subject: KUDU-1952 Remove round-robin for block placement .. Patch Set 45: Code-Review+2 Verified+1 Known flake, let's override Jenkins. -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 45 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] external minicluster: expand EMC dir usage
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6845 to look at the new patch set (#20). Change subject: external minicluster: expand EMC dir usage .. external minicluster: expand EMC dir usage In order to test different disk configurations, it is becoming increasingly important to have end-to-end testing with nodes backed by multiple directories. External miniclusters by default use a single directory for each daemon's data (i.e. wals and data dirs fall under a single /cluster/daemon/data directory). This patch adds multi-directory support via a new 'num_data_dirs' parameter to ExternalMiniClusterOptions. Additionally, a 'wal_dir' parameter is added to ExternalDaemonOptions to separate the wal location from the data directories. If 'num_data_dirs' is greater than 1, each daemon will generate multiple paths, appending each with a numeric suffix, up to the number specified. E.g. EMCs that would have used the path /cluster/data, if specifying 'num_data_dirs' as 3, will now generate multiple data directories /cluster/data-0, /cluster/data-1, /cluster/data-2. The wal will be added to /cluster/wal. The new test multidir-cluster-itest demonstrates this. This test has been run via dist-test 1000 with no flakes. Results here: http://dist-test.cloudera.org/job?job_id=awong.1496426200.3585 Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 --- M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/disk_reservation-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.h M src/kudu/integration-tests/external_mini_cluster.cc M src/kudu/integration-tests/external_mini_cluster.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_migration-itest.cc A src/kudu/integration-tests/multidir-cluster-itest.cc M src/kudu/integration-tests/open-readonly-fs-itest.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_recovery-itest.cc 15 files changed, 234 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/6845/20 -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 20 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] external minicluster: expand EMC dir usage
Andrew Wong has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 17: (5 comments) Yep, added the results to the commit message http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/disk_reservation-itest.cc File src/kudu/integration-tests/disk_reservation-itest.cc: Line 68: NO_FATALS(StartClusterWithOpts(opts)); > Looks like you can use this now: Done http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/external_mini_cluster.h File src/kudu/integration-tests/external_mini_cluster.h: Line 422: // Delete files specified by 'wal_dir_' and 'data_dirs_'. > add: WARN_UNUSED_RESULT Done http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/master_failover-itest.cc File src/kudu/integration-tests/master_failover-itest.cc: Line 366: failed_master->DeleteFromDisk(); > ASSERT_OK() Done http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/multidir-cluster-itest.cc File src/kudu/integration-tests/multidir-cluster-itest.cc: Line 49: "--fs_target_data_dirs_per_tablet=0" > NO_FATALS() Done PS19, Line 88: }); : work.StopAndJoin(); : : // Check that WALs are being pla > Come to think of it, it wouldn't hurt to put this inside the ASSERT_EVENTUA Hrm, I'm trying to think of whether we can have a written blocks without having written to wal. I don't think so, but agreed, wouldn't hurt -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6968 to look at the new patch set (#7). Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. KUDU-1755 Part 2: Improve tablet on-disk size metric This adds WAL segment and cmeta sizes to the tablet on-disk metric. It also fixes the tablet superblock on disk size, which was being underestimated because the size was measured as the PB size instead of counting the container file the PB is stored in. It also exposes a second metric, Tablet Data Size On Disk, which measures just the size of data blocks and does not include metadata or WAL segments. Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/consensus/consensus_meta.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.cc M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_metadata-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h 22 files changed, 230 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/7 -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] external minicluster: expand EMC dir usage
Adar Dembo has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 19: (6 comments) http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/CMakeLists.txt File src/kudu/integration-tests/CMakeLists.txt: Line 78: ADD_KUDU_TEST(multidir-cluster-itest) Nit: should come after master-stress-test, and should probably be called multidir_cluster-itest (dashed suffixes are reserved for kinds of tests). http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/external_mini_cluster.cc File src/kudu/integration-tests/external_mini_cluster.cc: Line 255: CHECK_LE(*dir_index, opts_.num_data_dirs); Shouldn't this be CHECK_LT? If num_data_dirs=3, the only allowable values for dir_index should be 0, 1, and 2, right? Line 269: paths.push_back(GetDataPath(daemon_id, dir_index)); Nit: use emplace_back() here. http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/external_mini_cluster.h File src/kudu/integration-tests/external_mini_cluster.h: Line 21: #include Nit: this belongs in a separate group after the STL includes, because boost is part of the "Kudu project". Line 300: boost::optional dir_index = boost::none) const; Any particular reason we're using uint32_t for this stuff and not 'int'? Line 428: DCHECK_EQ(1, data_dirs_.size()); Nit: Should probably be CHECK_EQ() given that the other checks you've added are not debug-only. -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Will Berkeley has posted comments on this change. Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/6968/6/src/kudu/consensus/consensus_meta.cc File src/kudu/consensus/consensus_meta.cc: Line 220: on_disk_size_.store(pb_.ByteSize(), std::memory_order_relaxed); > yea, I wonder whether it really even is worth tracking this. cmetas are pre I think it's also the only item from 1755 that is essentially fixed size and not proportional to some measure of usage or activity (WAL, UNDOs) or size of data (data, superblock). Mike, would you be ok leaving it out? http://gerrit.cloudera.org:8080/#/c/6968/6/src/kudu/consensus/consensus_meta.h File src/kudu/consensus/consensus_meta.h: Line 182: std::atomic on_disk_size_; > This class isn't thread-safe so there is no reason to use an atomic here Done http://gerrit.cloudera.org:8080/#/c/6968/6/src/kudu/tablet/diskrowset.h File src/kudu/tablet/diskrowset.h: Line 338: uint64_t OnDiskSize() const OVERRIDE; > Mind adding a comment here to differentiate from OnDiskDataSize()? Done http://gerrit.cloudera.org:8080/#/c/6968/6/src/kudu/tablet/rowset.h File src/kudu/tablet/rowset.h: Line 115: virtual Status DebugDump(vector *lines = NULL) = 0; > warning: default arguments on virtual or override methods are prohibited [g Ignored Line 338: // Return the total size on-disk of this rowset's data, in bytes. > Maybe add (excluding metadata) Done http://gerrit.cloudera.org:8080/#/c/6968/6/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: Line 187: tablet_->GetMetricEntity(), Bind(&TabletReplica::OnDiskSize, Unretained(this))) > In general I don't like metrics that acquire lots of locks on demand since Todd, do you have an idea of how costly gathering these metrics will be? Do you think it'd be worth profiling to see how expensive they are? PS6, Line 360: size_t cmeta_size = consensus_ ? consensus_->MetadataOnDiskSize() : 0; : std::lock_guard lock(lock_); : DCHECK(status_pb_out != nullptr); : status_pb_out->set_tablet_id(meta_->tablet_id()); : status_pb_out->set_table_name(meta_->table_name()); : status_pb_out->set_last_status(last_status_); : meta_->partition().ToPB(status_pb_out->mutable_partition()); : status_pb_out->set_state(state_); : status_pb_out->set_tablet_data_state(meta_->tablet_data_state()); : status_pb_out->set_estimated_on_disk_size(cmeta_size + OnDiskSizeUnlocked()); > Since meta_ is const we can do less under the lock: Done -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] disk failure: make DataDirManager failure-aware
Adar Dembo has posted comments on this change. Change subject: disk failure: make DataDirManager failure-aware .. Patch Set 1: (13 comments) http://gerrit.cloudera.org:8080/#/c/7028/1/src/kudu/fs/data_dirs-test.cc File src/kudu/fs/data_dirs-test.cc: Line 86: CreateBlockOptions non_existent_tablet_opts(test_block_opts_); Why do we need to copy test_block_opts_? Can't we just pass it directly to GetNextDataDir()? http://gerrit.cloudera.org:8080/#/c/7028/1/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: Line 443: shared_lock health_lock(dir_health_lock_.get_lock()); This is why I asked about the contention profiles of the locks: use and acquisition of multiple locks should only be done if absolutely necessary, because it's really easy to make a mistake and cause a deadlock via lock inversion. PS1, Line 460: group_target_size = std::min(group_target_size, : static_cast(data_dirs_.size() - failed_data_dirs_.size())); If someone requested a group of a particular size, why not fail that request if there have been too many failures? Why be permissive? PS1, Line 484: // This should only be reached by some tests; in cases where there is no : // natural tablet_id, select a data dir from any of the directories. So in this case there's no point in considering failed_data_dirs_? Line 503: return Status::IOError("No healthy directories exist in tablet's " No ENODEV here? Line 587: bool DataDirManager::FindTabletsByDataDirUuidIdx(uint16_t uuid_idx, set** tablet_ids) { This needs some locking. Line 601: LOG(ERROR) << Substitute("Dir with uuid index $0 ($1) marked as failed", uuid_idx, dd->dir()); Perhaps avoid logging this multiple times if failed_data_dirs_ already has this uuid_idx? http://gerrit.cloudera.org:8080/#/c/7028/1/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: Line 263: bool FindTabletsByDataDirUuidIdx(uint16_t uuid_idx, std::set** I imagine you're trying to avoid copies by passing the caller a pointer into the map, but this makes synchronization and lifecycle management much harder. To simplify, just have tablet_ids be a pointer and copy the set's contents. Also, I don't understand the difference between returning false and returning true with an empty set. It seems to me that uuid_idx is guaranteed to be in the map; only the contents of tablet_ids can vary (from empty to non-empty). So you could return the set directly instead of passing it as an argument. PS1, Line 268: MarkDataDirFailure Nit: how about MarkDataDirFailed, for better symmetry with IsDirectoryFailed? Line 271: bool IsDirectoryFailed(uint16_t uuid_idx) const; Nit: Likewise, IsDataDirFailed (or HasDataDirFailed) Line 273: void GetFailedDataDirs(std::set* data_dirs) const { How about just returning the data_dirs? Easier for callers. Line 274: *data_dirs = failed_data_dirs_; This needs to acquire a lock, no? Line 329: // Lock protecting changes to failed_data_dirs_. Why do we need a different lock for this? Do you expect either this lock or dir_group_lock_ to be highly contended? -- To view, visit http://gerrit.cloudera.org:8080/7028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iee212793152de5de5198751d649ab34fb97f6aa2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add fault injection of EIOs
Adar Dembo has posted comments on this change. Change subject: Add fault injection of EIOs .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/6881/8/src/kudu/util/env-test.cc File src/kudu/util/env-test.cc: Line 1009: FLAGS_env_bad_dir_regex = Substitute("($0)", kTestRWPath1); Why do we need to surround the pathname with parens? Line 1016: FLAGS_env_bad_dir_regex = "(neither_path)"; And here too? http://gerrit.cloudera.org:8080/#/c/6881/8/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: Line 107: #define MAYBE_INJECT_EIO(filename_expr, error_expr) do { \ Can we call this MAYBE_RETURN_EIO to emphasize the similarity with MAYBE_RETURN_FAILURE? Line 150: DEFINE_double(env_inject_eio, 0.0, Why can't we reuse env_inject_io_error? I think it's reasonable for env_inject_io_error to always add EIO posix codes, if that would help. Line 153: DEFINE_string(env_bad_dir_regex, "", Should rename to emphasize symmetry with env_inject_eio/env_inject_io_error (i.e. "env_inject_eio_dir_regex" or some such). Line 154: "ECMAScript regex specifying bad directories. If empty, all " Nit: Instead of "bad" just explain how this ties into error injection. Line 156: TAG_FLAG(env_inject_eio, hidden); Nit: it's easy to tag the wrong flags, so let's colocate the flag tags and definitions (i.e. don't have the definitions first and tags next). http://gerrit.cloudera.org:8080/#/c/6881/8/src/kudu/util/fault_injection.h File src/kudu/util/fault_injection.h: Line 46: // Note that 'status_expr' is evaluated even if 'fraction_flag' is zero. Can this be addressed? Seems like you could use MaybeTrue() on fraction_flag now, before expanding status_eval? PS8, Line 51: desribed described -- To view, visit http://gerrit.cloudera.org:8080/6881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5190d5e7f296cf27b65f1ecd2c16f3315cc46a39 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] external minicluster: expand EMC dir usage
Mike Percy has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/multidir-cluster-itest.cc File src/kudu/integration-tests/multidir-cluster-itest.cc: PS19, Line 88: // Check that WALs are being placed in the expected wal directory. : vector wal_files; : ASSERT_OK(inspect_->ListFilesInDir(JoinPathSegments(ts->wal_dir(), "wals"), &wal_files)); : ASSERT_FALSE(wal_files.empty()); > Hrm, I'm trying to think of whether we can have a written blocks without ha Yeah you are right, it shouldn't be possible. -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] external minicluster: expand EMC dir usage
Mike Percy has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/external_mini_cluster.h File src/kudu/integration-tests/external_mini_cluster.h: Line 21: #include > Nit: this belongs in a separate group after the STL includes, because boost I don't think we've standardized on that and I don't really care what we decide on but it would be nice to agree on this (admittedly minor) point. I'm fine with doing it the way you suggest. -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] tool: add a 'local-replica cmeta set-term' tool
Adar Dembo has posted comments on this change. Change subject: tool: add a 'local-replica cmeta set-term' tool .. Patch Set 1: (4 comments) > curious whether you think this is still useful considering the > 'kudu pbc edit' that I also have up for review. > > This one is a little more constrained, which may be useful, but > maybe not worth it vs just having the editor? I definitely think this is useful, despite a more free-form editing tool. http://gerrit.cloudera.org:8080/#/c/7049/1/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: Line 1486: ASSERT_OK(mini_cluster_->WaitForTabletServerCount(1)); This is guaranteed by MiniCluster::Start(). http://gerrit.cloudera.org:8080/#/c/7049/1/src/kudu/tools/tool_action_local_replica.cc File src/kudu/tools/tool_action_local_replica.cc: Line 310: cmeta->clear_voted_for(); Why is it important to do this? I can take my answer in the form of a code comment. Line 798: .Description("Bump the current term stored in consensus metadata") Maybe add that the new term must be greater than the old one? Either here or in ExtraDescription. PS1, Line 800: "term" Since this is used in two places, our convention has been to hide it behind a kBlahArg constant. -- To view, visit http://gerrit.cloudera.org:8080/7049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7525ffbe772f214e0972a6b450f3f1609109ca05 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] external minicluster: expand EMC dir usage
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6845 to look at the new patch set (#21). Change subject: external minicluster: expand EMC dir usage .. external minicluster: expand EMC dir usage In order to test different disk configurations, it is becoming increasingly important to have end-to-end testing with nodes backed by multiple directories. External miniclusters by default use a single directory for each daemon's data (i.e. wals and data dirs fall under a single /cluster/daemon/data directory). This patch adds multi-directory support via a new 'num_data_dirs' parameter to ExternalMiniClusterOptions. Additionally, a 'wal_dir' parameter is added to ExternalDaemonOptions to separate the wal location from the data directories. If 'num_data_dirs' is greater than 1, each daemon will generate multiple paths, appending each with a numeric suffix, up to the number specified. E.g. EMCs that would have used the path /cluster/data, if specifying 'num_data_dirs' as 3, will now generate multiple data directories /cluster/data-0, /cluster/data-1, /cluster/data-2. The wal will be added to /cluster/wal. The new test multidir-cluster-itest demonstrates this. This test has been run via dist-test 1000 with no flakes. Results here: http://dist-test.cloudera.org/job?job_id=awong.1496426200.3585 Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 --- M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/disk_reservation-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.h M src/kudu/integration-tests/external_mini_cluster.cc M src/kudu/integration-tests/external_mini_cluster.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_migration-itest.cc A src/kudu/integration-tests/multidir_cluster-itest.cc M src/kudu/integration-tests/open-readonly-fs-itest.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_recovery-itest.cc 15 files changed, 235 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/6845/21 -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6968 to look at the new patch set (#8). Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. KUDU-1755 Part 2: Improve tablet on-disk size metric This adds WAL segment and cmeta sizes to the tablet on-disk metric. It also fixes the tablet superblock on disk size, which was being underestimated because the size was measured as the PB size instead of counting the container file the PB is stored in. It also exposes a second metric, Tablet Data Size On Disk, which measures just the size of data blocks and does not include metadata or WAL segments. Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/consensus/consensus_meta.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.cc M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_metadata-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h 22 files changed, 230 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/8 -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] tool: add a 'pbc edit' command
Adar Dembo has posted comments on this change. Change subject: tool: add a 'pbc edit' command .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/7048/1/src/kudu/tools/tool_action_pbc.cc File src/kudu/tools/tool_action_pbc.cc: PS1, Line 142: "pbc-edit.XX" > If we name it "pbc-edit.XX.json" then vim / emacs will give us syntax h We should also include the standard Kudu tmpdir infix so that if you leave one behind in a data directory it'll get cleaned up. Line 173: for (const string& l : lines) { > one per line can be hard to edit for larger messages. Would it be difficult Agreed. How does protobuf write out the JSON-encoded messages? I presume it's a top-level JSON array of messages, so perhaps we could parse with rapidjson, iterate over the top-level array, serialize each record _back_ to JSON, then hand that off to JsonStringToMessage()? It's excessive, but it should allow for more flexibility in the editing process. Line 194: WARN_NOT_OK(env->SyncDir(dir), "couldn't sync directory"); If you're going to sync the directory you should also sync pb_writer before you close it. http://gerrit.cloudera.org:8080/#/c/7048/1/src/kudu/util/pb_util-test.cc File src/kudu/util/pb_util-test.cc: Line 557: const char* kExpectedOutputJson = Is this expected output stable? Can we ensure that it's stable by asking the protobuf JSON parser thing to sort the keys or something? -- To view, visit http://gerrit.cloudera.org:8080/7048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie9434935469df8978a4f3fb3719f0245499aece5 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] external minicluster: expand EMC dir usage
Adar Dembo has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/external_mini_cluster.h File src/kudu/integration-tests/external_mini_cluster.h: Line 21: #include > I don't think we've standardized on that and I don't really care what we de https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes Seems pretty clear-cut to me: Boost is "Other libraries' .h files", not "C++ system files". I view the fact our thirdparty dependencies are placed on the "system" include path (and thus use triangular brackets instead of double quotes) as an implementation detail. -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] external minicluster: expand EMC dir usage
Andrew Wong has posted comments on this change. Change subject: external minicluster: expand EMC dir usage .. Patch Set 19: (7 comments) Retriggered after rebasing to master for tsan build. http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/CMakeLists.txt File src/kudu/integration-tests/CMakeLists.txt: Line 78: ADD_KUDU_TEST(multidir-cluster-itest) > Nit: should come after master-stress-test, and should probably be called mu Done http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/external_mini_cluster.cc File src/kudu/integration-tests/external_mini_cluster.cc: Line 255: CHECK_LE(*dir_index, opts_.num_data_dirs); > Shouldn't this be CHECK_LT? If num_data_dirs=3, the only allowable values f Done Line 269: paths.push_back(GetDataPath(daemon_id, dir_index)); > Nit: use emplace_back() here. Done http://gerrit.cloudera.org:8080/#/c/6845/19/src/kudu/integration-tests/external_mini_cluster.h File src/kudu/integration-tests/external_mini_cluster.h: Line 21: #include > https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includ Yeah thinking about it wrt system-owned vs project-owned makes sense Line 21: #include > Nit: this belongs in a separate group after the STL includes, because boost Done Line 300: boost::optional dir_index = boost::none) const; > Any particular reason we're using uint32_t for this stuff and not 'int'? Mainly because I'd like to keep these indices non-negative. Could accomplish this with some checks, and that would allow us to avoid boost altogether, but using uint32_t makes it more explicit. Line 428: DCHECK_EQ(1, data_dirs_.size()); > Nit: Should probably be CHECK_EQ() given that the other checks you've added Done -- To view, visit http://gerrit.cloudera.org:8080/6845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id2f5def6980ad394c8558ad97ba830f1b0257332 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add fault injection of EIOs
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6881 to look at the new patch set (#11). Change subject: Add fault injection of EIOs .. Add fault injection of EIOs This patch adds the functionality to inject EIOs in env_posix based on a flag-specified ECMAScript regex determining which paths should fail and a flag-specified double indicating how often these failures should occur. A test is added to env-test.cc to verify that EIOs are successfully triggered by specifying regexes. Change-Id: I5190d5e7f296cf27b65f1ecd2c16f3315cc46a39 --- M src/kudu/util/env-test.cc M src/kudu/util/env_posix.cc M src/kudu/util/fault_injection.cc M src/kudu/util/fault_injection.h 4 files changed, 132 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/6881/11 -- To view, visit http://gerrit.cloudera.org:8080/6881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5190d5e7f296cf27b65f1ecd2c16f3315cc46a39 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Add fault injection of EIOs
Andrew Wong has posted comments on this change. Change subject: Add fault injection of EIOs .. Patch Set 11: (7 comments) Haven't addressed everything, marked some "will do"s. Pushed after rebasing as along with the EMC patch http://gerrit.cloudera.org:8080/#/c/6881/8/src/kudu/util/env_posix.cc File src/kudu/util/env_posix.cc: Line 107: const string& f_ = (filename_expr); \ > Can we call this MAYBE_RETURN_EIO to emphasize the similarity with MAYBE_RE Done Line 150: "Fraction of the time that operations on certain files will fail " > Why can't we reuse env_inject_io_error? I think it's reasonable for env_inj We can, although there's the subtlety that suicide_on_eio should be switched to false in tests that use the current env_inject_io_error. Will try it out. Line 153: DEFINE_string(env_inject_eio_regex, "", > Should rename to emphasize symmetry with env_inject_eio/env_inject_io_error Done Line 154: "ECMAScript regex specifying files on which I/O will fail. If " > Nit: Instead of "bad" just explain how this ties into error injection. Done Line 156: TAG_FLAG(env_inject_eio_regex, hidden); > Nit: it's easy to tag the wrong flags, so let's colocate the flag tags and Done http://gerrit.cloudera.org:8080/#/c/6881/8/src/kudu/util/fault_injection.h File src/kudu/util/fault_injection.h: Line 46: // Note that 'status_expr' is evaluated even if 'fraction_flag' is zero. > Can this be addressed? Seems like you could use MaybeTrue() on fraction_fla Yeah that's MAYBE_EVAL_AND_RETURN_FAILURE. Will replace the implementation since it seems like MAYBE_EVAL_AND_RETURN_FAILURE's behavior is expected. PS8, Line 51: describe > described Done -- To view, visit http://gerrit.cloudera.org:8080/6881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5190d5e7f296cf27b65f1ecd2c16f3315cc46a39 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Bump LLVM to 4.0.0
Hello Adar Dembo, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7069 to review the following change. Change subject: Bump LLVM to 4.0.0 .. Bump LLVM to 4.0.0 Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 --- M thirdparty/download-thirdparty.sh A thirdparty/package-llvm.sh D thirdparty/patches/libc++-fix-std-once-barriers.patch M thirdparty/vars.sh 4 files changed, 25 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/7069/1 -- To view, visit http://gerrit.cloudera.org:8080/7069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Bump LLVM to 4.0.0
Adar Dembo has posted comments on this change. Change subject: Bump LLVM to 4.0.0 .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/7069/1//COMMIT_MSG Commit Message: Line 7: Bump LLVM to 4.0.0 Can you explain why you want to do this? I mean, it's generally a reasonable thing to do, but would be good to explain how this fixes the Thrift TSAN build (and link to the corresponding tsan patch, etc.) http://gerrit.cloudera.org:8080/#/c/7069/1/thirdparty/package-llvm.sh File thirdparty/package-llvm.sh: Line 1: #!/bin/bash Add copyright notice and a brief header explaining what this does. Line 5: VERSION=4.0.0 Make this settable via variable. Something like: VERSION=${VERSION:4.0.0} (apologies if that's not the right syntax). Maybe even omit the default since it's going to be wrong for the next upgrade anyway. Line 8: do Nit: move this to the previous line and separate with a semicolon; I think that's the bash style we tend to use elsewhere. http://gerrit.cloudera.org:8080/#/c/7069/1/thirdparty/vars.sh File thirdparty/vars.sh: PS1, Line 134: # Summary: : # 1. Unpack the llvm tarball : # 2. Unpack the clang tarball as tools/clang (rename from cfe- to clang) : # 3. Unpack the extra clang tools tarball as tools/clang/tools/extra : # 4. Unpack the lld tarball as tools/lld : # 5. Unpack the compiler-rt tarball as projects/compiler-rt : # 6. Unpack the libc++ tarball as projects/libcxx : # 7. Unpack the libc++abi tarball as projects/libcxxabi : # 8. Create new tarball from the resulting source tree Move this description to package-llvm.sh. -- To view, visit http://gerrit.cloudera.org:8080/7069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] WIP: Add basic Hive MetaStore client
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7053 to look at the new patch set (#2). Change subject: WIP: Add basic Hive MetaStore client .. WIP: Add basic Hive MetaStore client This is a work-in-progress patch to create a basic HiveMetastore client. - Thrift has been added as a dependency, and a mechanism for performing Thrift codegen at compile time has been added (see FindThrift.cmake, based on FindProtobuf.cmake) - Hive and Hadoop have been added to thirdparty as test-only dependency. - A Hive MetaStore external mini server is included for testing. See mini_hms.[h|cc]. WIP because it's not yet expected to work with dist-test (Hive and Hadoop artifacts need to be distributed). Change-Id: I155223da912bc18a759df2f1f6bc25d1132a99ee --- M CMakeLists.txt A cmake_modules/FindJavaHome.cmake A cmake_modules/FindThrift.cmake A src/kudu/hms/CMakeLists.txt A src/kudu/hms/hive_metastore-test.cc A src/kudu/hms/hive_metastore.cc A src/kudu/hms/hive_metastore.h A src/kudu/hms/hive_metastore.thrift A src/kudu/hms/mini_hms-test.cc A src/kudu/hms/mini_hms.cc A src/kudu/hms/mini_hms.h M src/kudu/security/test/mini_kdc.cc M src/kudu/security/test/mini_kdc.h M src/kudu/util/subprocess.cc M src/kudu/util/subprocess.h M src/kudu/util/test_util.cc M thirdparty/build-definitions.sh M thirdparty/build-thirdparty.sh M thirdparty/download-thirdparty.sh M thirdparty/vars.sh 20 files changed, 2,158 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/7053/2 -- To view, visit http://gerrit.cloudera.org:8080/7053 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I155223da912bc18a759df2f1f6bc25d1132a99ee Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Bump LLVM to 4.0.0
Dan Burkert has uploaded a new patch set (#2). Change subject: Bump LLVM to 4.0.0 .. Bump LLVM to 4.0.0 I'm working on integrating Thrift into the Kudu build, and hit https://reviews.llvm.org/D22800, which is fixed in 4.0.0. Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 --- M thirdparty/download-thirdparty.sh A thirdparty/package-llvm.sh D thirdparty/patches/libc++-fix-std-once-barriers.patch M thirdparty/vars.sh 4 files changed, 57 insertions(+), 105 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/7069/2 -- To view, visit http://gerrit.cloudera.org:8080/7069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Bump LLVM to 4.0.0
Dan Burkert has posted comments on this change. Change subject: Bump LLVM to 4.0.0 .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/7069/1//COMMIT_MSG Commit Message: Line 7: Bump LLVM to 4.0.0 > Can you explain why you want to do this? I mean, it's generally a reasonabl Done http://gerrit.cloudera.org:8080/#/c/7069/1/thirdparty/package-llvm.sh File thirdparty/package-llvm.sh: Line 1: #!/bin/bash > Add copyright notice and a brief header explaining what this does. Done Line 5: VERSION=4.0.0 > Make this settable via variable. Something like: Done Line 8: do > Nit: move this to the previous line and separate with a semicolon; I think Done http://gerrit.cloudera.org:8080/#/c/7069/1/thirdparty/vars.sh File thirdparty/vars.sh: PS1, Line 134: # Summary: : # 1. Unpack the llvm tarball : # 2. Unpack the clang tarball as tools/clang (rename from cfe- to clang) : # 3. Unpack the extra clang tools tarball as tools/clang/tools/extra : # 4. Unpack the lld tarball as tools/lld : # 5. Unpack the compiler-rt tarball as projects/compiler-rt : # 6. Unpack the libc++ tarball as projects/libcxx : # 7. Unpack the libc++abi tarball as projects/libcxxabi : # 8. Create new tarball from the resulting source tree > Move this description to package-llvm.sh. Done -- To view, visit http://gerrit.cloudera.org:8080/7069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](branch-1.3.x) KUDU-2020: tserver failure causes multiple tablet copy operations per under-replicated tablet
Dan Burkert has posted comments on this change. Change subject: KUDU-2020: tserver failure causes multiple tablet copy operations per under-replicated tablet .. Patch Set 1: The protobuf plugin repo is down, so builds are failing permanently. We fixed this by bumping to a new protobuf plugin, I'll see if that's backportable. -- To view, visit http://gerrit.cloudera.org:8080/7057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iffa1f0fec4e882beabfee6e0f2672096caccdf75 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Bump LLVM to 4.0.0
Adar Dembo has posted comments on this change. Change subject: Bump LLVM to 4.0.0 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7069/2/thirdparty/vars.sh File thirdparty/vars.sh: Line 128: # Our llvm tarball includes clang, extra clang tools, lld, and compiler-rt. No need to duplicate this list (which may change over time), so instead, replace this with a pointer to package-llvm.sh. -- To view, visit http://gerrit.cloudera.org:8080/7069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Bump LLVM to 4.0.0
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7069 to look at the new patch set (#3). Change subject: Bump LLVM to 4.0.0 .. Bump LLVM to 4.0.0 I'm working on integrating Thrift into the Kudu build, and hit https://reviews.llvm.org/D22800, which is fixed in 4.0.0. Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 --- M thirdparty/download-thirdparty.sh A thirdparty/package-llvm.sh D thirdparty/patches/libc++-fix-std-once-barriers.patch M thirdparty/vars.sh 4 files changed, 58 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/7069/3 -- To view, visit http://gerrit.cloudera.org:8080/7069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Bump LLVM to 4.0.0
Dan Burkert has posted comments on this change. Change subject: Bump LLVM to 4.0.0 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7069/2/thirdparty/vars.sh File thirdparty/vars.sh: Line 128: # Our llvm tarball includes clang, extra clang tools, lld, and compiler-rt. > No need to duplicate this list (which may change over time), so instead, re Done -- To view, visit http://gerrit.cloudera.org:8080/7069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7037 to look at the new patch set (#4). Change subject: KUDU-2027 retry scan RPC if negotiation times out .. KUDU-2027 retry scan RPC if negotiation times out This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client retries a scan RPC with other tablet replica if the connection negotiation with current replica timed out. Also addressed a TODO in KuduScanner::Data::OpenTablet(). Added new integration test to cover the updated client's behavior. This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321, since KUDU-2027 is a scan-related counterpart of KUDU-1580. Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 --- M src/kudu/client/client.cc M src/kudu/client/scanner-internal.cc M src/kudu/client/scanner-internal.h M src/kudu/integration-tests/client-negotiation-failover-itest.cc 4 files changed, 168 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/4 -- To view, visit http://gerrit.cloudera.org:8080/7037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out
Alexey Serbin has posted comments on this change. Change subject: KUDU-2027 retry scan RPC if negotiation times out .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/client/scanner-internal.cc File src/kudu/client/scanner-internal.cc: PS3, Line 423: CHECK(ts->proxy()); > on the 'Open()' RPC, we should be able to fail-over the scan even if it's n Yep, I had some doubts about this change (too restrictive, but safe). Thank you for the clarification. I'll update this and other places correspondingly. http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/integration-tests/client-negotiation-failover-itest.cc File src/kudu/integration-tests/client-negotiation-failover-itest.cc: PS3, Line 152: hopefully > hrm, I think you are being pretty optimistic here. If you assume hashing pr Well, I assumed the fact that the numbers are sequential adds some bias to the distribution of the results across partitions. >From the other side, during the test, the leader tablets tend to switch from >one server to another due to re-elections. If they flock to the same tablet >server, that get us into the situation where even all operations are put into >different partitions, the behaviour of the test is the same as if all >operations are put into the same tablet. If all the values are put into the same partition (i.e. the same tablet) and that particular tablet is not paused, the test does not exercise the expected negotiation timeout behavior for that particular run. So, assuming there is an issue with the negotiation timeouts, the test would give false negative in that case. However, the test does not produce false positives (I assume by 'correctness' you meant false positives). Since the test runs multiple iterations and it's run multiple times itself, I feel confident it gives us the necessary coverage. Of course, it's possible to do add some code to make sure the writes go to different partitions and do tablet 'un-herding', but I think it's better to keep the code simpler and rely on high number or runs. Line 191: ASSERT_OK(ScanToStrings(&scanner, &row_strings)); > shouldn't this still be assertable like the above, given we set READ_AT_SNA In the first revision it was exactly like that (I assumed it should be like that). However, I found that it fails sometimes: i.e. CLOSEST_REPLICA and READ_AT_SNAPSHOT does not guarantee RYW behaviour. It might be a bug -- I asked David about that on the HipChat Kudu-dev channel, but I haven't gotten response. I'll ping him regarding this. Line 196: SCOPED_TRACE(Substitute("read-latest at closest replica, iteration $0", i)); > this should say "read-latest" not "snapshot" right? good catch -- yes, it should. -- To view, visit http://gerrit.cloudera.org:8080/7037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Bump LLVM to 4.0.0
Adar Dembo has posted comments on this change. Change subject: Bump LLVM to 4.0.0 .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](branch-1.3.x) [java-client] update protoc maven plugin
Hello David Ribeiro Alves, Grant Henke, Kudu Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7070 to review the following change. Change subject: [java-client] update protoc maven plugin .. [java-client] update protoc maven plugin The current protoc maven plugin is not hosted on maven central and the hosting domain is unavailable. It also has not been updated since 2012. It appears like the most current and maintained fork is here: https://github.com/xolstice/protobuf-maven-plugin Change-Id: I43167c298e5230cd6421fd992f712513f8801b5a Reviewed-on: http://gerrit.cloudera.org:8080/6830 Tested-by: Kudu Jenkins Reviewed-by: Dan Burkert Reviewed-by: David Ribeiro Alves --- M java/kudu-client/pom.xml M java/pom.xml 2 files changed, 12 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7070/1 -- To view, visit http://gerrit.cloudera.org:8080/7070 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I43167c298e5230cd6421fd992f712513f8801b5a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] Bump LLVM to 4.0.0
Dan Burkert has submitted this change and it was merged. Change subject: Bump LLVM to 4.0.0 .. Bump LLVM to 4.0.0 I'm working on integrating Thrift into the Kudu build, and hit https://reviews.llvm.org/D22800, which is fixed in 4.0.0. Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Reviewed-on: http://gerrit.cloudera.org:8080/7069 Reviewed-by: Adar Dembo Tested-by: Kudu Jenkins --- M thirdparty/download-thirdparty.sh A thirdparty/package-llvm.sh D thirdparty/patches/libc++-fix-std-once-barriers.patch M thirdparty/vars.sh 4 files changed, 58 insertions(+), 106 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I64f98bb56d2510ba03ef5ea477a4d7bb4b48c555 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2027 retry scan RPC if negotiation times out
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7037 to look at the new patch set (#5). Change subject: KUDU-2027 retry scan RPC if negotiation times out .. KUDU-2027 retry scan RPC if negotiation times out This patch addresses KUDU-2027, i.e. with this patch the Kudu C++ client retries a scan RPC with other tablet replica if the connection negotiation with current replica timed out. Also addressed a TODO in KuduScanner::Data::OpenTablet(). As a part of the fix for KUDU-2027, I updated the logic of the timeout handling. Now, regardless of whether it was the RPC or the overall operation timeout, the corresponding tablet server is marked as failed and blacklisted. Added new integration test to cover the updated client's behavior. This patch is a follow-up for e3c5dd18c22b9e20358f05dcd301e7736c0a3321, since KUDU-2027 is a scan-related counterpart of KUDU-1580. Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 --- M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/scanner-internal.cc M src/kudu/client/scanner-internal.h M src/kudu/integration-tests/client-negotiation-failover-itest.cc 5 files changed, 235 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/7037/5 -- To view, visit http://gerrit.cloudera.org:8080/7037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon