David Ribeiro Alves has posted comments on this change. Change subject: Add integration tests for duplicate keys ......................................................................
Patch Set 7: (10 comments) http://gerrit.cloudera.org:8080/#/c/5349/7/src/kudu/integration-tests/cluster_verifier.cc File src/kudu/integration-tests/cluster_verifier.cc: PS7, Line 126: CHECK_OK(scanner.SetReadMode(client::KuduScanner::READ_AT_SNAPSHOT)); : CHECK_OK(scanner.SetFaultTolerant()); : CHECK_OK(scanner.SetTimeoutMillis(5000)); : CHECK_OK(scanner.SetProjectedColumns(vector<string>())); > Is it so crucial to abort if something goes non-OK among those? Looking at We were already using this pattern. We RETURN_NOT_OK() on things that may fail in the test or the things we're testing and we CHECK_OK() on the invariants. You should RETURN_NOT_OK() when there is a chance that the caller can do something about it (e.g. as in this case in CheckRowCount vs CheckRowCountWithRetries) otherwise ASSERT_OK/CHECK_OK is better. We can't use ASSERT here, so CHECK will have to do PS7, Line 129: vector<string>() > nit: consider using '{}' instead of 'vector<string>()' for brevity. Done http://gerrit.cloudera.org:8080/#/c/5349/7/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: PS7, Line 359: static const bool WITH_NOTIFICATION_LATENCY = true; : static const bool WITHOUT_NOTIFICATION_LATENCY = false; > Are these needed in the new code? Done PS7, Line 870: kCrashesToCause > nit: since it's not static or global, consider renaming into 'crashes_to_ca Done PS7, Line 907: CHECK_OK > nit: ASSERT_OK() ? Done PS7, Line 947: 3) > Is batch of size 3 is enough to get higher chance of collision? I wonder w were only inserting 300 rows so this would be finished super fast. IMO 3 is a good tradeoff between speed and coverage. There are only 20 dup keys. Note that even with 1 there is a chance of collision. PS7, Line 954: master_flags > It seems this stays empty, consider removing this variable and using '{}' f Done PS7, Line 959: ts_flags.push_back("--raft_heartbeat_interval_ms=5"); : #else : ts_flags.push_back("--raft_heartbeat_interval_ms=1"); : #endif : ts_flags.push_back("--leader_failure_monitor_check_mean_ms=1"); : ts_flags.push_back("--leader_failure_monitor_check_stddev_ms=1"); : ts_flags.push_back("--never_fsync"); > nit: consider replacing the construction of the ts_flags via push_back() wi this was not changed in the patch. would rather not stray too much into random backlog fixes. Besides even though I do prefer initialized lists, we'de have to have an ifdef in the middle (or resort to push_back again) which I think is worse PS7, Line 968: master_flags > As described above: replace with {}. Done PS7, Line 1009: TEST_F(RaftConsensusITest, TestChurnyElections) { : const int kNumWrites = AllowSlowTests() ? 10000 : 1000; : CreateClusterForChurnyElectionsTests({}); : TestWorkload workload(cluster_.get()); : workload.set_write_batch_size(1); : DoTestChurnyElections(&workload, kNumWrites); : } : : // The same test, except inject artificial latency when propagating notifications : // from the queue back to consensus. This previously reproduced bugs like KUDU-1078 which : // normally only appear under high load. : TEST_F(RaftConsensusITest, TestChurnyElections_WithNotificationLatency) { : CreateClusterForChurnyElectionsTests({"--consensus_inject_latency_ms_in_notifications=50"}); : const int kNumWrites = AllowSlowTests() ? 10000 : 1000; : TestWorkload workload(cluster_.get()); : workload.set_write_batch_size(1); : DoTestChurnyElections(&workload, kNumWrites); : } : : // The same as TestChurnyElections except insert many duplicated rows. : // This emulates cases where there are many duplicate keys which, due to two phase : // locking, may cause deadlocks and other anomalies that cannot be observed when : // keys are unique. : TEST_F(RaftConsensusITest, TestChurnyElections_WithDuplicateKeys) { : CreateClusterForChurnyElectionsTests({}); : const int kNumWrites = AllowSlowTests() ? 10000 : 1000; : TestWorkload workload(cluster_.get()); : workload.set_write_pattern(TestWorkload::INSERT_WITH_MANY_DUP_KEYS); : // Increase the number of rows per batch to get a higher chance of key collision. : workload.set_write_batch_size(3); : DoTestChurnyElections(&workload, kNumWrites); > These look like candidates for extracting a common function and then re-usi that's what this patch does :) we need common code to create the cluster, then setup a custom workload then common code to run it. all common workload params are set on the common methods I also like that we can tweak the number of writes and workload params per test to be able to address flakyness individually -- To view, visit http://gerrit.cloudera.org:8080/5349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d018281d412ae034bd7b70c8311077a52b2795d Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes