[kudu-CR] KUDU-1500: Fix the data race during RaftConsensusITest.TestCorruptReplicaMetadata
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1500: Fix the data race during RaftConsensusITest.TestCorruptReplicaMetadata .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/2691/ -- To view, visit http://gerrit.cloudera.org:8080/3823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-1500: Fix the data race during RaftConsensusITest.TestCorruptReplicaMetadata
Dinesh Bhat has uploaded a new patch set (#2). Change subject: KUDU-1500: Fix the data race during RaftConsensusITest.TestCorruptReplicaMetadata .. KUDU-1500: Fix the data race during RaftConsensusITest.TestCorruptReplicaMetadata The test intends to corrupt the metadata of one of the tserver tablets. While the cluster is in the process of resurrecting the corrupt metadata, the partition schema is accessed in an unguarded manner from another thread servicing the ListTablets API. The proposed fix now accesses the metadata partition resources by value in slow paths, where the copy itself happens under the same lock which makes the metadata resurrection operation idempotent. Fast paths are untouched since the partition schema is bound to be visible only after the tablet metadata is resurrected. Testing: Passed about 2000 iterations of the failing test raft_consensus-itest.TestCorruptReplicaMetadata Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019 --- M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_peer.cc M src/kudu/tools/fs_tool.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tserver-path-handlers.cc 8 files changed, 25 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/2 -- To view, visit http://gerrit.cloudera.org:8080/3823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins
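The copy-under-lock idea in the commit message above can be illustrated with a minimal sketch. This is not the code from the patch: PartitionSchemaSketch and TabletMetadataSketch are hypothetical stand-ins, and the single std::mutex is an assumption about how the metadata is guarded. The point is only that a reader gets a private copy made while the lock is held, so a concurrent replacement cannot be observed halfway.

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a partition schema; not Kudu's actual class.
struct PartitionSchemaSketch {
  std::vector<std::string> hash_columns;
  int num_buckets = 0;
};

// Hypothetical stand-in for tablet metadata guarded by one lock (assumption).
class TabletMetadataSketch {
 public:
  // Slow-path accessor: returns a copy made while holding the lock, so the
  // caller never reads the member while another thread is replacing it.
  PartitionSchemaSketch partition_schema_copy() const {
    std::lock_guard<std::mutex> l(lock_);
    return partition_schema_;
  }

  // Models the "resurrection" path: replaces the schema under the same lock.
  void ReplacePartitionSchema(PartitionSchemaSketch fresh) {
    std::lock_guard<std::mutex> l(lock_);
    partition_schema_ = std::move(fresh);
  }

 private:
  mutable std::mutex lock_;
  PartitionSchemaSketch partition_schema_;
};
```

A caller such as a ListTablets-style handler would hold on to the returned copy rather than a reference into the metadata object. The cost is one copy per call, which is why the commit message limits the change to slow paths.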
[kudu-CR] KUDU-1500: This intends to fix the data race observed by TSAN during one of the raft consensus tests.
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1500: This intends to fix the data race observed by TSAN during one of the raft consensus tests. .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/2690/ -- To view, visit http://gerrit.cloudera.org:8080/3823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-1500: This intends to fix the data race observed by TSAN during one of the raft consensus tests.
Dinesh Bhat has uploaded a new change for review. http://gerrit.cloudera.org:8080/3823 Change subject: KUDU-1500: This intends to fix the data race observed by TSAN during one of the raft consensus tests. .. KUDU-1500: This intends to fix the data race observed by TSAN during one of the raft consensus tests. The test intends to corrupt the metadata of one of the tserver tablets. While the cluster is in the process of resurrecting the corrupt metadata, the partition schema is accessed in an unguarded manner from another thread servicing the ListTablets API. The proposed fix now accesses the metadata partition resources by value in slow paths, where the copy itself happens under the same lock which makes the metadata resurrection operation idempotent. Fast paths are untouched since the partition schema is bound to be visible only after the tablet metadata is resurrected. Testing: Passed about 2000 iterations of the failing test raft_consensus-itest.TestCorruptReplicaMetadata Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019 --- M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_peer.cc M src/kudu/tools/fs_tool.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tserver-path-handlers.cc 8 files changed, 25 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/1 -- To view, visit http://gerrit.cloudera.org:8080/3823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat
[kudu-CR] KUDU-1358 (part 3): new multi-master stress test
Adar Dembo has posted comments on this change. Change subject: KUDU-1358 (part 3): new multi-master stress test .. Patch Set 13: > have you looped this one? Also, Dan, any further comments? Yes, I did quite a few 1000-run loops, in debug mode, ASAN, TSAN, and release. -- To view, visit http://gerrit.cloudera.org:8080/3611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40b5b78c100a7b427b2f4aac3a54665e82a9618c Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1358 (part 3): new multi-master stress test
Todd Lipcon has posted comments on this change. Change subject: KUDU-1358 (part 3): new multi-master stress test .. Patch Set 13: Code-Review+1 have you looped this one? Also, Dan, any further comments? -- To view, visit http://gerrit.cloudera.org:8080/3611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40b5b78c100a7b427b2f4aac3a54665e82a9618c Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1358 (part 2): heartbeat to every master
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1358 (part 2): heartbeat to every master .. KUDU-1358 (part 2): heartbeat to every master Now that followers accept heartbeats, let's modify the tserver to send one to every master. Spawning a heartbeater thread for each master seemed like the natural way to do this; it should simplify dynamic master changes in the future (i.e. just add or remove threads as needed). The "dirty tablet" state is now encapsulated in the heartbeater threads themselves, and the heartbeater must "fan out" to manipulate all of it. It's a little noisy but I think it's reasonable. The alternative is for this state to remain in the TSTabletManager, for the heartbeater to continue tracking which master is the leader, and for it to only send tablet reports to that master. This can be done with a few changes (e.g. adding term numbers to the heartbeat response), but the only benefit is reduced network traffic when tablets are dirty, so that didn't seem worth the complexity. There's no new test here, but this code path is exercised in the test I reenabled, and in the new stress test (follow-on patch). 
Change-Id: Ic85ac4193462d21c989dbd7874b451e8eaab8e3e Reviewed-on: http://gerrit.cloudera.org:8080/3610 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon --- M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/heartbeater.h M src/kudu/tserver/ts_tablet_manager-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 7 files changed, 311 insertions(+), 264 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic85ac4193462d21c989dbd7874b451e8eaab8e3e Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
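The per-master "dirty tablet" state and the fan-out described in the commit message above might look roughly like the following sketch. It is an illustration under assumptions, not the merged code: MasterHeartbeaterSketch and HeartbeaterSketch are hypothetical names, the thread loop that would drive each per-master object is omitted, and the tablet report is reduced to a set of tablet IDs.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-master state. In the design described above, each of
// these would be driven by its own heartbeater thread (omitted here).
class MasterHeartbeaterSketch {
 public:
  explicit MasterHeartbeaterSketch(std::string master_addr)
      : master_addr_(std::move(master_addr)) {}

  // Records that a tablet changed and must appear in the next report.
  void MarkDirty(const std::string& tablet_id) {
    std::lock_guard<std::mutex> l(lock_);
    dirty_.insert(tablet_id);
  }

  // Models building one heartbeat: drains and returns this master's dirty set.
  std::set<std::string> TakeDirtyForHeartbeat() {
    std::lock_guard<std::mutex> l(lock_);
    std::set<std::string> out;
    out.swap(dirty_);
    return out;
  }

  const std::string& master_addr() const { return master_addr_; }

 private:
  const std::string master_addr_;
  std::mutex lock_;
  std::set<std::string> dirty_;
};

// Hypothetical owner that fans every state change out to all masters.
class HeartbeaterSketch {
 public:
  explicit HeartbeaterSketch(const std::vector<std::string>& master_addrs) {
    for (const auto& addr : master_addrs) {
      per_master_.push_back(std::unique_ptr<MasterHeartbeaterSketch>(
          new MasterHeartbeaterSketch(addr)));
    }
  }

  // The "fan out": every per-master heartbeater learns about the dirty tablet.
  void MarkTabletDirty(const std::string& tablet_id) {
    for (auto& hb : per_master_) hb->MarkDirty(tablet_id);
  }

  MasterHeartbeaterSketch* master(std::size_t i) { return per_master_[i].get(); }
  std::size_t num_masters() const { return per_master_.size(); }

 private:
  std::vector<std::unique_ptr<MasterHeartbeaterSketch>> per_master_;
};
```

Because each master drains its own set, a report sent to one master does not clear what another master still needs to hear. The trade-off named in the commit message is extra network traffic while tablets are dirty.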
[kudu-CR] KUDU-1358 (part 2): heartbeat to every master
Todd Lipcon has posted comments on this change. Change subject: KUDU-1358 (part 2): heartbeat to every master .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic85ac4193462d21c989dbd7874b451e8eaab8e3e Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1358 (part 1): master should accept heartbeat even if follower
Todd Lipcon has posted comments on this change. Change subject: KUDU-1358 (part 1): master should accept heartbeat even if follower .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3609 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I578674927b65b4171e8437de8515130e4a0ed139 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1358 (part 1): master should accept heartbeat even if follower
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1358 (part 1): master should accept heartbeat even if follower .. KUDU-1358 (part 1): master should accept heartbeat even if follower This patch changes the master's heartbeat acceptance code so that heartbeats are not rejected outright if the master is a follower. To be specific, tablet reports are ignored, but heartbeats are processed just enough to warm the TSDescriptor cache. That way, if this master is elected leader, it can respond to a CreateTable() even before the first round of heartbeats. I reduced the complexity of the "should this tserver register or send a full tablet report?" dance by removing TSDescriptor.has_tablet_report_. It was used to guarantee a full tablet report in the event that 1) the tserver is sending incremental tablet reports, and 2) the master has already registered the tserver. I don't think this exact sequence of events is actually possible; the only way a master can "lose" a cached TSDescriptor is if the master is restarted, at which point it loses the tserver registration too. Plus, all the unit tests passed (in slow mode). I also snuck in a fix to TSManager::RegisterTS: it wasn't actually returning a TSDescriptor in its out parameter. 
Change-Id: I578674927b65b4171e8437de8515130e4a0ed139 Reviewed-on: http://gerrit.cloudera.org:8080/3609 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon --- M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/mini_cluster.cc M src/kudu/integration-tests/mini_cluster.h M src/kudu/integration-tests/registration-test.cc M src/kudu/integration-tests/table_locations-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master-test.cc M src/kudu/master/master_service.cc M src/kudu/master/ts_descriptor.cc M src/kudu/master/ts_descriptor.h M src/kudu/master/ts_manager.cc 13 files changed, 226 insertions(+), 125 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3609 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I578674927b65b4171e8437de8515130e4a0ed139 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
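The follower behavior described in the commit message above (process the heartbeat just enough to warm the descriptor cache, ignore the tablet report) can be sketched as follows. MasterSketch and HeartbeatSketch are hypothetical types invented for illustration; the real TSDescriptor cache and heartbeat RPC carry far more state than the counters used here.

```cpp
#include <map>
#include <string>

// Hypothetical heartbeat payload, reduced to what the sketch needs.
struct HeartbeatSketch {
  std::string ts_uuid;
  bool has_tablet_report = false;
};

// Hypothetical master that accepts heartbeats in either role.
class MasterSketch {
 public:
  explicit MasterSketch(bool is_leader) : is_leader_(is_leader) {}

  // Returns true only if a tablet report was actually processed.
  bool HandleHeartbeat(const HeartbeatSketch& hb) {
    // Leader or follower: always refresh the cached descriptor so a newly
    // elected leader already knows which tablet servers exist.
    ++heartbeats_seen_[hb.ts_uuid];
    if (!is_leader_) {
      return false;  // Followers ignore tablet reports.
    }
    if (hb.has_tablet_report) {
      ++reports_processed_;
      return true;
    }
    return false;
  }

  bool KnowsTabletServer(const std::string& uuid) const {
    return heartbeats_seen_.count(uuid) > 0;
  }

  void set_leader(bool is_leader) { is_leader_ = is_leader; }
  int reports_processed() const { return reports_processed_; }

 private:
  bool is_leader_;
  int reports_processed_ = 0;
  std::map<std::string, int> heartbeats_seen_;
};
```

In this sketch a follower that is later promoted via set_leader(true) already has the tablet server in its cache, which mirrors the stated goal of answering a CreateTable() before the first round of heartbeats arrives.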
[kudu-CR] KUDU-1416 Upsert support for Flume sink
Todd Lipcon has posted comments on this change. Change subject: KUDU-1416 Upsert support for Flume sink .. Patch Set 2: Mike, can you take another look at this before it gets too stale? (Maybe already is stale due to the great package rename) -- To view, visit http://gerrit.cloudera.org:8080/3157 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5b5df70687103ed6916d58148336882aa66d85 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1548. Fix flaky RaftConsensusITest.TestReplaceChangeConfigOperation
Todd Lipcon has posted comments on this change. Change subject: KUDU-1548. Fix flaky RaftConsensusITest.TestReplaceChangeConfigOperation .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1548. Fix flaky RaftConsensusITest.TestReplaceChangeConfigOperation
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1548. Fix flaky RaftConsensusITest.TestReplaceChangeConfigOperation .. KUDU-1548. Fix flaky RaftConsensusITest.TestReplaceChangeConfigOperation This test was occasionally failing its 2nd election in TSAN mode because we were resuming the previous leader before the new leader could be elected. Sometimes the previous leader was fast enough to replicate its pending config change to a majority of nodes before the new candidate could send out its election RPC, thus violating the underlying assumptions of this test. I also added a minor C++11 syntax-only change in the cluster itest utils class as a part of this patch (doesn't change any behavior). Before this fix, this test failed 15/800 times on the dist-test cluster. After this change, it passed 100% of the time. The test log looked something like this:
I0729 18:59:47.834403 11544 raft_consensus.cc:370] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 FOLLOWER]: No leader contacted us within the election timeout. Triggering leader election
I0729 18:59:47.834686 11544 raft_consensus.cc:2019] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 FOLLOWER]: Advancing to term 2
I0729 18:59:47.840427 11544 leader_election.cc:223] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [CANDIDATE]: Term 2 election: Requesting vote from peer 54197053abab4b6cb1b1632c9d1062dc
I0729 18:59:47.840860 11544 leader_election.cc:223] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [CANDIDATE]: Term 2 election: Requesting vote from peer 3522a8de8170476dba0beb58cb2150d4
I0729 18:59:47.872720 11669 raft_consensus.cc:869] T e3503c47a21649ca931234999cd0bb45 P 3522a8de8170476dba0beb58cb2150d4 [term 1 FOLLOWER]: Refusing update from remote peer 54197053abab4b6cb1b1632c9d1062dc: Log matching property violated. Preceding OpId in replica: term: 1 index: 1. Preceding OpId from leader: term: 1 index: 2. (index mismatch)
I0729 18:59:47.874522 11454 consensus_queue.cc:578] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [LEADER]: Connected to new peer: Peer: 3522a8de8170476dba0beb58cb2150d4, Is new: false, Last received: 1.1, Next index: 2, Last known committed idx: 1, Last exchange result: ERROR, Needs remote bootstrap: false
I0729 18:59:47.878105 11150 raft_consensus.cc:1324] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 LEADER]: Handling vote request from an unknown peer d4f64819170a4cf78fe4c9e9a72ec4b9
I0729 18:59:47.878290 11150 raft_consensus.cc:2014] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 LEADER]: Stepping down as leader of term 1
I0729 18:59:47.878451 11150 raft_consensus.cc:499] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 LEADER]: Becoming Follower/Learner. State: Replica: 54197053abab4b6cb1b1632c9d1062dc, State: 1, Role: LEADER Watermarks: {Received: term: 1 index: 2 Committed: term: 1 index: 1}
I0729 18:59:47.878968 11150 consensus_queue.cc:162] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, Majority replicated op: 1.1, Committed index: 1.1, Last appended: 1.2, Current term: 1, Majority size: -1, State: 1, Mode: NON_LEADER
I0729 18:59:47.879871 11150 consensus_peers.cc:358] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc -> Peer 3522a8de8170476dba0beb58cb2150d4 (127.37.56.2:53243): Closing peer: 3522a8de8170476dba0beb58cb2150d4
I0729 18:59:47.882057 11150 raft_consensus.cc:2019] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 FOLLOWER]: Advancing to term 2
I0729 18:59:47.885711 11150 raft_consensus.cc:1626] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 FOLLOWER]: Leader election vote request: Denying vote to candidate d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged OpId of term: 1 index: 2, which is greater than that of the candidate, which has last-logged OpId of term: 1 index: 1.
I0729 18:59:47.892060 11477 leader_election.cc:361] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [CANDIDATE]: Term 2 election: Vote denied by peer 54197053abab4b6cb1b1632c9d1062dc. Message: Invalid argument: T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 FOLLOWER]: Leader election vote request: Denying vote to candidate d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged OpId of term: 1 index: 2, which is greater than that of the candidate, which has last-logged OpId of term: 1 index: 1.
I0729 18:59:47.894548 11669 raft_consensus.cc:1324] T e3503c47a21649ca931234999cd0bb45 P 3522a8de8170476dba0beb58cb2
[kudu-CR] docs: Add missing DISTRIBUTE to quickstart
Todd Lipcon has posted comments on this change. Change subject: docs: Add missing DISTRIBUTE to quickstart .. Patch Set 1: Did you verify this syntax by trying it? (I can't remember if the DISTRIBUTE BY goes before or after TBLPROPERTIES) -- To view, visit http://gerrit.cloudera.org:8080/3820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6a010bf90901bd98bf2c8e33396edb0d152c1d67 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] docs: fix links to MR examples in developing
Todd Lipcon has posted comments on this change. Change subject: docs: fix links to MR examples in developing .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/3821/1/docs/developing.adoc File docs/developing.adoc: Line 142: link:https://github.com/apache/incubator-kudu/blob/master/java/kudu-client-tools/src/main/java/org/org/apache/mapreduce/tools/ImportCsv.java[ImportCsv.java] these links both still seem wrong - should be src/main/java/org/apache/kudu/mapreduce not org/org/apache/mapreduce -- To view, visit http://gerrit.cloudera.org:8080/3821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b98d0ac25d75fc18ff694ca9c47ad1b9720570b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes