[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp
Sailesh Mukil has posted comments on this change. Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7770/2/src/kudu/security/init.cc File src/kudu/security/init.cc: PS2, Line 295: ticket_lifetime > don't you still need a little slop here in addition to ticket_lifetime? Yes, that's true, but how low of a ticket_lifetime do we want to adjust for? Technically, the user could set a 1s ticket lifetime too, for which the above argument still holds and for which we really can't do anything. -- To view, visit http://gerrit.cloudera.org:8080/7770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp
Todd Lipcon has posted comments on this change. Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7770/2/src/kudu/security/init.cc File src/kudu/security/init.cc: PS2, Line 295: ticket_lifetime don't you still need a little slop here in addition to ticket_lifetime? eg if your lifetime is 30s, and it's 12:00:29, and renew_till is 12:01:00, you only have 1 second left. So if the renewal takes more than a second you'll be back in the danger zone. -- To view, visit http://gerrit.cloudera.org:8080/7770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp
Sailesh Mukil has posted comments on this change. Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/7770/1//COMMIT_MSG Commit Message: PS1, Line 28: refuses to read th > maybe be slightly more specific and say that the Java library refuses to re Done PS1, Line 30: ed failures. > typo Done http://gerrit.cloudera.org:8080/#/c/7770/1/src/kudu/security/init.cc File src/kudu/security/init.cc: PS1, Line 290: > am surprised to see 'difftime'. Never seen that before. Why not just cast t http://en.cppreference.com/w/c/chrono/difftime "On POSIX systems, time_t is measured in seconds, and difftime is equivalent to arithmetic subtraction, but C and C++ allow fractional units for time_t." Also, just looking at posts online generally, people discourage using '-' to subtract two time_t objects. PS1, Line 290: > according to http://web.mit.edu/kerberos/krb5-current/doc/appldev/refs/type Good catch. I added some logic to handle that case. PS1, Line 298: > we probably want a little bit of slop here just like we had with 'renew_dea Oops, you're right. I added the renew_deadline back. -- To view, visit http://gerrit.cloudera.org:8080/7770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp
Sailesh Mukil has uploaded a new patch set (#2). Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp .. Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp It was found that if we use a file based credential cache that is shared between the C++ side and the java side of a process, and we encounter the specific edge case where we renew a ticket that has less than 'ticket_lifetime' left before its 'renew_lifetime' expires, the ticket is set to have a NULL 'renew_till' timestamp. Eg: ticket_lifetime = 10m renew_lifetime = 100m [current ticket being renewed] at '15:30:00' endtime = '15:30:30' renew_till = '15:31:00' This ticket will be renewed and the renewed ticket will have the following values: endtime = '15:31:00' renew_till = null The Java krb5 library refuses to read these kinds of tickets which have the RENEWABLE flag set but no 'renew_till' set, causing unexpected failures. We work around this by reacquiring a new ticket instead of renewing the existing ticket if there is less that 'ticket_lifetime' left between now and the 'renew_till' deadline. Change-Id: I59194af94838f680df4ce121a8dee526a876e369 --- M src/kudu/integration-tests/external_mini_cluster-test.cc M src/kudu/security/init.cc 2 files changed, 12 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7770/2 -- To view, visit http://gerrit.cloudera.org:8080/7770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1894 fixed deadlock in client.Connection
Todd Lipcon has posted comments on this change. Change subject: KUDU-1894 fixed deadlock in client.Connection .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: PS2, Line 578: inflightMessages.put > uh-oh: this needs to be guarded either by lock or the collection should pro yea, I think the better bet is deferring the cleanup call back onto a new iteration of the netty event loop thread (asynchronously) after the disconnected event, if that's possible, so that the sslhandler lock is no longer held when we try to acquire our own lock? (man I can't wait for netty 4 upgrade some day) -- To view, visit http://gerrit.cloudera.org:8080/7765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Alexey Serbin has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS10, Line 1494: local_last_logged_opid = queue_->GetLastOpIdInLog(); What if tombstone_last_logged_opid was provided as well? Is it some some of programming error? Or it's OK? If the former, consider returning IllegalArgument or adding DCHECK() http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: PS10, Line 372: // Raft is in the process of stopping and will not accept writes. : kStopping, nit: is it possible to request a vote if in this state? It seems it is, but it would be nice to explicitly specify that. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2095 - Add `keepAlive` method to Java API
Todd Lipcon has posted comments on this change. Change subject: KUDU-2095 - Add `keepAlive` method to Java API .. Patch Set 3: (14 comments) http://gerrit.cloudera.org:8080/#/c/7749/3//COMMIT_MSG Commit Message: PS3, Line 13: properties that keep track of the last response, however adding this to the Java : client looked invasive and I did not implement it. not sure if this is a lack of functionality or just due to different implementation style? PS3, Line 18: MiniClusters are running at the same time Check out how TestAuthnTokenReacquire sets flags, for example, to avoid this Line 23: than it is running here but was unable to find a flag to lower it other I think you're looking for 'scanner_gc_check_interval_us' http://gerrit.cloudera.org:8080/#/c/7749/3/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: Line 843: final ServerInfo info = tablet.getReplicaSelectedServerInfo(keepAliveRequest.getReplicaSelection()); this seems relatively suspicious. I see you copied it from 'close' above, but are we guaranteed that this returns the same ServerInfo as the server which is processing our currently-open scanner? http://gerrit.cloudera.org:8080/#/c/7749/3/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java: Line 45: import com.sun.tools.doclets.formats.html.SourceToHTMLConverter; hrmm? this is causing a build failure Line 645:* Keep the current remote scanner alive. can we provide more docs on how this is meant to be used? Line 652: final Deferred d = nit: no wrap, can just combine with the 'return' below PS3, Line 672: getKeepAliveRequest nit: change to multiple lines. Add a javadoc. Line 761: final Tserver.ScannerKeepAliveRequestPB.Builder builder = Tserver.ScannerKeepAliveRequestPB.newBuilder(); wrap Line 781: String tsUUID) throws KuduException { nit: alignment Line 783: Tserver.ScannerKeepAliveResponsePB.Builder builder = Tserver.ScannerKeepAliveResponsePB.newBuilder(); nit: wrap Line 785: Tserver.ScannerKeepAliveResponsePB resp = builder.build(); nit: extra space Line 788: return new Pair(Status.OK(), error); nit: this whole function is indented one too much http://gerrit.cloudera.org:8080/#/c/7749/3/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java: PS3, Line 62:* Keep the current scanner alive on the tablet server. :* @return Status of the RPC request :* @throws KuduException needs more clear docs -- To view, visit http://gerrit.cloudera.org:8080/7749 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: t...@phdata.io Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp
Todd Lipcon has posted comments on this change. Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/7770/1//COMMIT_MSG Commit Message: PS1, Line 28: does not like this maybe be slightly more specific and say that the Java library refuses to read a ticket which has the RENEWABLE flag set, but no renew_till set. PS1, Line 30: reqacquiring typo http://gerrit.cloudera.org:8080/#/c/7770/1/src/kudu/security/init.cc File src/kudu/security/init.cc: PS1, Line 290: difftime am surprised to see 'difftime'. Never seen that before. Why not just cast to (signed) int64_ts? PS1, Line 290: creds.times.starttime according to http://web.mit.edu/kerberos/krb5-current/doc/appldev/refs/types/krb5_ticket_times.html#krb5_ticket_times the starttime may be missing, in which case we would have to fall back to 'authtime' instead. I've never seen it, but I think we should probably look whether that happens and at least make sure we do something sane in that case PS1, Line 298: (now + ticket_lifetime) > renew_till we probably want a little bit of slop here just like we had with 'renew_deadline'. Otherwise if we are exactly at the threshold where things break, we might still do a renewal and hit the issue, right? -- To view, visit http://gerrit.cloudera.org:8080/7770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Andrew Wong has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: (9 comments) http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS10, Line 1452: boost:: nit: "using boost::optional" is already specified http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: PS10, Line 351:|^ : //|| : //++ nit: prefer the slimmer version at L424 Or have this in one place and have SetStateUnlocked() refer here? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: PS10, Line 3044: back come nit: come back http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: PS10, Line 46: using std::string; : using std::vector; nit: include these? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: PS10, Line 44: using std::atomic; : using std::string; : using std::thread; : using std::unique_lock; : using std::vector; nit: includes? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: PS10, Line 347: // TODO(mpercy): Set to failed if Init() fails? Does this still apply? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: PS10, Line 357: Has no meaning for non-tombstoned tablets Is this to say that this will be boost::none for non-tombstoned tablets? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: PS10, Line 199: SetError nit: sort of brought this up on Slack, I think the naming for this could be a bit clearer. SetError() makes it seem like a simple setter, but it's not. Maybe SetErrorAndShutdown()? Or SetFailed() and keep the FAILED state as we discussed. The latter has the added benefit of being an external indicator of failure, rather than having each external accessor (e.g. web UI, ksck, etc.) get the error_ member. http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: PS10, Line 298: RETURN_NOT_OK(WriteConsensusMetadata()); This change isn't noted in the commit message. Mind documenting it, or at least adding a comment explaining? -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/7770 Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp .. Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp It was found that if we use a file based credential cache that is shared between the C++ side and the java side of a process, and we encounter the specific edge case where we renew a ticket that has less than 'ticket_lifetime' left before its 'renew_lifetime' expires, the ticket is set to have a NULL 'renew_till' timestamp. Eg: ticket_lifetime = 10m renew_lifetime = 100m [current ticket being renewed] at '15:30:00' endtime = '15:30:30' renew_till = '15:31:00' This ticket will be renewed and the renewed ticket will have the following values: endtime = '15:31:00' renew_till = null The Java krb5 library does not like this and doesn't handle these kinds of tickets properly, causing unexpected failures. We work around this by reqacquiring a new ticket instead of renewing the existing ticket if there is less that 'ticket_lifetime' left between now and the 'renew_till' deadline. Change-Id: I59194af94838f680df4ce121a8dee526a876e369 --- M src/kudu/integration-tests/external_mini_cluster-test.cc M src/kudu/security/init.cc 2 files changed, 6 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7770/1 -- To view, visit http://gerrit.cloudera.org:8080/7770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Hello Dan Burkert, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7750 to look at the new patch set (#8). Change subject: [build-support] IWYU build configuration for Jenkins .. [build-support] IWYU build configuration for Jenkins Added provisions to run the include-what-you-use tool against the updated files in the changelist. Those are based on the newly introduced 'iwyu' target to be used like already existing 'lint' target. The 'iwyu' target runs IWYU against the compilation database. So, as a part of this changelist I updated the top-level CMakeLists.txt to always generate the compilation database. The compilation database is a JSON file containing information on how the targets are built. Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e --- M CMakeLists.txt A build-support/iwyu/iwyu.sh A build-support/iwyu/iwyu_tool.py M build-support/jenkins/build-and-test.sh 4 files changed, 358 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7750/8 -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] util: remove old failure detector and resettable heartbeater code
Todd Lipcon has posted comments on this change. Change subject: util: remove old failure detector and resettable heartbeater code .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7736 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: use periodic timers for failure detection
Todd Lipcon has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] rpc: periodic timers
Todd Lipcon has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Todd Lipcon has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] rpc: periodic timers
Todd Lipcon has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic-test.cc File src/kudu/rpc/periodic-test.cc: Line 88: TEST_P(PeriodicTimerTest, TestReset) { > When I looped 1000 times with TSAN all I got were data races on access to c yea, of course it will be timing-dependent. But perhaps just running the whole thing slower (eg with period_ms_ being 500ms instead of 50) would mean that the same absolute time slownesses would be much less likely to cause a failure. http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.h File src/kudu/rpc/periodic.h: PS2, Line 59: periodic_period_jitter_pct > I think the interface would be cleaner as ok, I'm alright with a gflag on the argument that they're easier to change than constants, so long as it's experimental -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](branch-1.4.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Todd Lipcon has posted comments on this change. Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. Patch Set 1: Verified+1 Unrelated flake -- To view, visit http://gerrit.cloudera.org:8080/7763 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.4.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-2032 (part 2): propagate master hostnames into client .. KUDU-2032 (part 2): propagate master hostnames into client This changes the client code to remember the user-specified master addresses and propagate them into the creation of master proxies. It's not possible to reproduce the necessary DNS configurations in a minicluster test, but with this patch I am now able to use 'kudu perf loadgen' against a Kerberized cluster even when my local krb5.conf has rdns=false. Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Reviewed-on: http://gerrit.cloudera.org:8080/7692 Reviewed-by: Alexey Serbin Tested-by: Kudu Jenkins --- M src/kudu/client/client-internal.cc M src/kudu/client/client-internal.h M src/kudu/client/master_rpc.cc M src/kudu/client/master_rpc.h M src/kudu/integration-tests/external_mini_cluster.cc 5 files changed, 56 insertions(+), 39 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7692 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: I think the failure was caused by already-flaky java tests but I am investigating. I'd like to figure out how to run the java tests on dist-test to do a quick comparison on failure rate w/ master. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2095 - Add `keepAlive` method to Java API
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7749 to look at the new patch set (#3). Change subject: KUDU-2095 - Add `keepAlive` method to Java API .. KUDU-2095 - Add `keepAlive` method to Java API - Adds keepAlive methods to sync and async clients - Methods return a (Deferred) Status object Where possible the behavior mimics the C++ client. The C++ client has extra properties that keep track of the last response, however adding this to the Java client looked invasive and I did not implement it. I added one test in TestKuduClient that also mirrors the C++. The test runs in a KuduMiniCluster so ttl could be lowered. This means two MiniClusters are running at the same time but it seems to cause no problems and is cleaner than making a new test class. Even though the TTL is low the time it takes for the scanner to be garbage collected is long and the test takes ~10 seconds. The C++ client code comments that the garbage collection of scanners runs each 50ms which is a lot faster than it is running here but was unable to find a flag to lower it other than scanner_ttl. Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java 5 files changed, 156 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/7749/3 -- To view, visit http://gerrit.cloudera.org:8080/7749 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: t...@phdata.io Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#21). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 575 insertions(+), 444 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/21 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 21: (30 comments) http://gerrit.cloudera.org:8080/#/c/7207/19//COMMIT_MSG Commit Message: PS19, Line 22: --nu > You still need 'localhost' though; that's not an optional parameter I think Done http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager-stress-test.cc File src/kudu/fs/block_manager-stress-test.cc: Line 282: AtomicInt total_bytes_written_; > Should be Done PS20, Line 297: vector> all_dirty_blocks; : for (int i = 0; i < FLAGS_block_group_number; i++) { > You changed the pointer/ref style here. Done Line 310: > Update. Done Line 311: dirty_blocks.emplace_back(std::move(block)); > const auto& Done Line 318: // random, and write a smaller chunk of data to it. > auto& Done Line 326: // Write a small chunk of data. > const auto& Done http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: PS20, Line 44: // throughput but may improve write latency. : DEFINE_string(block_manager_flush_control, "finalize", : "Controls when to flush a block. Valid values are 'finalize', " : "'close', or 'never'. If 'finalize', blocks will be flushed " : "when writing is finished. If 'close', blocks will be > Rewrite: Done http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1410: > Okay, but why bother maintaining FlushDataAsync() as a separate method? It Makes sense. Updated. But still kept the per-impl methods in LogWritableBlock/FileWritableBlock since calling Flush() from LogBlockManager/FileBlockManager will result in more code if not. http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS20, Line 260: onst OVERRIDE; > finalizing it if it has not yet been finalized. Done PS20, Line 260: > "Also updates" Done PS20, Line 336: c > , Done PS20, Line 337: > the Done PS20, Line 434: // Malformed records and other container inconsistencies are written to : // 'report'. Healthy blocks are written either to 'live_blocks' or : // 'dead_blocks'. Live records are written to 'live_block_recor > Nit: would prefer to keep BlockCreated and BlockDeleted next to each other Done PS20, Line 543: // Offset up to which we have preallocated bytes. : int64_t preallocated_offset_ = 0; > Since all counters are now atomic, these comments are no longer interesting Done PS20, Line 878: at as a d > "data" (to be consistent with "Syncing metadata file") Done PS20, Line 882: > Replace with "is already made durable". Done PS20, Line 883: > const auto* Done PS20, Line 890: if (mode == SYNC) > I don't think this one needs to be conditioned on SYNC. Done Line 891: > Would be nice if we could DCHECK that if blocks.size() > 1, all of the bloc Done PS20, Line 1040: preallocated_offset_ = off + len; : } : > This needs to be updated. Done Line 1068: DCHECK_GE(block_offset, 0); > Don't need the if statement anymore. Done PS20, Line 1071: // means accounting for the new block should be as simple as adding its : // aligned length to 'next > Let's defer this to BlockCreated(). Then RoundUpContainer can be renamed to Done PS20, Line 1206: etrics > Nit: { this } Done Line 1216: } > Nit: { this } Done PS20, Line 1289: } : : Status LogWritableBl > Rewrite: "We do not mark the container as read-only if FlushDataAsync() fai Done Line 1317: } > 'lbm' is only used once; don't bother storing it in a local variable. Done PS20, Line 1319: state() const { > Just use block_length_ here; it's more clear. Done PS20, Line 1319: te L > and block_id_ here. Done PS20, Line 1751: ol == "close") { > Nit: Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 21 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] disk failure: don't open tablets on failed disks
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7766 to look at the new patch set (#3). Change subject: disk failure: don't open tablets on failed disks .. disk failure: don't open tablets on failed disks Currently Kudu servers open the FS layout with failed disks. However, the moment tablets attempt to bootstrap (i.e. open blocks, etc.), they will attempt to read from the failed disk and fail. This can be avoided by checking whether a tablet's disk group contains a failed disk before attempting to read data from the tablet. If so, the tablet should be marked as having an error so it can be reassigned. The default behavior of the 'fs_target_data_dirs_per_tablet' flag is updated to take into account disk state when assigning new directory groups. This allows the tablet to be reassigned to a server without being spread across a failed directory. Testing is done by loading data into a cluster with multi-disk servers, failing a single directory of one of the servers, and ensuring that the tablets spread across the failed disk get replicated upon the next startup. Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 --- M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/disk_failure-itest.cc M src/kudu/tserver/ts_tablet_manager.cc 7 files changed, 222 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/7766/3 -- To view, visit http://gerrit.cloudera.org:8080/7766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Hello Dan Burkert, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7750 to look at the new patch set (#7). Change subject: [build-support] IWYU build configuration for Jenkins .. [build-support] IWYU build configuration for Jenkins Added provisions to run the include-what-you-use tool against the updated files in the changelist. Those are based on the newly introduced 'iwyu' target to be used like already existing 'lint' target. The 'iwyu' target runs IWYU against the compilation database. So, as a part of this changelist I updated the top-level CMakeLists.txt to always generate the compilation database. The compilation database is a JSON file containing information on how the targets are built. Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e --- M CMakeLists.txt A build-support/iwyu/iwyu.sh A build-support/iwyu/iwyu_tool.py M build-support/jenkins/build-and-test.sh 4 files changed, 360 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7750/7 -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Hello Dan Burkert, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7750 to look at the new patch set (#6). Change subject: [build-support] IWYU build configuration for Jenkins .. [build-support] IWYU build configuration for Jenkins Added provisions to run the include-what-you-use tool against the updated files in the changelist. Those are based on the newly introduced 'iwyu' target to be used like already existing 'lint' target. The 'iwyu' target run IWYU against the compilation database. So, as a part of this changelist I updated the top-level CMakeLists.txt to always generate the compilation database. The compilation database is a JSON file containing information on how the targets are built. Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e --- M CMakeLists.txt A build-support/iwyu/iwyu.sh A build-support/iwyu/iwyu_tool.py M build-support/jenkins/build-and-test.sh 4 files changed, 360 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7750/6 -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: add persistent disk states
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7270 to look at the new patch set (#24). Change subject: disk failure: add persistent disk states .. disk failure: add persistent disk states This patch adds a new version of the path set so failed disks are not used when a server is run. A previously failed disk may continue to produce IOErrors after restart, but we may still like to start the server without the disk. To accomplish this, disk states, path names, and a timestamp are added to the path set. If any disks are failed when opening the initial FS layout, an 'unhealthy' instance with no path set metadata will be created instead of failing the startup process. CheckIntegrity() is updated to accomodate this. Rather than comparing all instances against an agreed-upon set of UUIDs, the single path set with the largest timestamp is used as the main one to compare against (if only old path sets are available, the first healthy one is used). If no instances are healthy, CheckIntegrity() will fail, as all disks are failed. Testing is done in a new iteration of CheckIntegrity(). Further testing is done in data_dirs-test, fs_manager-test, and log_block_manager-test to ensure the FS layout can be loaded with a failed directory. Some notes: - Disk failures during FS layout creation are not tolerated. In these cases, there is presumably no data on the server anyway, so operators should easily be able to fix their cluster. - In the case of a server restart where all healthy disks fail to start up and some known failed disks start working again, the server will successfully start up with the bad disks (may have partially-written data). - If there are any unhealthy instances when upgrading the path sets (i.e. adding disk states, paths, timestamp), a complete mapping of UUIDs to paths will not be available, and CheckIntegrity() will fail. - The main path set's disk states are updated to reflect the failure of the instances. Since data on a failed disk cannot be trusted, disks that are successfully read from but are already marked FAILED by the main path set are not marked HEALTHY. This patch is a part of a series of patches to handle disk failures. To see how this fits, see 2.6 in: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f --- M src/kudu/fs/block_manager_util-test.cc M src/kudu/fs/block_manager_util.cc M src/kudu/fs/block_manager_util.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 12 files changed, 1,207 insertions(+), 278 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/24 -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 24 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] rpc: periodic timers
Dan Burkert has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.cc File src/kudu/rpc/periodic.cc: Line 38: DEFINE_double(periodic_period_jitter_pct, 0.25, > Even as an experimental flag? I couldn't see any reason to use different va See my reply to the other comment about what I would do to make this go away. The why is that the name of the flag is bad, and it unnecessarily ties together two otherwise unrelated periods (and however many more get consolidated into this abstraction). http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.h File src/kudu/rpc/periodic.h: PS2, Line 59: periodic_period_jitter_pct > As I see it, we have three options: I think the interface would be cleaner as // Creates a new PeriodicTimer. // // A ref is taken on 'messenger', which is used for scheduling callbacks. // // 'functor' defines the user's task and is owned for the lifetime of the // PeriodicTimer. // // 'jitter_pct' controls the amount of jitter to introduce ... explain how it's [period - period * (jitter_pct / 100), period + period * (jitter_pct / 100)) ... static std::shared_ptr Create( std::shared_ptr messenger, RunTaskFunctor functor, MonoDelta period, int jitter_pct = 25); There are already flags in the failure detector that are meant to be the equivalent of setting the jitter here, but they are kind of vestigial at this point. -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Adar Dembo has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7734/2//COMMIT_MSG Commit Message: Line 15: heartbeat load, I think it makes sense to do it. > perhaps worth re-running the experiment from https://docs.google.com/docume Sure, I'll do that post-commit. http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc File src/kudu/consensus/consensus_peers.cc: Line 140: > I think std::function has an implicit constructor from any callable (seriou Thanks, fixed. -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] rpc: periodic timers
Adar Dembo has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic-test.cc File src/kudu/rpc/periodic-test.cc: PS2, Line 51: () > The parens are optional when empty (just like the return value). Don't fee I don't mind changing it. Line 77: ASSERT_GT(counter_, 0); > is this going to be flaky? maybe an AssertEventually is more robust? Sure, will add an AssertEventually. Line 88: TEST_P(PeriodicTimerTest, TestReset) { > have you looped this with tsan? I'm worried it will be flaky if this main t When I looped 1000 times with TSAN all I got were data races on access to counter_ after the test fixture had been destroyed. Explicit Messenger::Shutdown() in TearDown() ought to clear that up. In general I have no idea how to write unit tests for timers that aren't timing-dependent in some way. I'd love more feedback on this. http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.cc File src/kudu/rpc/periodic.cc: Line 38: DEFINE_double(periodic_period_jitter_pct, 0.25, > I don't think exposing this is a good idea. It's not clear what it's actua Even as an experimental flag? I couldn't see any reason to use different values for heartbeat and failure detection jitter, so I started off with a constant, then reasoned an experimental gflag is no worse than a constant (and may be better). PS2, Line 54: messenger, functor > I'm surprised clang-tidy dind't complain about these, I'd have expected pas That is odd, but I will std::move() them. PS2, Line 133: // We must schedule the next callback with the minimal period as the delay : // so that if Reset() is called with a low value, the scheduled callback : // can still honor it. : > I don't quite follow this. If we called Reset() with a small delay (less th As we discussed on Slack, if we allowed Reset() to schedule a new callback (while an existing one was in flight), we could wind up with multiple never-ending callback "loops". To prevent that, we force each timer to stick to just one callback loop, with the downside that Reset()'s delay may not be honored if it drops below GetMinimumPeriod(). I'll update the comment and/or find another way to express this. http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.h File src/kudu/rpc/periodic.h: PS2, Line 40: Messenger::ScheduleOnReactor > I think it's worth a note here that, since the tasks are run on reactor thr Yes, I meant to add that but forgot; thank you for the reminder. PS2, Line 59: periodic_period_jitter_pct > hrm, I don't know about making this a general flag vs passing it in to the As I see it, we have three options: 1. Force callers to provide it. 2. Define a constant in the code. 3. Define a gflag. I think #1 isn't interesting: certainly for the two callers I have so far there's no reason NOT to use the same percentage. So originally I did #2, then decided that #3, as long as the flag was tagged as experimental (which it is), is no worse. PS2, Line 83: period for the next : // task. > not 100% clear here. Is this setting the period (overriding what's in the ' 1. It's an 'initial delay'; it doesn't change the future period. 2. It'll be an exact delay, no jitter. I'll reword to make this more clear. Line 92: // If 'next_task_delta' is provided, it is used as the new period. Otherwise, > Similar to the above, not sure whether this means to reset the period for a It affects just the next call. Will rename to Snooze. Line 93: // the period is generated in accordance with the timer's period and jitter > in the case that it is boost::none, this still will "snooze" one full perio Yes, but it's not additive: if I call Snooze() at time X and then again at time X+0.1P, the task will run at task X+1.1P, not X+2P. Line 97: void Reset(boost::optional next_task_delta = boost::none); > May want to add a note that next_task_delta must be > GetMinimumPeriod(), o Will do. -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: use periodic timers for failure detection
Adar Dembo has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 252: std::bind([w]() { > same comment about no-arg bind Done Line 319: rng_.Normal(GetFailureMonitorCheckMeanMs(), > This might result in a negative value. It would probably be OK, but since I tried to preserve these semantics to minimize the scope of the patch. And these flags are being used by a couple itests that I didn't really have interest in updating. A negative value here just means the timer will fire immediately, which is safe. Line 533: "Failed to submit failure detected task"); > can you work the log prefix into this? Done http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: Line 27: #include > This looks like it may be a bad interaction with Alexey's patch, but I don' I rebased on top of Alexey's patch and now I need neither of these. Strange. Line 523: void EnsureFailureDetectorEnabled( > I think these methods would be better named 'EnableFailureDetector' and 'Di Sure, I don't mind. PS2, Line 529: unregistered > disabled Done -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: use periodic timers for failure detection
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7735 to look at the new patch set (#3). Change subject: consensus: use periodic timers for failure detection .. consensus: use periodic timers for failure detection This patch replaces the existing failure detection (FD) with a new approach built using periodic timers. The existing approach had a major drawback: each failure monitor required a dedicated thread, and there was a monitor for each replica. The new approach "schedules" a failure into the future using the server's reactor thread pool, "resetting" it when leader activity is detected. There's an inherent semantic mismatch between dedicated threads that periodically wake to check for failures and this new approach; I tried to provide similar semantics as best I could. Things worth noting: - Most importantly: some FD periods are now shorter. This is because the existing implementation "double counted" failure periods when adding backoff (once in LeaderElectionExpBackoffDeltaUnlocked, and once by virtue of the failure period comparison made by the failure monitor). This seemed accidental to me, so I didn't bother preserving that behavior. - It's tough to "expire" an FD using timers. Luckily, this only happens in RaftConsensus::Start, so by making PeriodicTimer::Start accept an optional delta, we can begin FD with an early delta that reflects the desired "detect a failure immediately but not too quickly" semantic, similar to how the dedicated failure monitor thread operates. - ReportFailureDetected is now run on a shared reactor thread rather than a dedicated failure monitor thread. Since StartElection performs IO, I thunked it onto the Raft thread pool. - Timer operations cannot fail, so I removed the return values from the various FD-related functions. - I also consolidated the two SnoozeFailureDetector variants; I found that this made it easier to look at all the call-sites. Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 --- M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h 2 files changed, 115 insertions(+), 139 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/7735/3 -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] rpc: periodic timers
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7733 to look at the new patch set (#3). Change subject: rpc: periodic timers .. rpc: periodic timers This patch introduces a generic periodic timer class. How does it work? 1. A timer is constructed with a user-provided task functor. 2. After Start() is called, the timer will run the functor repeatedly on a fixed (and optionally, jittered) period. 3. When Reset() is called, it'll reset the period. 4. After Stop() is called, it won't run the functor anymore. The implementation is based on Messenger::ScheduleOnReactor but it could just as easily build on libev directly. I chose to use the Messenger so that I wouldn't have to implement timer thread pooling. It's also just a generic version of the Raft heartbeat logic found in Peer (see commit 1070e76). In follow-on patches I will use this class to replace the bespoke Peer logic as well as for Raft failure detection. If you're curious, it's actually because of failure detection that both Start() and Reset() optionally accept deltas to override the default period; the snoozing code likes to provide heavily customized backoff. Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 --- M src/kudu/rpc/CMakeLists.txt A src/kudu/rpc/periodic-test.cc A src/kudu/rpc/periodic.cc A src/kudu/rpc/periodic.h 4 files changed, 491 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/7733/3 -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] util: remove old failure detector and resettable heartbeater code
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7736 to look at the new patch set (#3). Change subject: util: remove old failure detector and resettable heartbeater code .. util: remove old failure detector and resettable heartbeater code With the move to periodic timers, these classes are no longer needed. Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e --- M build-support/iwyu/iwyu-filter.awk M src/kudu/util/CMakeLists.txt D src/kudu/util/failure_detector-test.cc D src/kudu/util/failure_detector.cc D src/kudu/util/failure_detector.h D src/kudu/util/resettable_heartbeater-test.cc D src/kudu/util/resettable_heartbeater.cc D src/kudu/util/resettable_heartbeater.h 8 files changed, 0 insertions(+), 891 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/7736/3 -- To view, visit http://gerrit.cloudera.org:8080/7736 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Hello Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7734 to look at the new patch set (#3). Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. consensus_peers: replace bespoke Raft heartbeat logic with periodic timers Building on the generic periodic timer implementation, this patch replaces the one-off Raft heartbeating code with periodic timers. There's only one semantic change, but it's an important one: the jittering range has changed from [P/2, P] to [3P/4, 5P/4]. When I wrote commit 1070e76 I was nervous about making such a change, but since it reduces overall heartbeat load, I think it makes sense to do it. Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h 2 files changed, 24 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/7734/3 -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1894 fixed deadlock in client.Connection
Alexey Serbin has posted comments on this change. Change subject: KUDU-1894 fixed deadlock in client.Connection .. Patch Set 2: Code-Review-1 (1 comment) http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: PS2, Line 578: inflightMessages.put uh-oh: this needs to be guarded either by lock or the collection should provide concurrency guarantees. -- To view, visit http://gerrit.cloudera.org:8080/7765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] disk failure: don't open tablets on failed disks
Andrew Wong has posted comments on this change. Change subject: disk failure: don't open tablets on failed disks .. Patch Set 2: This is currently dependent on https://gerrit.cloudera.org/#/c/7440/, which will change given some of the work Mike's been doing. I don't think those changes will affect the contents of this patch, and I would still appreciate comments here. -- To view, visit http://gerrit.cloudera.org:8080/7766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] disk failure: don't open tablets on failed disks
Andrew Wong has uploaded a new patch set (#2). Change subject: disk failure: don't open tablets on failed disks .. disk failure: don't open tablets on failed disks Currently Kudu servers open the FS layout with failed disks. However, the moment tablets attempt to bootstrap (i.e. open blocks, etc.), they will attempt to read from the failed disk and fail. This can be avoided by checking whether a tablet's disk group contains a failed disk before attempting to read data from the tablet. If so, the tablet should be marked as having an error so it can be reassigned. The default behavior of the 'fs_target_data_dirs_per_tablet' flag is updated to take into account disk state when assigning new directory groups. This allows the tablet to be reassigned to a server without being spread across a failed directory. Testing is done by loading data into a cluster with multi-disk servers, failing a single directory of one of the servers, and ensuring that the tablets spread across the failed disk get replicated upon the next startup. Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 --- M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/disk_failure-itest.cc M src/kudu/tserver/ts_tablet_manager.cc 7 files changed, 247 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/7766/2 -- To view, visit http://gerrit.cloudera.org:8080/7766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] disk failure: add persistent disk states
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7270 to look at the new patch set (#23). Change subject: disk failure: add persistent disk states .. disk failure: add persistent disk states This patch adds a new version of the path set so failed disks are not used when a server is run. A previously failed disk may continue to produce IOErrors after restart, but we may still like to start the server without the disk. To accomplish this, disk states, path names, and a timestamp are added to the path set. If any disks are failed when opening the initial FS layout, an 'unhealthy' instance with no path set metadata will be created instead of failing the startup process. CheckIntegrity() is updated to accomodate this. Rather than comparing all instances against an agreed-upon set of UUIDs, the single path set with the largest timestamp is used as the main one to compare against (if only old path sets are available, the first healthy one is used). If no instances are healthy, CheckIntegrity() will fail, as all disks are failed. Testing is done in a new iteration of CheckIntegrity(). Further testing is done in data_dirs-test, fs_manager-test, and log_block_manager-test to ensure the FS layout can be loaded with a failed directory. Some notes: - Disk failures during FS layout creation are not tolerated. In these cases, there is presumably no data on the server anyway, so operators should easily be able to fix their cluster. - In the case of a server restart where all healthy disks fail to start up and some known failed disks start working again, the server will successfully start up with the bad disks (may have partially-written data). - If there are any unhealthy instances when upgrading the path sets (i.e. adding disk states, paths, timestamp), a complete mapping of UUIDs to paths will not be available, and CheckIntegrity() will fail. - The main path set's disk states are updated to reflect the failure of the instances. Since data on a failed disk cannot be trusted, disks that are successfully read from but are already marked FAILED by the main path set are not marked HEALTHY. This patch is a part of a series of patches to handle disk failures. To see how this fits, see 2.6 in: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f --- M src/kudu/fs/block_manager_util-test.cc M src/kudu/fs/block_manager_util.cc M src/kudu/fs/block_manager_util.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 12 files changed, 1,196 insertions(+), 278 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/23 -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock
Mike Percy has submitted this change and it was merged. Change subject: consensus: Tablet copy should clear last-logged opid from superblock .. consensus: Tablet copy should clear last-logged opid from superblock Keeping around irrelevant information is bad hygiene. Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Reviewed-on: http://gerrit.cloudera.org:8080/7718 Reviewed-by: Todd Lipcon Tested-by: Mike Percy --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 3 files changed, 66 insertions(+), 0 deletions(-) Approvals: Mike Percy: Verified Todd Lipcon: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock
Mike Percy has posted comments on this change. Change subject: consensus: Tablet copy should clear last-logged opid from superblock .. Patch Set 5: Verified+1 The failure in this build was caused by a host error, as seen in the stderr log from dist-test. I looped maintenance_manager-test to be sure and it's not flaky at all, even under TSAN with stress. -- To view, visit http://gerrit.cloudera.org:8080/7718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Mike Percy has submitted this change and it was merged. Change subject: consensus: Improve contract for API to fetch last-logged op id .. consensus: Improve contract for API to fetch last-logged op id It's important that we differentiate between when we have a known last-logged op and when we don't actually know what it is or whether we have ever appended something to the local WAL. This applies both to the TABLET_DATA_READY case, where this information is stored in the WAL, and the TABLET_DATA_TOMBSTONED case, where this information is stored in the superblock. Cases where we are unable to determine the last-logged OpId from the WAL when a replica is in TABLET_DATA_READY state: * Early in the tablet replica lifecycle (before Raft is started). * When a replica encounters an error during initialization. Cases where we are unable to determine the last-logged OpId from the TabletMetadata when a replica is in TABLET_DATA_TOMBSTONED state: * If the replica was tombstoned while in a failed state. Included in this patch are the following API improvements: 1. Delete Log::GetLatestEntryOpId(). Previously, this method would only return something other than MinimumOpId() if a log entry had been appended during the object's lifetime. It is abandoned in favor of RaftConsensus::GetLastOpId(RECEIVED_OPID) which delegates to PeerMessageQueue::GetLastOpIdInLog(). 2. Merge PeerMessageQueue::Init() into the PeerMessageQueue constructor. This allows us to remove one lifecycle state and allows us to guarantee that, once the queue is constructed, we can always get a valid last-logged opid from it (see #1). 3. Make TabletMetadata::tombstone_last_logged_opid() return a boost::optional. We need to clearly differentiate between when we know the last-logged opid and when we don't. We also consider MinimumOpId() to be equal to boost::none at superblock load time, since previous versions of Kudu may have written (0,0) into the TabletMetadata 'tombstone_last_logged_opid' field. Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Reviewed-on: http://gerrit.cloudera.org:8080/7717 Reviewed-by: Todd Lipcon Tested-by: Kudu Jenkins --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc 19 files changed, 244 insertions(+), 242 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1894 fixed deadlock in client.Connection
Alexey Serbin has uploaded a new patch set (#2). Change subject: KUDU-1894 fixed deadlock in client.Connection .. KUDU-1894 fixed deadlock in client.Connection Due to the reverse order of acquiring the Connection.lock and lower-level Netty locks, Connection.enqueueMesage() could deadlock if a ChannelDisconnected/ChannelClosed event arrived in the middle. Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java 1 file changed, 11 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/7765/2 -- To view, visit http://gerrit.cloudera.org:8080/7765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1894 fixed deadlock in client.Connection
Alexey Serbin has posted comments on this change. Change subject: KUDU-1894 fixed deadlock in client.Connection .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/7765/1//COMMIT_MSG Commit Message: Line 13: This changelist addresses KUDU-1894. > Curious why you specified this here and didn't just prefix the commit summa That's because KUDU-1894 was about some different issue in the beginning. >From the other side, it's better to stick to current interpretation: KUDU-1894 >contains the exact information about the deadlock this patch fixes. http://gerrit.cloudera.org:8080/#/c/7765/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: PS1, Line 485: Netty's > Netty (here and below) Done PS1, Line 491: if > of a Done Line 573: @GuardedBy("lock") > The annotation should be removed, right? Done -- To view, visit http://gerrit.cloudera.org:8080/7765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [java] fixed deadlock in client.Connection
Adar Dembo has posted comments on this change. Change subject: [java] fixed deadlock in client.Connection .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7765/1//COMMIT_MSG Commit Message: Line 13: This changelist addresses KUDU-1894. Curious why you specified this here and didn't just prefix the commit summary with the bug reference? -- To view, visit http://gerrit.cloudera.org:8080/7765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [java] fixed deadlock in client.Connection
Dan Burkert has posted comments on this change. Change subject: [java] fixed deadlock in client.Connection .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7765/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: PS1, Line 485: Netty's Netty (here and below) PS1, Line 491: if of a Line 573: @GuardedBy("lock") The annotation should be removed, right? -- To view, visit http://gerrit.cloudera.org:8080/7765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
Andrew Wong has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 15: Just rebased this in my series of start-time failure-handling patches, since it's necessary for https://gerrit.cloudera.org/#/c/7766/ to reassign tablets that fail to bootstrap. Currently still going through the tombstoned voting patch to see how this needs to be updated. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] avoid opening tablets on failed disks
Andrew Wong has uploaded a new change for review. http://gerrit.cloudera.org:8080/7766 Change subject: avoid opening tablets on failed disks .. avoid opening tablets on failed disks Currently Kudu servers can start up with failed disks. However, the moment tablets attempt to bootstrap (i.e. open blocks, etc.), they will attempt to read from the failed disk and fail. This can be avoided by checking whether a tablet's disk group contains a failed disk before attempting to open the tablet. If so, the tablet should be marked as having an error so it can be reassigned. The default behavior of the 'fs_target_data_dirs_per_tablet' flag is updated to take into account disk state when assigning new directory groups. This allows the tablet to be reassigned to a server without being spread across a failed directory. Testing is done by loading data into a cluster with multi-disk servers, failing a single directory of one of the servers, and ensuring that the tablets spread across the failed disk get replicated upon the next startup. Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 --- M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/log_block_manager.cc M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/disk_failure-itest.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/ts_tablet_manager.cc 9 files changed, 255 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/7766/1 -- To view, visit http://gerrit.cloudera.org:8080/7766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong
[kudu-CR] [java] fixed deadlock in client.Connection
Alexey Serbin has uploaded a new change for review. http://gerrit.cloudera.org:8080/7765 Change subject: [java] fixed deadlock in client.Connection .. [java] fixed deadlock in client.Connection Due to the reverse order of acquiring the Connection.lock and lower-level Netty locks, Connection.enqueueMesage() could deadlock if a ChannelDisconnected/ChannelClosed event arrived in the middle. This changelist addresses KUDU-1894. Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java 1 file changed, 11 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/7765/1 -- To view, visit http://gerrit.cloudera.org:8080/7765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client
Alexey Serbin has posted comments on this change. Change subject: KUDU-2032 (part 2): propagate master hostnames into client .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7692 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [java] Include Spark/Scala base version in kudu-spark-tools
Dan Burkert has submitted this change and it was merged. Change subject: [java] Include Spark/Scala base version in kudu-spark-tools .. [java] Include Spark/Scala base version in kudu-spark-tools Renames the kudu-spark-tools artifact to include the spark and scala base version. Including the scala version is standard practice and helps users avoid binary compatability issues. Adding the Spark base version matches the pattern used in the kudu-spark module and prevents users from trying to use kudu-spark-tools with Spark 1. This change will also make upgrades to Scala 2.12 or Spark 3 in the future more seamless. Before: kudu-spark-tools After:kudu-spark2-tools_2.11 Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d Reviewed-on: http://gerrit.cloudera.org:8080/7723 Tested-by: Kudu Jenkins Reviewed-by: Jean-Daniel Cryans --- M docs/release_notes.adoc M java/kudu-spark-tools/build.gradle M java/kudu-spark-tools/pom.xml 3 files changed, 10 insertions(+), 2 deletions(-) Approvals: Jean-Daniel Cryans: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [java] Include Spark/Scala base version in kudu-spark-tools
Jean-Daniel Cryans has posted comments on this change. Change subject: [java] Include Spark/Scala base version in kudu-spark-tools .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-1.4.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/7763 Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. KUDU-1942. Kerberos fails to log in on hostnames with capital letters This ensures that servers canonicalize their FQDNs to lower-case before generating Kerberos principal names. With this change I was able to set up a working cluster on my laptop with a capitalized hostname, where before it would fail as described in the JIRA. I also verified that I was able to connect from both C++ and Java clients. Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Reviewed-on: http://gerrit.cloudera.org:8080/7693 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin (cherry picked from commit ecfcb312161a4bc6f8da0ebb837169a2acf08eef) --- M src/kudu/security/init.cc 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/7763/1 -- To view, visit http://gerrit.cloudera.org:8080/7763 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.4.x Gerrit-Owner: Todd Lipcon
[kudu-CR](branch-1.3.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/7764 Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. KUDU-1942. Kerberos fails to log in on hostnames with capital letters This ensures that servers canonicalize their FQDNs to lower-case before generating Kerberos principal names. With this change I was able to set up a working cluster on my laptop with a capitalized hostname, where before it would fail as described in the JIRA. I also verified that I was able to connect from both C++ and Java clients. Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Reviewed-on: http://gerrit.cloudera.org:8080/7693 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin (cherry picked from commit ecfcb312161a4bc6f8da0ebb837169a2acf08eef) --- M src/kudu/security/init.cc 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/7764/1 -- To view, visit http://gerrit.cloudera.org:8080/7764 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Todd Lipcon
[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client
Hello Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7692 to look at the new patch set (#8). Change subject: KUDU-2032 (part 2): propagate master hostnames into client .. KUDU-2032 (part 2): propagate master hostnames into client This changes the client code to remember the user-specified master addresses and propagate them into the creation of master proxies. It's not possible to reproduce the necessary DNS configurations in a minicluster test, but with this patch I am now able to use 'kudu perf loadgen' against a Kerberized cluster even when my local krb5.conf has rdns=false. Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 --- M src/kudu/client/client-internal.cc M src/kudu/client/client-internal.h M src/kudu/client/master_rpc.cc M src/kudu/client/master_rpc.h M src/kudu/integration-tests/external_mini_cluster.cc 5 files changed, 56 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/7692/8 -- To view, visit http://gerrit.cloudera.org:8080/7692 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock
Todd Lipcon has posted comments on this change. Change subject: consensus: Tablet copy should clear last-logged opid from superblock .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Todd Lipcon has posted comments on this change. Change subject: consensus: Improve contract for API to fetch last-logged op id .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Mike Percy has posted comments on this change. Change subject: consensus: Improve contract for API to fetch last-logged op id .. Patch Set 4: I had to manually rebase on top of Alexey's IWYU patch. The conflicts were very minimal. -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7717 to look at the new patch set (#4). Change subject: consensus: Improve contract for API to fetch last-logged op id .. consensus: Improve contract for API to fetch last-logged op id It's important that we differentiate between when we have a known last-logged op and when we don't actually know what it is or whether we have ever appended something to the local WAL. This applies both to the TABLET_DATA_READY case, where this information is stored in the WAL, and the TABLET_DATA_TOMBSTONED case, where this information is stored in the superblock. Cases where we are unable to determine the last-logged OpId from the WAL when a replica is in TABLET_DATA_READY state: * Early in the tablet replica lifecycle (before Raft is started). * When a replica encounters an error during initialization. Cases where we are unable to determine the last-logged OpId from the TabletMetadata when a replica is in TABLET_DATA_TOMBSTONED state: * If the replica was tombstoned while in a failed state. Included in this patch are the following API improvements: 1. Delete Log::GetLatestEntryOpId(). Previously, this method would only return something other than MinimumOpId() if a log entry had been appended during the object's lifetime. It is abandoned in favor of RaftConsensus::GetLastOpId(RECEIVED_OPID) which delegates to PeerMessageQueue::GetLastOpIdInLog(). 2. Merge PeerMessageQueue::Init() into the PeerMessageQueue constructor. This allows us to remove one lifecycle state and allows us to guarantee that, once the queue is constructed, we can always get a valid last-logged opid from it (see #1). 3. Make TabletMetadata::tombstone_last_logged_opid() return a boost::optional. We need to clearly differentiate between when we know the last-logged opid and when we don't. We also consider MinimumOpId() to be equal to boost::none at superblock load time, since previous versions of Kudu may have written (0,0) into the TabletMetadata 'tombstone_last_logged_opid' field. Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc 19 files changed, 244 insertions(+), 242 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/7717/4 -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#10). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. * Get rid of tablet::FAILED state, instead go to SHUTDOWN with an error. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/raft_consensus-itest.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/ksck-test.cc M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 33 files changed, 1,439 insertions(+), 340 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/10 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock
Todd Lipcon has posted comments on this change. Change subject: consensus: Tablet copy should clear last-logged opid from superblock .. Patch Set 4: Code-Review+2 Got it, sorry for the confusion. -- To view, visit http://gerrit.cloudera.org:8080/7718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock
Mike Percy has posted comments on this change. Change subject: consensus: Tablet copy should clear last-logged opid from superblock .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7718/3//COMMIT_MSG Commit Message: Line 9: Keeping around irrelevant information is bad hygiene. > Mike and I discussed offline... it seems like this might not be a great ide The patch as-written only clears the last-logged opid when we transition to TABLET_DATA_READY, which is only synced after tablet copy is successfully completed. So the scenario summarized above wouldn't be affected negatively by this (correctness related) patch. I definitely agree we want to preserve the last-logged opid as much as possible (for availability reasons) and I am planning on ensuring that we do that in a follow-up patch. -- To view, visit http://gerrit.cloudera.org:8080/7718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Todd Lipcon has posted comments on this change. Change subject: consensus: Improve contract for API to fetch last-logged op id .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
Todd Lipcon has submitted this change and it was merged. Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a .. Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a This commit had some semantic conflict with the recent IWYU include cleanup. Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 Reviewed-on: http://gerrit.cloudera.org:8080/7762 Reviewed-by: Alexey Serbin Tested-by: Todd Lipcon --- M src/kudu/rpc/connection.h M src/kudu/rpc/connection_id.cc 2 files changed, 2 insertions(+), 0 deletions(-) Approvals: Todd Lipcon: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1
David Ribeiro Alves has uploaded a new patch set (#10). Change subject: Kudu Consistency Blog Post Pt1 .. Kudu Consistency Blog Post Pt1 This is the first part of multi-part blog post series about consistency in Kudu. It's hard to talk about consistency topics without context, particularly in blog posts where we can't assume very much about the readers. After struggling a bit with coming up with something that could be self contained in a single post I ended up biting the bullet and putting my name down to write a multi-part series. The series will cover Kudu from a consistency standpoint. The first post (this one) is about motivation and how the write and read paths work in general. Follow up posts will cover specific components, edge cases and guarantees. I'll finish with a post about overall semantics and what users can expect in the future. Rendering available at: http://24.7.81.123:4000/2017/08/21/kudu-consistency-pt1.html Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 --- A _posts/2017-08-21-kudu-consistency-pt1.md 1 file changed, 172 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/10 -- To view, visit http://gerrit.cloudera.org:8080/7019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] Add summary-only mode to ksck
Jean-Daniel Cryans has posted comments on this change. Change subject: [tools] Add summary-only mode to ksck .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7759/2/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: Line 57: "If set, ksck prints only summary information. Use with the --format flag to " Does this description still make sense when also doing a checksum check? -- To view, visit http://gerrit.cloudera.org:8080/7759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
Todd Lipcon has posted comments on this change. Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a .. Patch Set 2: Verified+1 Pushing without waiting for tests since this unbreaks build (ensured that build succeeded in debug jenkins build) -- To view, visit http://gerrit.cloudera.org:8080/7762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [tools] Add summary-only mode to ksck
Will Berkeley has uploaded a new patch set (#2). Change subject: [tools] Add summary-only mode to ksck .. [tools] Add summary-only mode to ksck This patch add a -summary_only flag to ksck, which suppresses all output except for the table summary at the end. When used in combination with the -format flag supported by the DataTable class, it allows users to get useful machine-readable ksck output as csv, tsv, or json. Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5 --- M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h 3 files changed, 134 insertions(+), 125 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/7759/2 -- To view, visit http://gerrit.cloudera.org:8080/7759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] Add summary-only mode to ksck
Will Berkeley has posted comments on this change. Change subject: [tools] Add summary-only mode to ksck .. Patch Set 1: Verified+1 Build failure due to KUDU-1736 (certainly unrelated to this patch). -- To view, visit http://gerrit.cloudera.org:8080/7759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1811: C++ client: use larger batches when fetching scan tokens
Dan Burkert has posted comments on this change. Change subject: KUDU-1811: C++ client: use larger batches when fetching scan tokens .. Patch Set 5: (2 comments) Overall I think this is a good change, but I wonder if it wouldn't just be simpler to change the existing constant to 100 and call it good enough. 10 does seem like a low number in any situation given the overhead of RPCs. http://gerrit.cloudera.org:8080/#/c/7748/5/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: Line 812: int requested_batch_size) { Might be good to name this 'max_returned_locations' to keep it consistent. http://gerrit.cloudera.org:8080/#/c/7748/5/src/kudu/client/meta_cache.h File src/kudu/client/meta_cache.h: Line 390: void LookupTabletsByKeyOrNext(const KuduTable* table, Instead of creating a new method, it may be cleaner to add an optional argument to the existing 'LookupTabletByKeyOrNext' method, with the default value of kFetchTabletsPerPointLookup. Then you could expose kFetchTabletsPerRangeLookup as a public constant, and use it from the scan token generator. -- To view, visit http://gerrit.cloudera.org:8080/7748 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79340dc9963944454770d82a2fbaba1b0c8a810e Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jun He Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jun He Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
Alexey Serbin has posted comments on this change. Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
Hello Alexey Serbin, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7762 to look at the new patch set (#2). Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a .. Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a This commit had some semantic conflict with the recent IWYU include cleanup. Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 --- M src/kudu/rpc/connection.h M src/kudu/rpc/connection_id.cc 2 files changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/7762/2 -- To view, visit http://gerrit.cloudera.org:8080/7762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 20: (30 comments) http://gerrit.cloudera.org:8080/#/c/7207/19//COMMIT_MSG Commit Message: PS19, Line 22: s-per > Done You still need 'localhost' though; that's not an optional parameter I think. I was just saying you could remove the port, which is optional. http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager-stress-test.cc File src/kudu/fs/block_manager-stress-test.cc: Line 282: Slice seed_slice(reinterpret_cast(&seed), sizeof(seed)); Should be PS20, Line 297: WritableBlock *block = dirty_blocks[next_block_idx].get(); : Random &rand = dirty_block_rands[next_block_idx]; You changed the pointer/ref style here. Line 310: // Close all dirty blocks. Update. Line 311: for (const auto &dirty_block : dirty_blocks) { const auto& Line 318: for (auto &dirty_block : dirty_blocks) { auto& Line 326: for (const auto &block : all_dirty_blocks) { const auto& http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: PS20, Line 44: "Control when to flush a block, either at 'finalize', 'close'," : " or 'never'. 'finalize' indicates to flush when writing to " : "the block is finished. 'close' indicates to defer the flushing " : "to when the entire BlockTransaction, that the blocks belong " : "to, is committed. And 'never' is not flush at all."); Rewrite: Controls when to flush a block. Valid values are 'finalize', 'close', or 'never'. If 'finalize', blocks will be flushed when writing is finished. If 'close', blocks will be flushed when their transaction is committed. If 'never', blocks will never be flushed. http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1410: } > Updated. To clarify, I did this because we were saying 'Finalize indicates Okay, but why bother maintaining FlushDataAsync() as a separate method? It certainly shouldn't be part of the WritableBlock interface because there's no need for anyone outside the block manager to know about the distinction. I suppose you could keep it as per-impl methods in LogWritableBlock/FileWritableBlock, but it's just one line of code, so the decomposition doesn't seem that useful. http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS20, Line 260: finalize it if has not finalizing it if it has not yet been finalized. PS20, Line 260: And update "Also updates" PS20, Line 336: . , PS20, Line 337: th the PS20, Line 434: // Finalize a fully written block. It round up this container, truncate it if full : // and mark the container as available. : void FinalizeBlock(int64_t block_offset, int64_t block_length); Nit: would prefer to keep BlockCreated and BlockDeleted next to each other since they're fairly symmetric. PS20, Line 543: // : // Declared atomic because it is mutated from BlockDeleted(). Since all counters are now atomic, these comments are no longer interesting. PS20, Line 878: container "data" (to be consistent with "Syncing metadata file") PS20, Line 882: does so at any given point Replace with "is already made durable". Or rewrite as "Append metadata only after data is synced so that there's no chance of metadata landing on the disk before the data." PS20, Line 883: const auto & const auto* PS20, Line 890: if (mode == SYNC) I don't think this one needs to be conditioned on SYNC. Line 891: for (LogWritableBlock* block : blocks) { Would be nice if we could DCHECK that if blocks.size() > 1, all of the blocks are FINALIZED. If they aren't, we have a big problem: we'll end up calling FinalizeBlock() on the same container multiple times from within DoClose() and could corrupt on-disk data via truncate calls. PS20, Line 1040: // Note that this take places _after_ the container has been synced to disk. : // That's OK; truncation isn't needed for correctness, and in the event of a : // crash or error, it will be retried at startup. This needs to be updated. Line 1068: if (PREDICT_TRUE(new_next_block_offset >= next_block_offset())) { Don't need the if statement anymore. PS20, Line 1071: total_bytes_ .IncrementBy(fs_aligned_length(block_offset, block_length)); : total_blocks_.Increment(); Let's defer this to BlockCreated(). Then RoundUpContainer can be renamed to the more clear UpdateNextBlockOffset. PS20, Line 1206: {this} Nit: { this } Line 1216: RETURN_NOT_OK(container_->DoCloseBlocks({this}, LogBlockContainer
[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1
David Ribeiro Alves has uploaded a new patch set (#9). Change subject: Kudu Consistency Blog Post Pt1 .. Kudu Consistency Blog Post Pt1 This is the first part of multi-part blog post series about consistency in Kudu. It's hard to talk about consistency topics without context, particularly in blog posts where we can't assume very much about the readers. After struggling a bit with coming up with something that could be self contained in a single post I ended up biting the bullet and putting my name down to write a multi-part series. The series will cover Kudu from a consistency standpoint. The first post (this one) is about motivation and how the write and read paths work in general. Follow up posts will cover specific components, edge cases and guarantees. I'll finish with a post about overall semantics and what users can expect in the future. Rendering available at: http://dralves-server:4000/2017/08/21/kudu-consistency-pt1.html Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 --- A _posts/2017-08-21-kudu-consistency-pt1.md 1 file changed, 172 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/9 -- To view, visit http://gerrit.cloudera.org:8080/7019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Will Berkeley
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Todd Lipcon has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc File src/kudu/consensus/consensus_peers.cc: Line 140: std::bind([w]() { > oh sure, if the task abstraction has a constructor for that (I haven't chec I think std::function has an implicit constructor from any callable (serious template magic) -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
Alexey Serbin has posted comments on this change. Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7762/1/src/kudu/rpc/connection_id.cc File src/kudu/rpc/connection_id.cc: PS1, Line 22: #include > I thought you decided we had to add ostream everywhere that we used logging Yep, but that's only when there is a construct like LOG(INFO) << "mega info"; in the file. The in that case is needed for the ostream& operator<<(ostream&). Otherwise, IWYU will say the should be removed. -- To view, visit http://gerrit.cloudera.org:8080/7762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
Todd Lipcon has posted comments on this change. Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7762/1/src/kudu/rpc/connection_id.cc File src/kudu/rpc/connection_id.cc: PS1, Line 22: #include > I'm not sure we need this for this version. What is this for? I thought you decided we had to add ostream everywhere that we used logging.h? At least it's included in 410 of our source files :) -- To view, visit http://gerrit.cloudera.org:8080/7762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1811: C++ client: use larger batches when fetching scan tokens
Todd Lipcon has posted comments on this change. Change subject: KUDU-1811: C++ client: use larger batches when fetching scan tokens .. Patch Set 5: (2 comments) just a couple nits. I also added Dan who knows this code a bit better and may have more substantial comments. http://gerrit.cloudera.org:8080/#/c/7748/5/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: Line 76: /** style nit: we use // even for block comments Line 1048: string partition_key, nit: alignment -- To view, visit http://gerrit.cloudera.org:8080/7748 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79340dc9963944454770d82a2fbaba1b0c8a810e Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jun He Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jun He Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Dan Burkert has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc File src/kudu/consensus/consensus_peers.cc: Line 140: std::bind([w]() { > oh, good catch. shouldn't you be able to just pass the lambda directly anyw oh sure, if the task abstraction has a constructor for that (I haven't checked). -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
Alexey Serbin has posted comments on this change. Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7762/1/src/kudu/rpc/connection_id.cc File src/kudu/rpc/connection_id.cc: PS1, Line 22: #include I'm not sure we need this for this version. What is this for? -- To view, visit http://gerrit.cloudera.org:8080/7762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] consensus: use periodic timers for failure detection
Dan Burkert has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 252: std::bind([w]() { same comment about no-arg bind Line 319: rng_.Normal(GetFailureMonitorCheckMeanMs(), This might result in a negative value. It would probably be OK, but since it would very rarely occur in tests, it's perhaps best to bound it. Bigger picture, it seems like overkill to keep around the leader_failure_monitor_check_mean_ms and leader_failure_monitor_check_stddev_ms flags just for this very specific case. Perhaps remove them, and use the new failure detection timeperiod calculation (uniform distribution) instead? http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: Line 27: #include This looks like it may be a bad interaction with Alexey's patch, but I don't think you should need both optional.hpp and none.hpp Line 523: void EnsureFailureDetectorEnabled( I think these methods would be better named 'EnableFailureDetector' and 'DisableFailureDetector' to match 'SnoozeFailureDetector'. Feel free to ignore if you feel that's outside the scope of this patch. -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client
Hello Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7692 to look at the new patch set (#7). Change subject: KUDU-2032 (part 2): propagate master hostnames into client .. KUDU-2032 (part 2): propagate master hostnames into client This changes the client code to remember the user-specified master addresses and propagate them into the creation of master proxies. It's not possible to reproduce the necessary DNS configurations in a minicluster test, but with this patch I am now able to use 'kudu perf loadgen' against a Kerberized cluster even when my local krb5.conf has rdns=false. Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 --- M src/kudu/client/client-internal.cc M src/kudu/client/client-internal.h M src/kudu/client/master_rpc.cc M src/kudu/client/master_rpc.h M src/kudu/integration-tests/external_mini_cluster.cc 5 files changed, 57 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/7692/7 -- To view, visit http://gerrit.cloudera.org:8080/7692 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] util: remove old failure detector and resettable heartbeater code
Todd Lipcon has posted comments on this change. Change subject: util: remove old failure detector and resettable heartbeater code .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7736 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
Hello Alexey Serbin, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7762 to review the following change. Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a .. Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a This commit had some semantic conflict with the recent IWYU include cleanup. Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 --- M src/kudu/rpc/connection.h M src/kudu/rpc/connection_id.cc 2 files changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/7762/1 -- To view, visit http://gerrit.cloudera.org:8080/7762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Todd Lipcon has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc File src/kudu/consensus/consensus_peers.cc: Line 140: std::bind([w]() { > I think it would be more clear to create a std::function or whatever type d oh, good catch. shouldn't you be able to just pass the lambda directly anyway, with no binding? -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1
David Ribeiro Alves has uploaded a new patch set (#8). Change subject: Kudu Consistency Blog Post Pt1 .. Kudu Consistency Blog Post Pt1 This is the first part of multi-part blog post series about consistency in Kudu. It's hard to talk about consistency topics without context, particularly in blog posts where we can't assume very much about the readers. After struggling a bit with coming up with something that could be self contained in a single post I ended up biting the bullet and putting my name down to write a multi-part series. The series will cover Kudu from a consistency standpoint. The first post (this one) is about motivation and how the write and read paths work in general. Follow up posts will cover specific components, edge cases and guarantees. I'll finish with a post about overall semantics and what users can expect in the future. Rendering available at: http://dralves-server:4000/2017/08/21/kudu-consistency-pt1.html Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 --- A _posts/2017-08-21-kudu-consistency-pt1.md 1 file changed, 158 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/8 -- To view, visit http://gerrit.cloudera.org:8080/7019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client
Alexey Serbin has posted comments on this change. Change subject: KUDU-2032 (part 2): propagate master hostnames into client .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7692 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1
David Ribeiro Alves has uploaded a new patch set (#7). Change subject: Kudu Consistency Blog Post Pt1 .. Kudu Consistency Blog Post Pt1 This is the first part of multi-part blog post series about consistency in Kudu. It's hard to talk about consistency topics without context, particularly in blog posts where we can't assume very much about the readers. After struggling a bit with coming up with something that could be self contained in a single post I ended up biting the bullet and putting my name down to write a multi-part series. The series will cover Kudu from a consistency standpoint. The first post (this one) is about motivation and how the write and read paths work in general. Follow up posts will cover specific components, edge cases and guarantees. I'll finish with a post about overall semantics and what users can expect in the future. Rendering available at: http://dralves-server:4000/2017/08/21/kudu-consistency-pt1.html Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 --- A _posts/2017-08-21-kudu-consistency-pt1.md 1 file changed, 157 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/7 -- To view, visit http://gerrit.cloudera.org:8080/7019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies .. KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies This modifies the constructor of RPC proxies (generated and otherwise) to take the remote hostname in addition to the existing resolved Sockaddr parameter. The hostname is then passed into the ConnectionId object, and plumbed through to the SASL client in place of the IP address that was used previously. The patch changes all of the construction sites of Proxy to fit the new interface. In most of the test cases, we don't have real hostnames, so we just use the dotted-decimal string form of the remote Sockaddr, which matches the existing behavior. In the real call sites, we have actual host names typically specified by the user, and in those cases we'll need to pass those into the proxy. In a few cases, they were conveniently available in the same function that creates the proxy. In others, they are relatively far away, so this patch just uses the dotted-decimal string and leaves TODOs. In the case that Kerberos is not configured, this change should have no effect since the hostname is ignored by SASL "plain". In the case that Kerberos is configured with 'rdns=true', they also have no effect, because the krb5 library will resolve and reverse the hostname from the IP as it did before. In the case that 'rdns=false', this moves us one step closer to fixing KUDU-2032 by getting a hostname into the SASL library. I verified that, if I set 'rdns = false' on a Kerberized client, I'm now able to run 'kudu master status ' successfully where it would not before. This tool uses a direct proxy instantiation where the hostname was easy to plumb in. 'kudu table list ' still does not work because it uses the client, which wasn't convenient to plumb quite yet. Given that this makes incremental improvement towards fixing the issue without any regression, and is already a fairly wide patch, I hope to commit this and then address the remaining plumbing in a separate patch. Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98 Reviewed-on: http://gerrit.cloudera.org:8080/7687 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/client/client-internal.cc M src/kudu/client/master_rpc.cc M src/kudu/client/meta_cache.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/exactly_once_writes-itest.cc M src/kudu/integration-tests/external_mini_cluster.cc M src/kudu/integration-tests/internal_mini_cluster.cc M src/kudu/integration-tests/master-stress-test.cc M src/kudu/integration-tests/master_cert_authority-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/security-itest.cc M src/kudu/integration-tests/table_locations-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/master-test.cc M src/kudu/master/master.cc M src/kudu/master/sys_catalog-test.cc M src/kudu/master/ts_descriptor.cc M src/kudu/master/ts_descriptor.h M src/kudu/rpc/connection.h M src/kudu/rpc/connection_id.cc M src/kudu/rpc/connection_id.h M src/kudu/rpc/exactly_once_rpc-test.cc M src/kudu/rpc/mt-rpc-test.cc M src/kudu/rpc/negotiation.cc M src/kudu/rpc/protoc-gen-krpc.cc M src/kudu/rpc/proxy.cc M src/kudu/rpc/proxy.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/rpc-bench.cc M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/rpc_stub-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_service-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/util/net/net_util-test.cc M src/kudu/util/net/sockaddr.cc M src/kudu/util/net/sockaddr.h 42 files changed, 220 insertions(+), 139 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7687 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Adar Dembo has posted comments on this change. Change subject: [build-support] IWYU build configuration for Jenkins .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: use periodic timers for failure detection
Todd Lipcon has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 533: "Failed to submit failure detected task"); can you work the log prefix into this? http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: PS2, Line 529: unregistered disabled -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1
Alexey Serbin has posted comments on this change. Change subject: Kudu Consistency Blog Post Pt1 .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/7019/6//COMMIT_MSG Commit Message: PS6, Line 25: dralves-server nit: mind adding the domain name? It would be nice to get the IP or FQDN here. -- To view, visit http://gerrit.cloudera.org:8080/7019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] disk failure: add persistent disk states
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7270 to look at the new patch set (#22). Change subject: disk failure: add persistent disk states .. disk failure: add persistent disk states This patch adds a new version of the path set so failed disks are not used when a server is run. A previously failed disk may continue to produce IOErrors after restart, but we may still like to start the server without the disk. To accomplish this, disk states, path names, and a timestamp are added to the path set. If any disks are failed when opening the initial FS layout, an 'unhealthy' instance with no path set metadata will be created instead of failing the startup process. CheckIntegrity() is updated to accomodate this. Rather than comparing all instances against an agreed-upon set of UUIDs, the single path set with the largest timestamp is used as the main one to compare against (if only old path sets are available, the first healthy one is used). If no instances are healthy, CheckIntegrity() will fail, as all disks are failed. Testing is done in a new iteration of CheckIntegrity(). Further testing is done in data_dirs-test, fs_manager-test, and log_block_manager-test to ensure the FS layout can be loaded with a failed directory. Some notes: - Disk failures during FS layout creation are not tolerated. In these cases, there is presumably no data on the server anyway, so operators should easily be able to fix their cluster. - In the case of a server restart where all healthy disks fail to start up and some known failed disks start working again, the server will successfully start up with the bad disks (may have partially-written data). - If there are any unhealthy instances when upgrading the path sets (i.e. adding disk states, paths, timestamp), a complete mapping of UUIDs to paths will not be available, and CheckIntegrity() will fail. - The main path set's disk states are updated to reflect the failure of the instances. Since data on a failed disk cannot be trusted, disks that are successfully read from but are already marked FAILED by the main path set are not marked HEALTHY. This patch is a part of a series of patches to handle disk failures. To see how this fits, see 2.6 in: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f --- M src/kudu/fs/block_manager_util-test.cc M src/kudu/fs/block_manager_util.cc M src/kudu/fs/block_manager_util.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 12 files changed, 1,187 insertions(+), 278 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/22 -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Dan Burkert has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc File src/kudu/consensus/consensus_peers.cc: Line 140: std::bind([w]() { I think it would be more clear to create a std::function or whatever type directly instead of an indirectly through a no arg bind. -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies
Alexey Serbin has posted comments on this change. Change subject: KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7687 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Todd Lipcon has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/7734/2//COMMIT_MSG Commit Message: Line 15: heartbeat load, I think it makes sense to do it. perhaps worth re-running the experiment from https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit# to make sure the new jitter parameters don't regress the stabilization after a failure. I think it's fine to do that after commit, though, since you did the small-scale experiment that you commented on in this gerrit. -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Alexey Serbin has posted comments on this change. Change subject: [build-support] IWYU build configuration for Jenkins .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7750/4/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: Line 324: mkdir -p $TEST_LOGDIR # In case no tests are run (the directory won't be created) > Don't need this anymore? Good catch! -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Hello Dan Burkert, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7750 to look at the new patch set (#5). Change subject: [build-support] IWYU build configuration for Jenkins .. [build-support] IWYU build configuration for Jenkins Added provisions to run the include-what-you-use tool against the updated files in the changelist. Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e --- A build-support/iwyu/iwyu_tool.py M build-support/jenkins/build-and-test.sh 2 files changed, 326 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7750/5 -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] rpc: periodic timers
Todd Lipcon has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic-test.cc File src/kudu/rpc/periodic-test.cc: Line 77: ASSERT_GT(counter_, 0); is this going to be flaky? maybe an AssertEventually is more robust? Line 88: TEST_P(PeriodicTimerTest, TestReset) { have you looped this with tsan? I'm worried it will be flaky if this main thread gets descheduled for more than 20ms. maybe need a much larger period to avoid flakiness? http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.cc File src/kudu/rpc/periodic.cc: PS2, Line 54: messenger, functor I'm surprised clang-tidy dind't complain about these, I'd have expected passing them through with std::move PS2, Line 133: // We must schedule the next callback with the minimal period as the delay : // so that if Reset() is called with a low value, the scheduled callback : // can still honor it. : I don't quite follow this. If we called Reset() with a small delay (less than minimum period) wouldn't we just schedule a new callback precisely for the requested time? http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.h File src/kudu/rpc/periodic.h: PS2, Line 40: Messenger::ScheduleOnReactor I think it's worth a note here that, since the tasks are run on reactor threads, they are not allowed to block or otherwise do lengthy computation, etc. PS2, Line 59: periodic_period_jitter_pct hrm, I don't know about making this a general flag vs passing it in to the PeriodicTimer instance. I think we've been trying to avoid gflags being read too much from utility code that may affect multiple subsystems. That said, if it's an experimental/unstable flag it's no big deal since we can always adjust later. PS2, Line 83: period for the next : // task. not 100% clear here. Is this setting the period (overriding what's in the 'period' set in the ctor) or is this more of an 'initial delay' after which it will revert to rescheduling based on the configured period/jitter? Also, will jitter be applied to the provided value or will this be an exact delay? Maybe something like "delay for scheduling the next task" or "delay after the task will first be invoked, after which it will revert to the scheduling paramters provided in the constructor" or something Line 92: // If 'next_task_delta' is provided, it is used as the new period. Otherwise, Similar to the above, not sure whether this means to reset the period for all subsequent schedulings, or just the subsequent call. If the latter, maybe 'Snooze' is a better name than 'Reset' since reset sounds more like a permanent change? Line 93: // the period is generated in accordance with the timer's period and jitter in the case that it is boost::none, this still will "snooze" one full period worth, right? -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] rpc: periodic timers
Dan Burkert has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.cc File src/kudu/rpc/periodic.cc: Line 38: DEFINE_double(periodic_period_jitter_pct, 0.25, I don't think exposing this is a good idea. It's not clear what it's actually controlling. If we wanted to make this configurable, I'd expect it to be done on a per-use basis (e.g. heartbeat_period_jitter). -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1
David Ribeiro Alves has uploaded a new patch set (#6). Change subject: Kudu Consistency Blog Post Pt1 .. Kudu Consistency Blog Post Pt1 This is the first part of multi-part blog post series about consistency in Kudu. It's hard to talk about consistency topics without context, particularly in blog posts where we can't assume very much about the readers. After struggling a bit with coming up with something that could be self contained in a single post I ended up biting the bullet and putting my name down to write a multi-part series. The series will cover Kudu from a consistency standpoint. The first post (this one) is about motivation and how the write and read paths work in general. Follow up posts will cover specific components, edge cases and guarantees. I'll finish with a post about overall semantics and what users can expect in the future. Rendering available at: http://dralves-server:4000/2017/08/21/kudu-consistency-pt1.html Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 --- A _posts/2017-08-21-kudu-consistency-pt1.md 1 file changed, 152 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/6 -- To view, visit http://gerrit.cloudera.org:8080/7019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Will Berkeley