[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp

2017-08-21 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 
'renew_till' timestamp
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7770/2/src/kudu/security/init.cc
File src/kudu/security/init.cc:

PS2, Line 295: ticket_lifetime
> don't you still need a little slop here in addition to ticket_lifetime?
Yes, that's true, but how low of a ticket_lifetime do we want to adjust for? 
Technically, the user could set a 1s ticket lifetime too, for which the above 
argument still holds and for which we really can't do anything.


-- 
To view, visit http://gerrit.cloudera.org:8080/7770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 
'renew_till' timestamp
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7770/2/src/kudu/security/init.cc
File src/kudu/security/init.cc:

PS2, Line 295: ticket_lifetime
don't you still need a little slop here in addition to ticket_lifetime?

eg if your lifetime is 30s, and it's 12:00:29, and renew_till is 12:01:00, you 
only have 1 second left. So if the renewal takes more than a second you'll be 
back in the danger zone.


-- 
To view, visit http://gerrit.cloudera.org:8080/7770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp

2017-08-21 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 
'renew_till' timestamp
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7770/1//COMMIT_MSG
Commit Message:

PS1, Line 28: refuses to read th
> maybe be slightly more specific and say that the Java library refuses to re
Done


PS1, Line 30: ed failures.
> typo
Done


http://gerrit.cloudera.org:8080/#/c/7770/1/src/kudu/security/init.cc
File src/kudu/security/init.cc:

PS1, Line 290: 
> am surprised to see 'difftime'. Never seen that before. Why not just cast t
http://en.cppreference.com/w/c/chrono/difftime

"On POSIX systems, time_t is measured in seconds, and difftime is equivalent to 
arithmetic subtraction, but C and C++ allow fractional units for time_t."

Also, just looking at posts online generally, people discourage using '-' to 
subtract two time_t objects.


PS1, Line 290: 
> according to http://web.mit.edu/kerberos/krb5-current/doc/appldev/refs/type
Good catch. I added some logic to handle that case.


PS1, Line 298: 
> we probably want a little bit of slop here just like we had with 'renew_dea
Oops, you're right. I added the renew_deadline back.


-- 
To view, visit http://gerrit.cloudera.org:8080/7770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp

2017-08-21 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#2).

Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 
'renew_till' timestamp
..

Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp

It was found that if we use a file based credential cache that is
shared between the C++ side and the java side of a process, and we
encounter the specific edge case where we renew a ticket that has
less than 'ticket_lifetime' left before its 'renew_lifetime' expires,
the ticket is set to have a NULL 'renew_till' timestamp.

Eg:
ticket_lifetime = 10m
renew_lifetime = 100m

[current ticket being renewed] at '15:30:00'
endtime = '15:30:30'
renew_till = '15:31:00'

This ticket will be renewed and the renewed ticket will have the
following values:
endtime = '15:31:00'
renew_till = null

The Java krb5 library refuses to read these kinds of tickets which
have the RENEWABLE flag set but no 'renew_till' set, causing
unexpected failures. We work around this by reacquiring a new ticket
instead of renewing the existing ticket if there is less that
'ticket_lifetime' left between now and the 'renew_till' deadline.

Change-Id: I59194af94838f680df4ce121a8dee526a876e369
---
M src/kudu/integration-tests/external_mini_cluster-test.cc
M src/kudu/security/init.cc
2 files changed, 12 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7770/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS2, Line 578: inflightMessages.put
> uh-oh: this needs to be guarded either by lock or the collection should pro
yea, I think the better bet is deferring the cleanup call back onto a new 
iteration of the netty event loop thread (asynchronously) after the 
disconnected event, if that's possible, so that the sslhandler lock is no 
longer held when we try to acquire our own lock?

(man I can't wait for netty 4 upgrade some day)


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-871. Support tombstoned voting

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-871. Support tombstoned voting
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS10, Line 1494: local_last_logged_opid = queue_->GetLastOpIdInLog();
What if tombstone_last_logged_opid was provided as well?  Is it some some of 
programming error?  Or it's OK?  If the former, consider returning 
IllegalArgument or adding DCHECK()


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS10, Line 372: // Raft is in the process of stopping and will not accept 
writes.
  : kStopping,
nit: is it possible to request a vote if in this state?  It seems it is, but it 
would be nice to explicitly specify that.


-- 
To view, visit http://gerrit.cloudera.org:8080/6960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2095 - Add `keepAlive` method to Java API

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-2095 - Add `keepAlive` method to Java API
..


Patch Set 3:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/7749/3//COMMIT_MSG
Commit Message:

PS3, Line 13: properties that keep track of the last response, however adding 
this to the Java
: client looked invasive and I did not implement it.
not sure if this is a lack of functionality or just due to different 
implementation style?


PS3, Line 18: MiniClusters are running at the same time
Check out how TestAuthnTokenReacquire sets flags, for example, to avoid this


Line 23: than it is running here but was unable to find a flag to lower it other
I think you're looking for 'scanner_gc_check_interval_us'


http://gerrit.cloudera.org:8080/#/c/7749/3/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

Line 843: final ServerInfo info = 
tablet.getReplicaSelectedServerInfo(keepAliveRequest.getReplicaSelection());
this seems relatively suspicious. I see you copied it from 'close' above, but 
are we guaranteed that this returns the same ServerInfo as the server which is 
processing our currently-open scanner?


http://gerrit.cloudera.org:8080/#/c/7749/3/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

Line 45: import com.sun.tools.doclets.formats.html.SourceToHTMLConverter;
hrmm? this is causing a build failure


Line 645:* Keep the current remote scanner alive.
can we provide more docs on how this is meant to be used?


Line 652: final Deferred d =
nit: no wrap, can just combine with the 'return' below


PS3, Line 672: getKeepAliveRequest
nit: change to multiple lines. Add a javadoc.


Line 761:   final Tserver.ScannerKeepAliveRequestPB.Builder builder = 
Tserver.ScannerKeepAliveRequestPB.newBuilder();
wrap


Line 781: String tsUUID) throws 
KuduException {
nit: alignment


Line 783: Tserver.ScannerKeepAliveResponsePB.Builder builder = 
Tserver.ScannerKeepAliveResponsePB.newBuilder();
nit: wrap


Line 785: Tserver.ScannerKeepAliveResponsePB resp =  builder.build();
nit: extra space


Line 788: return new Pair(Status.OK(), error);
nit: this whole function is indented one too much


http://gerrit.cloudera.org:8080/#/c/7749/3/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java:

PS3, Line 62:* Keep the current scanner alive on the tablet server.
:* @return Status of the RPC request
:* @throws KuduException
needs more clear docs


-- 
To view, visit http://gerrit.cloudera.org:8080/7749
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: t...@phdata.io
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 
'renew_till' timestamp
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7770/1//COMMIT_MSG
Commit Message:

PS1, Line 28: does not like this
maybe be slightly more specific and say that the Java library refuses to read a 
ticket which has the RENEWABLE flag set, but no renew_till set.


PS1, Line 30: reqacquiring
typo


http://gerrit.cloudera.org:8080/#/c/7770/1/src/kudu/security/init.cc
File src/kudu/security/init.cc:

PS1, Line 290: difftime
am surprised to see 'difftime'. Never seen that before. Why not just cast to 
(signed) int64_ts?


PS1, Line 290: creds.times.starttime
according to 
http://web.mit.edu/kerberos/krb5-current/doc/appldev/refs/types/krb5_ticket_times.html#krb5_ticket_times
 the starttime may be missing, in which case we would have to fall back to 
'authtime' instead. I've never seen it, but I think we should probably look 
whether that happens and at least make sure we do something sane in that case


PS1, Line 298: (now + ticket_lifetime) > renew_till
we probably want a little bit of slop here just like we had with 
'renew_deadline'. Otherwise if we are exactly at the threshold where things 
break, we might still do a renewal and hit the issue, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/7770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-871. Support tombstoned voting

2017-08-21 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-871. Support tombstoned voting
..


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS10, Line 1452: boost::
nit: "using boost::optional" is already specified


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS10, Line 351:|^
  :   //||
  :   //++
nit: prefer the slimmer version at L424
Or have this in one place and have SetStateUnlocked() refer here?


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS10, Line 3044: back come 
nit: come back


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-itest.cc
File src/kudu/integration-tests/tombstoned_voting-itest.cc:

PS10, Line 46: using std::string;
 : using std::vector;
nit: include these?


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-stress-test.cc
File src/kudu/integration-tests/tombstoned_voting-stress-test.cc:

PS10, Line 44: using std::atomic;
 : using std::string;
 : using std::thread;
 : using std::unique_lock;
 : using std::vector;
nit: includes?


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

PS10, Line 347: // TODO(mpercy): Set to failed if Init() fails?
Does this still apply?


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_metadata.h
File src/kudu/tablet/tablet_metadata.h:

PS10, Line 357: Has no meaning for non-tombstoned tablets
Is this to say that this will be boost::none for non-tombstoned tablets?


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

PS10, Line 199: SetError
nit: sort of brought this up on Slack, I think the naming for this could be a 
bit clearer. SetError() makes it seem like a simple setter, but it's not. Maybe 
SetErrorAndShutdown()? Or SetFailed() and keep the FAILED state as we discussed.

The latter has the added benefit of being an external indicator of failure, 
rather than having each external accessor (e.g. web UI, ksck, etc.) get the 
error_ member.


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS10, Line 298: RETURN_NOT_OK(WriteConsensusMetadata());
This change isn't noted in the commit message. Mind documenting it, or at least 
adding a comment explaining?


-- 
To view, visit http://gerrit.cloudera.org:8080/6960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Adjust kerberos renewal logic to avoid tickets with NULL 'renew till' timestamp

2017-08-21 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7770

Change subject: Adjust kerberos renewal logic to avoid tickets with NULL 
'renew_till' timestamp
..

Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp

It was found that if we use a file based credential cache that is
shared between the C++ side and the java side of a process, and we
encounter the specific edge case where we renew a ticket that has
less than 'ticket_lifetime' left before its 'renew_lifetime' expires,
the ticket is set to have a NULL 'renew_till' timestamp.

Eg:
ticket_lifetime = 10m
renew_lifetime = 100m

[current ticket being renewed] at '15:30:00'
endtime = '15:30:30'
renew_till = '15:31:00'

This ticket will be renewed and the renewed ticket will have the
following values:
endtime = '15:31:00'
renew_till = null

The Java krb5 library does not like this and doesn't handle these
kinds of tickets properly, causing unexpected failures. We work around
this by reqacquiring a new ticket instead of renewing the existing
ticket if there is less that 'ticket_lifetime' left between now and
the 'renew_till' deadline.

Change-Id: I59194af94838f680df4ce121a8dee526a876e369
---
M src/kudu/integration-tests/external_mini_cluster-test.cc
M src/kudu/security/init.cc
2 files changed, 6 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7770/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[kudu-CR] [build-support] IWYU build configuration for Jenkins

2017-08-21 Thread Alexey Serbin (Code Review)
Hello Dan Burkert, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7750

to look at the new patch set (#8).

Change subject: [build-support] IWYU build configuration for Jenkins
..

[build-support] IWYU build configuration for Jenkins

Added provisions to run the include-what-you-use tool against the
updated files in the changelist.  Those are based on the newly
introduced 'iwyu' target to be used like already existing 'lint'
target.

The 'iwyu' target runs IWYU against the compilation database.  So,
as a part of this changelist I updated the top-level CMakeLists.txt
to always generate the compilation database.  The compilation database
is a JSON file containing information on how the targets are built.

Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
---
M CMakeLists.txt
A build-support/iwyu/iwyu.sh
A build-support/iwyu/iwyu_tool.py
M build-support/jenkins/build-and-test.sh
4 files changed, 358 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7750/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7750
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] util: remove old failure detector and resettable heartbeater code

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: util: remove old failure detector and resettable heartbeater 
code
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] consensus: use periodic timers for failure detection

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus: use periodic timers for failure detection
..


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7735
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] rpc: periodic timers

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: rpc: periodic timers
..


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus_peers: replace bespoke Raft heartbeat logic with 
periodic timers
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7734
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] rpc: periodic timers

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: rpc: periodic timers
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic-test.cc
File src/kudu/rpc/periodic-test.cc:

Line 88: TEST_P(PeriodicTimerTest, TestReset) {
> When I looped 1000 times with TSAN all I got were data races on access to c
yea, of course it will be timing-dependent. But perhaps just running the whole 
thing slower (eg with period_ms_ being 500ms instead of 50) would mean that the 
same absolute time slownesses would be much less likely to cause a failure.


http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.h
File src/kudu/rpc/periodic.h:

PS2, Line 59: periodic_period_jitter_pct
> I think the interface would be cleaner as
ok, I'm alright with a gflag on the argument that they're easier to change than 
constants, so long as it's experimental


-- 
To view, visit http://gerrit.cloudera.org:8080/7733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR](branch-1.4.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital 
letters
..


Patch Set 1: Verified+1

Unrelated flake

-- 
To view, visit http://gerrit.cloudera.org:8080/7763
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-2032 (part 2): propagate master hostnames into client
..


KUDU-2032 (part 2): propagate master hostnames into client

This changes the client code to remember the user-specified master
addresses and propagate them into the creation of master proxies.

It's not possible to reproduce the necessary DNS configurations in a
minicluster test, but with this patch I am now able to use 'kudu perf
loadgen' against a Kerberized cluster even when my local krb5.conf has
rdns=false.

Change-Id: Ie5838f22c96f757124112b505825a53740468ce1
Reviewed-on: http://gerrit.cloudera.org:8080/7692
Reviewed-by: Alexey Serbin 
Tested-by: Kudu Jenkins
---
M src/kudu/client/client-internal.cc
M src/kudu/client/client-internal.h
M src/kudu/client/master_rpc.cc
M src/kudu/client/master_rpc.h
M src/kudu/integration-tests/external_mini_cluster.cc
5 files changed, 56 insertions(+), 39 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7692
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-871. Support tombstoned voting

2017-08-21 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-871. Support tombstoned voting
..


Patch Set 10:

I think the failure was caused by already-flaky java tests but I am 
investigating. I'd like to figure out how to run the java tests on dist-test to 
do a quick comparison on failure rate w/ master.

-- 
To view, visit http://gerrit.cloudera.org:8080/6960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-2095 - Add `keepAlive` method to Java API

2017-08-21 Thread Anonymous Coward (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7749

to look at the new patch set (#3).

Change subject: KUDU-2095 - Add `keepAlive` method to Java API
..

KUDU-2095 - Add `keepAlive` method to Java API

- Adds keepAlive methods to sync and async clients
- Methods return a (Deferred) Status object

Where possible the behavior mimics the C++ client. The C++ client has extra
properties that keep track of the last response, however adding this to the Java
client looked invasive and I did not implement it.

I added one test in TestKuduClient that also mirrors the C++. The test
runs in a KuduMiniCluster so ttl could be lowered. This means two
MiniClusters are running at the same time but it seems to cause no
problems and is cleaner than making a new test class. Even though the
TTL is low the time it takes for the scanner to be garbage collected is
long and the test takes ~10 seconds. The C++ client code comments that
the garbage collection of scanners runs each 50ms which is a lot faster
than it is running here but was unable to find a flag to lower it other
than scanner_ttl.

Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
5 files changed, 156 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/7749/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7749
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: t...@phdata.io
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager

2017-08-21 Thread Hao Hao (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7207

to look at the new patch set (#21).

Change subject: KUDU-1943: Add BlockTransaction to Block Manager
..

KUDU-1943: Add BlockTransaction to Block Manager

This adds a new layer of abstraction 'BlockTransaction' to
Block Manager, to accumulate new blocks in a logical operation,
e.g. flush/compaction. Avoid using fsync() per block can
potentionaly reduce a lot I/O overhead.

This patch also add a new API, Finalize(), to Block Manager,
so that for LBM container can be reused without flushing
in-flight writable blocks to disk. This can bring down the
log block container number and disk space consumption for
first few flush/compaction operations when using log block
manager.

I performed manual test which creates a table with two tablets,
'kudu perf loadgen localhost --num-rows-per-thread=2000
 --num-threads=10 --keep-auto-table --table_num_buckets=2'.
With this patch, 'log_block_manager_containers' dropped from 56
to 7.

A design doc can be found here:
https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing

Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/bloomfile.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/multi_column_writer.cc
M src/kudu/tablet/multi_column_writer.h
20 files changed, 575 insertions(+), 444 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/21
-- 
To view, visit http://gerrit.cloudera.org:8080/7207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager

2017-08-21 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change.

Change subject: KUDU-1943: Add BlockTransaction to Block Manager
..


Patch Set 21:

(30 comments)

http://gerrit.cloudera.org:8080/#/c/7207/19//COMMIT_MSG
Commit Message:

PS19, Line 22:  --nu
> You still need 'localhost' though; that's not an optional parameter I think
Done


http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager-stress-test.cc
File src/kudu/fs/block_manager-stress-test.cc:

Line 282:   AtomicInt total_bytes_written_;
> Should be 
Done


PS20, Line 297: vector> all_dirty_blocks;
  : for (int i = 0; i < FLAGS_block_group_number; i++) {
> You changed the pointer/ref style here.
Done


Line 310: 
> Update.
Done


Line 311: dirty_blocks.emplace_back(std::move(block));
> const auto&
Done


Line 318:   // random, and write a smaller chunk of data to it.
> auto&
Done


Line 326: // Write a small chunk of data.
> const auto&
Done


http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager.cc
File src/kudu/fs/block_manager.cc:

PS20, Line 44: //   throughput but may improve write latency.
 : DEFINE_string(block_manager_flush_control, "finalize",
 :   "Controls when to flush a block. Valid values are 
'finalize', "
 :   "'close', or 'never'. If 'finalize', blocks will 
be flushed "
 :   "when writing is finished. If 'close', blocks will 
be 
> Rewrite:
Done


http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 1410: 
> Okay, but why bother maintaining FlushDataAsync() as a separate method? It 
Makes sense. Updated. But still kept the per-impl methods in 
LogWritableBlock/FileWritableBlock since calling Flush() from 
LogBlockManager/FileBlockManager will result in more code if not.


http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

PS20, Line 260: onst OVERRIDE;
> finalizing it if it has not yet been finalized.
Done


PS20, Line 260: 
> "Also updates"
Done


PS20, Line 336: c
> ,
Done


PS20, Line 337: 
> the
Done


PS20, Line 434:   // Malformed records and other container inconsistencies are 
written to
  :   // 'report'. Healthy blocks are written either to 
'live_blocks' or
  :   // 'dead_blocks'. Live records are written to 
'live_block_recor
> Nit: would prefer to keep BlockCreated and BlockDeleted next to each other 
Done


PS20, Line 543:   // Offset up to which we have preallocated bytes.
  :   int64_t preallocated_offset_ = 0;
> Since all counters are now atomic, these comments are no longer interesting
Done


PS20, Line 878: at as a d
> "data" (to be consistent with "Syncing metadata file")
Done


PS20, Line 882: 
> Replace with "is already made durable".
Done


PS20, Line 883: 
> const auto*
Done


PS20, Line 890: if (mode == SYNC) 
> I don't think this one needs to be conditioned on SYNC.
Done


Line 891: 
> Would be nice if we could DCHECK that if blocks.size() > 1, all of the bloc
Done


PS20, Line 1040: preallocated_offset_ = off + len;
   :   }
   : 
> This needs to be updated.
Done


Line 1068:   DCHECK_GE(block_offset, 0);
> Don't need the if statement anymore.
Done


PS20, Line 1071:   // means accounting for the new block should be as simple as 
adding its
   :   // aligned length to 'next
> Let's defer this to BlockCreated(). Then RoundUpContainer can be renamed to
Done


PS20, Line 1206: etrics
> Nit: { this }
Done


Line 1216: }
> Nit: { this }
Done


PS20, Line 1289: }
   : 
   : Status LogWritableBl
> Rewrite: "We do not mark the container as read-only if FlushDataAsync() fai
Done


Line 1317: }
> 'lbm' is only used once; don't bother storing it in a local variable.
Done


PS20, Line 1319: state() const {
> Just use block_length_ here; it's more clear.
Done


PS20, Line 1319: te L
> and block_id_ here.
Done


PS20, Line 1751: ol == "close") {
> Nit: 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] disk failure: don't open tablets on failed disks

2017-08-21 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7766

to look at the new patch set (#3).

Change subject: disk failure: don't open tablets on failed disks
..

disk failure: don't open tablets on failed disks

Currently Kudu servers open the FS layout with failed disks. However,
the moment tablets attempt to bootstrap (i.e. open blocks, etc.), they
will attempt to read from the failed disk and fail. This can be avoided
by checking whether a tablet's disk group contains a failed disk before
attempting to read data from the tablet. If so, the tablet should be
marked as having an error so it can be reassigned.

The default behavior of the 'fs_target_data_dirs_per_tablet' flag is
updated to take into account disk state when assigning new directory
groups. This allows the tablet to be reassigned to a server without
being spread across a failed directory.

Testing is done by loading data into a cluster with multi-disk
servers, failing a single directory of one of the servers, and ensuring
that the tablets spread across the failed disk get replicated upon the
next startup.

Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tserver/ts_tablet_manager.cc
7 files changed, 222 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/7766/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7766
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [build-support] IWYU build configuration for Jenkins

2017-08-21 Thread Alexey Serbin (Code Review)
Hello Dan Burkert, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7750

to look at the new patch set (#7).

Change subject: [build-support] IWYU build configuration for Jenkins
..

[build-support] IWYU build configuration for Jenkins

Added provisions to run the include-what-you-use tool against the
updated files in the changelist.  Those are based on the newly
introduced 'iwyu' target to be used like already existing 'lint'
target.

The 'iwyu' target runs IWYU against the compilation database.  So,
as a part of this changelist I updated the top-level CMakeLists.txt
to always generate the compilation database.  The compilation database
is a JSON file containing information on how the targets are built.

Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
---
M CMakeLists.txt
A build-support/iwyu/iwyu.sh
A build-support/iwyu/iwyu_tool.py
M build-support/jenkins/build-and-test.sh
4 files changed, 360 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7750/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7750
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [build-support] IWYU build configuration for Jenkins

2017-08-21 Thread Alexey Serbin (Code Review)
Hello Dan Burkert, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7750

to look at the new patch set (#6).

Change subject: [build-support] IWYU build configuration for Jenkins
..

[build-support] IWYU build configuration for Jenkins

Added provisions to run the include-what-you-use tool against the
updated files in the changelist.  Those are based on the newly
introduced 'iwyu' target to be used like already existing 'lint'
target.

The 'iwyu' target run IWYU against the compilation database.  So,
as a part of this changelist I updated the top-level CMakeLists.txt
to always generate the compilation database.  The compilation database
is a JSON file containing information on how the targets are built.

Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
---
M CMakeLists.txt
A build-support/iwyu/iwyu.sh
A build-support/iwyu/iwyu_tool.py
M build-support/jenkins/build-and-test.sh
4 files changed, 360 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7750/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7750
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] disk failure: add persistent disk states

2017-08-21 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7270

to look at the new patch set (#24).

Change subject: disk failure: add persistent disk states
..

disk failure: add persistent disk states

This patch adds a new version of the path set so failed disks are not
used when a server is run. A previously failed disk may continue to
produce IOErrors after restart, but we may still like to start the
server without the disk.

To accomplish this, disk states, path names, and a timestamp are added
to the path set. If any disks are failed when opening the initial FS
layout, an 'unhealthy' instance with no path set metadata will be
created instead of failing the startup process.

CheckIntegrity() is updated to accomodate this. Rather than comparing
all instances against an agreed-upon set of UUIDs, the single path set
with the largest timestamp is used as the main one to compare against
(if only old path sets are available, the first healthy one is used). If
no instances are healthy, CheckIntegrity() will fail, as all disks are
failed.

Testing is done in a new iteration of CheckIntegrity(). Further testing
is done in data_dirs-test, fs_manager-test, and log_block_manager-test
to ensure the FS layout can be loaded with a failed directory.

Some notes:
- Disk failures during FS layout creation are not tolerated. In these
  cases, there is presumably no data on the server anyway, so operators
  should easily be able to fix their cluster.
- In the case of a server restart where all healthy disks fail to start
  up and some known failed disks start working again, the server will
  successfully start up with the bad disks (may have partially-written
  data).
- If there are any unhealthy instances when upgrading the path sets (i.e.
  adding disk states, paths, timestamp), a complete mapping of UUIDs to
  paths will not be available, and CheckIntegrity() will fail.
- The main path set's disk states are updated to reflect the failure of
  the instances. Since data on a failed disk cannot be trusted, disks
  that are successfully read from but are already marked FAILED by the
  main path set are not marked HEALTHY.

This patch is a part of a series of patches to handle disk failures. To
see how this fits, see 2.6 in:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f
---
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
12 files changed, 1,207 insertions(+), 278 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/24
-- 
To view, visit http://gerrit.cloudera.org:8080/7270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f
Gerrit-PatchSet: 24
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] rpc: periodic timers

2017-08-21 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: rpc: periodic timers
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.cc
File src/kudu/rpc/periodic.cc:

Line 38: DEFINE_double(periodic_period_jitter_pct, 0.25,
> Even as an experimental flag? I couldn't see any reason to use different va
See my reply to the other comment about what I would do to make this go away.  
The why is that the name of the flag is bad, and it unnecessarily ties together 
two otherwise unrelated periods (and however many more get consolidated into 
this abstraction).


http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.h
File src/kudu/rpc/periodic.h:

PS2, Line 59: periodic_period_jitter_pct
> As I see it, we have three options:
I think the interface would be cleaner as

  // Creates a new PeriodicTimer.
  //
  // A ref is taken on 'messenger', which is used for scheduling callbacks.
  //
  // 'functor' defines the user's task and is owned for the lifetime of the
  // PeriodicTimer.
  //
  // 'jitter_pct' controls the amount of jitter to introduce ... explain how 
it's [period - period * (jitter_pct / 100), period + period * (jitter_pct / 
100)) ...
  static std::shared_ptr Create(
  std::shared_ptr messenger,
  RunTaskFunctor functor,
  MonoDelta period,
  int jitter_pct = 25);

There are already flags in the failure detector that are meant to be the 
equivalent of setting the jitter here, but they are kind of vestigial at this 
point.


-- 
To view, visit http://gerrit.cloudera.org:8080/7733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers

2017-08-21 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: consensus_peers: replace bespoke Raft heartbeat logic with 
periodic timers
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7734/2//COMMIT_MSG
Commit Message:

Line 15: heartbeat load, I think it makes sense to do it.
> perhaps worth re-running the experiment from https://docs.google.com/docume
Sure, I'll do that post-commit.


http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 140: 
> I think std::function has an implicit constructor from any callable (seriou
Thanks, fixed.


-- 
To view, visit http://gerrit.cloudera.org:8080/7734
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] rpc: periodic timers

2017-08-21 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: rpc: periodic timers
..


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic-test.cc
File src/kudu/rpc/periodic-test.cc:

PS2, Line 51: ()
> The parens are optional when empty (just like the return value).  Don't fee
I don't mind changing it.


Line 77:   ASSERT_GT(counter_, 0);
> is this going to be flaky? maybe an AssertEventually is more robust?
Sure, will add an AssertEventually.


Line 88: TEST_P(PeriodicTimerTest, TestReset) {
> have you looped this with tsan? I'm worried it will be flaky if this main t
When I looped 1000 times with TSAN all I got were data races on access to 
counter_ after the test fixture had been destroyed. Explicit 
Messenger::Shutdown() in TearDown() ought to clear that up.

In general I have no idea how to write unit tests for timers that aren't 
timing-dependent in some way. I'd love more feedback on this.


http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.cc
File src/kudu/rpc/periodic.cc:

Line 38: DEFINE_double(periodic_period_jitter_pct, 0.25,
> I don't think exposing this is a good idea.  It's not clear what it's actua
Even as an experimental flag? I couldn't see any reason to use different values 
 for heartbeat and failure detection jitter, so I started off with a constant, 
then reasoned an experimental gflag is no worse than a constant (and may be 
better).


PS2, Line 54: messenger, functor
> I'm surprised clang-tidy dind't complain about these, I'd have expected pas
That is odd, but I will std::move() them.


PS2, Line 133:  // We must schedule the next callback with the minimal period 
as the delay
 :   // so that if Reset() is called with a low value, the 
scheduled callback
 :   // can still honor it.
 :   
> I don't quite follow this. If we called Reset() with a small delay (less th
As we discussed on Slack, if we allowed Reset() to schedule a new callback 
(while an existing one was in flight), we could wind up with multiple 
never-ending callback "loops". To prevent that, we force each timer to stick to 
just one callback loop, with the downside that Reset()'s delay may not be 
honored if it drops below GetMinimumPeriod().

I'll update the comment and/or find another way to express this.


http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.h
File src/kudu/rpc/periodic.h:

PS2, Line 40: Messenger::ScheduleOnReactor
> I think it's worth a note here that, since the tasks are run on reactor thr
Yes, I meant to add that but forgot; thank you for the reminder.


PS2, Line 59: periodic_period_jitter_pct
> hrm, I don't know about making this a general flag vs passing it in to the 
As I see it, we have three options:
1. Force callers to provide it.
2. Define a constant in the code.
3. Define a gflag.

I think #1 isn't interesting: certainly for the two callers I have so far 
there's no reason NOT to use the same percentage. So originally I did #2, then 
decided that #3, as long as the flag was tagged as experimental (which it is), 
is no worse.


PS2, Line 83: period for the next
:   // task. 
> not 100% clear here. Is this setting the period (overriding what's in the '
1. It's an 'initial delay'; it doesn't change the future period.
2. It'll be an exact delay, no jitter.

I'll reword to make this more clear.


Line 92:   // If 'next_task_delta' is provided, it is used as the new period. 
Otherwise,
> Similar to the above, not sure whether this means to reset the period for a
It affects just the next call.

Will rename to Snooze.


Line 93:   // the period is generated in accordance with the timer's period and 
jitter
> in the case that it is boost::none, this still will "snooze" one full perio
Yes, but it's not additive: if I call Snooze() at time X and then again at time 
X+0.1P, the task will run at task X+1.1P, not X+2P.


Line 97:   void Reset(boost::optional next_task_delta = boost::none);
> May want to add a note that next_task_delta must be > GetMinimumPeriod(), o
Will do.


-- 
To view, visit http://gerrit.cloudera.org:8080/7733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] consensus: use periodic timers for failure detection

2017-08-21 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: consensus: use periodic timers for failure detection
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 252:   std::bind([w]() {
> same comment about no-arg bind
Done


Line 319: rng_.Normal(GetFailureMonitorCheckMeanMs(),
> This might result in a negative value.  It would probably be OK, but since 
I tried to preserve these semantics to minimize the scope of the patch. And 
these flags are being used by a couple itests that I didn't really have 
interest in updating.

A negative value here just means the timer will fire immediately, which is safe.


Line 533:   "Failed to submit failure detected task");
> can you work the log prefix into this?
Done


http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

Line 27: #include 
> This looks like it may be a bad interaction with Alexey's patch, but I don'
I rebased on top of Alexey's patch and now I need neither of these. Strange.


Line 523:   void EnsureFailureDetectorEnabled(
> I think these methods would be better named 'EnableFailureDetector' and 'Di
Sure, I don't mind.


PS2, Line 529: unregistered
> disabled
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7735
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] consensus: use periodic timers for failure detection

2017-08-21 Thread Adar Dembo (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7735

to look at the new patch set (#3).

Change subject: consensus: use periodic timers for failure detection
..

consensus: use periodic timers for failure detection

This patch replaces the existing failure detection (FD) with a new approach
built using periodic timers. The existing approach had a major drawback:
each failure monitor required a dedicated thread, and there was a monitor
for each replica.

The new approach "schedules" a failure into the future using the server's
reactor thread pool, "resetting" it when leader activity is detected.
There's an inherent semantic mismatch between dedicated threads that
periodically wake to check for failures and this new approach; I tried to
provide similar semantics as best I could.

Things worth noting:
- Most importantly: some FD periods are now shorter. This is because the
  existing implementation "double counted" failure periods when adding
  backoff (once in LeaderElectionExpBackoffDeltaUnlocked, and once by virtue
  of the failure period comparison made by the failure monitor). This seemed
  accidental to me, so I didn't bother preserving that behavior.
- It's tough to "expire" an FD using timers. Luckily, this only happens in
  RaftConsensus::Start, so by making PeriodicTimer::Start accept an optional
  delta, we can begin FD with an early delta that reflects the desired
  "detect a failure immediately but not too quickly" semantic, similar to
  how the dedicated failure monitor thread operates.
- ReportFailureDetected is now run on a shared reactor thread rather than a
  dedicated failure monitor thread. Since StartElection performs IO, I
  thunked it onto the Raft thread pool.
- Timer operations cannot fail, so I removed the return values from the
  various FD-related functions.
- I also consolidated the two SnoozeFailureDetector variants; I found that
  this made it easier to look at all the call-sites.

Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 115 insertions(+), 139 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/7735/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7735
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] rpc: periodic timers

2017-08-21 Thread Adar Dembo (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7733

to look at the new patch set (#3).

Change subject: rpc: periodic timers
..

rpc: periodic timers

This patch introduces a generic periodic timer class. How does it work?
1. A timer is constructed with a user-provided task functor.
2. After Start() is called, the timer will run the functor repeatedly on a
   fixed (and optionally, jittered) period.
3. When Reset() is called, it'll reset the period.
4. After Stop() is called, it won't run the functor anymore.

The implementation is based on Messenger::ScheduleOnReactor but it could
just as easily build on libev directly. I chose to use the Messenger so that
I wouldn't have to implement timer thread pooling. It's also just a generic
version of the Raft heartbeat logic found in Peer (see commit 1070e76).

In follow-on patches I will use this class to replace the bespoke Peer logic
as well as for Raft failure detection. If you're curious, it's actually
because of failure detection that both Start() and Reset() optionally accept
deltas to override the default period; the snoozing code likes to provide
heavily customized backoff.

Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86
---
M src/kudu/rpc/CMakeLists.txt
A src/kudu/rpc/periodic-test.cc
A src/kudu/rpc/periodic.cc
A src/kudu/rpc/periodic.h
4 files changed, 491 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/7733/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] util: remove old failure detector and resettable heartbeater code

2017-08-21 Thread Adar Dembo (Code Review)
Hello Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7736

to look at the new patch set (#3).

Change subject: util: remove old failure detector and resettable heartbeater 
code
..

util: remove old failure detector and resettable heartbeater code

With the move to periodic timers, these classes are no longer needed.

Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e
---
M build-support/iwyu/iwyu-filter.awk
M src/kudu/util/CMakeLists.txt
D src/kudu/util/failure_detector-test.cc
D src/kudu/util/failure_detector.cc
D src/kudu/util/failure_detector.h
D src/kudu/util/resettable_heartbeater-test.cc
D src/kudu/util/resettable_heartbeater.cc
D src/kudu/util/resettable_heartbeater.h
8 files changed, 0 insertions(+), 891 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/7736/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers

2017-08-21 Thread Adar Dembo (Code Review)
Hello Todd Lipcon,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7734

to look at the new patch set (#3).

Change subject: consensus_peers: replace bespoke Raft heartbeat logic with 
periodic timers
..

consensus_peers: replace bespoke Raft heartbeat logic with periodic timers

Building on the generic periodic timer implementation, this patch replaces
the one-off Raft heartbeating code with periodic timers.

There's only one semantic change, but it's an important one: the jittering
range has changed from [P/2, P] to [3P/4, 5P/4]. When I wrote commit 1070e76
I was nervous about making such a change, but since it reduces overall
heartbeat load, I think it makes sense to do it.

Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
2 files changed, 24 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/7734/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7734
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
..


Patch Set 2: Code-Review-1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS2, Line 578: inflightMessages.put
uh-oh: this needs to be guarded either by lock or the collection should provide 
concurrency guarantees.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] disk failure: don't open tablets on failed disks

2017-08-21 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: disk failure: don't open tablets on failed disks
..


Patch Set 2:

This is currently dependent on https://gerrit.cloudera.org/#/c/7440/, which 
will change given some of the work Mike's been doing.

I don't think those changes will affect the contents of this patch, and I would 
still appreciate comments here.

-- 
To view, visit http://gerrit.cloudera.org:8080/7766
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] disk failure: don't open tablets on failed disks

2017-08-21 Thread Andrew Wong (Code Review)
Andrew Wong has uploaded a new patch set (#2).

Change subject: disk failure: don't open tablets on failed disks
..

disk failure: don't open tablets on failed disks

Currently Kudu servers open the FS layout with failed disks. However,
the moment tablets attempt to bootstrap (i.e. open blocks, etc.), they
will attempt to read from the failed disk and fail. This can be avoided
by checking whether a tablet's disk group contains a failed disk before
attempting to read data from the tablet. If so, the tablet should be
marked as having an error so it can be reassigned.

The default behavior of the 'fs_target_data_dirs_per_tablet' flag is
updated to take into account disk state when assigning new directory
groups. This allows the tablet to be reassigned to a server without
being spread across a failed directory.

Testing is done by loading data into a cluster with multi-disk
servers, failing a single directory of one of the servers, and ensuring
that the tablets spread across the failed disk get replicated upon the
next startup.

Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tserver/ts_tablet_manager.cc
7 files changed, 247 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/7766/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7766
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] disk failure: add persistent disk states

2017-08-21 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7270

to look at the new patch set (#23).

Change subject: disk failure: add persistent disk states
..

disk failure: add persistent disk states

This patch adds a new version of the path set so failed disks are not
used when a server is run. A previously failed disk may continue to
produce IOErrors after restart, but we may still like to start the
server without the disk.

To accomplish this, disk states, path names, and a timestamp are added
to the path set. If any disks are failed when opening the initial FS
layout, an 'unhealthy' instance with no path set metadata will be
created instead of failing the startup process.

CheckIntegrity() is updated to accomodate this. Rather than comparing
all instances against an agreed-upon set of UUIDs, the single path set
with the largest timestamp is used as the main one to compare against
(if only old path sets are available, the first healthy one is used). If
no instances are healthy, CheckIntegrity() will fail, as all disks are
failed.

Testing is done in a new iteration of CheckIntegrity(). Further testing
is done in data_dirs-test, fs_manager-test, and log_block_manager-test
to ensure the FS layout can be loaded with a failed directory.

Some notes:
- Disk failures during FS layout creation are not tolerated. In these
  cases, there is presumably no data on the server anyway, so operators
  should easily be able to fix their cluster.
- In the case of a server restart where all healthy disks fail to start
  up and some known failed disks start working again, the server will
  successfully start up with the bad disks (may have partially-written
  data).
- If there are any unhealthy instances when upgrading the path sets (i.e.
  adding disk states, paths, timestamp), a complete mapping of UUIDs to
  paths will not be available, and CheckIntegrity() will fail.
- The main path set's disk states are updated to reflect the failure of
  the instances. Since data on a failed disk cannot be trusted, disks
  that are successfully read from but are already marked FAILED by the
  main path set are not marked HEALTHY.

This patch is a part of a series of patches to handle disk failures. To
see how this fits, see 2.6 in:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f
---
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
12 files changed, 1,196 insertions(+), 278 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/23
-- 
To view, visit http://gerrit.cloudera.org:8080/7270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f
Gerrit-PatchSet: 23
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock

2017-08-21 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: consensus: Tablet copy should clear last-logged opid from 
superblock
..


consensus: Tablet copy should clear last-logged opid from superblock

Keeping around irrelevant information is bad hygiene.

Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea
Reviewed-on: http://gerrit.cloudera.org:8080/7718
Reviewed-by: Todd Lipcon 
Tested-by: Mike Percy 
---
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
3 files changed, 66 insertions(+), 0 deletions(-)

Approvals:
  Mike Percy: Verified
  Todd Lipcon: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock

2017-08-21 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: consensus: Tablet copy should clear last-logged opid from 
superblock
..


Patch Set 5: Verified+1

The failure in this build was caused by a host error, as seen in the stderr log 
from dist-test. I looped maintenance_manager-test to be sure and it's not flaky 
at all, even under TSAN with stress.

-- 
To view, visit http://gerrit.cloudera.org:8080/7718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] consensus: Improve contract for API to fetch last-logged op id

2017-08-21 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: consensus: Improve contract for API to fetch last-logged op id
..


consensus: Improve contract for API to fetch last-logged op id

It's important that we differentiate between when we have a known
last-logged op and when we don't actually know what it is or whether we
have ever appended something to the local WAL. This applies both to the
TABLET_DATA_READY case, where this information is stored in the WAL, and
the TABLET_DATA_TOMBSTONED case, where this information is stored in the
superblock.

Cases where we are unable to determine the last-logged OpId from the WAL
when a replica is in TABLET_DATA_READY state:

* Early in the tablet replica lifecycle (before Raft is started).
* When a replica encounters an error during initialization.

Cases where we are unable to determine the last-logged OpId from the
TabletMetadata when a replica is in TABLET_DATA_TOMBSTONED state:

* If the replica was tombstoned while in a failed state.

Included in this patch are the following API improvements:

1. Delete Log::GetLatestEntryOpId(). Previously, this method would only
   return something other than MinimumOpId() if a log entry had been
   appended during the object's lifetime. It is abandoned in favor of
   RaftConsensus::GetLastOpId(RECEIVED_OPID) which delegates to
   PeerMessageQueue::GetLastOpIdInLog().
2. Merge PeerMessageQueue::Init() into the PeerMessageQueue constructor.
   This allows us to remove one lifecycle state and allows us to
   guarantee that, once the queue is constructed, we can always get a
   valid last-logged opid from it (see #1).
3. Make TabletMetadata::tombstone_last_logged_opid() return a
   boost::optional. We need to clearly differentiate between when
   we know the last-logged opid and when we don't. We also consider
   MinimumOpId() to be equal to boost::none at superblock load time,
   since previous versions of Kudu may have written (0,0) into the
   TabletMetadata 'tombstone_last_logged_opid' field.

Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7
Reviewed-on: http://gerrit.cloudera.org:8080/7717
Reviewed-by: Todd Lipcon 
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/mt-log-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
19 files changed, 244 insertions(+), 242 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded a new patch set (#2).

Change subject: KUDU-1894 fixed deadlock in client.Connection
..

KUDU-1894 fixed deadlock in client.Connection

Due to the reverse order of acquiring the Connection.lock and
lower-level Netty locks, Connection.enqueueMesage() could deadlock if
a ChannelDisconnected/ChannelClosed event arrived in the middle.

Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
1 file changed, 11 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/7765/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7765/1//COMMIT_MSG
Commit Message:

Line 13: This changelist addresses KUDU-1894.
> Curious why you specified this here and didn't just prefix the commit summa
That's because KUDU-1894 was about some different issue in the beginning.

>From the other side, it's better to stick to current interpretation: KUDU-1894 
>contains the exact information about  the deadlock this patch fixes.


http://gerrit.cloudera.org:8080/#/c/7765/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS1, Line 485: Netty's
> Netty (here and below)
Done


PS1, Line 491: if
> of a
Done


Line 573:   @GuardedBy("lock")
> The annotation should be removed, right?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [java] fixed deadlock in client.Connection

2017-08-21 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [java] fixed deadlock in client.Connection
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/1//COMMIT_MSG
Commit Message:

Line 13: This changelist addresses KUDU-1894.
Curious why you specified this here and didn't just prefix the commit summary 
with the bug reference?


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [java] fixed deadlock in client.Connection

2017-08-21 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: [java] fixed deadlock in client.Connection
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7765/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS1, Line 485: Netty's
Netty (here and below)


PS1, Line 491: if
of a


Line 573:   @GuardedBy("lock")
The annotation should be removed, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1407: reassign failed tablets

2017-08-21 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: KUDU-1407: reassign failed tablets
..


Patch Set 15:

Just rebased this in my series of start-time failure-handling patches, since 
it's necessary for https://gerrit.cloudera.org/#/c/7766/ to reassign tablets 
that fail to bootstrap.

Currently still going through the tombstoned voting patch to see how this needs 
to be updated.

-- 
To view, visit http://gerrit.cloudera.org:8080/7440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] avoid opening tablets on failed disks

2017-08-21 Thread Andrew Wong (Code Review)
Andrew Wong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7766

Change subject: avoid opening tablets on failed disks
..

avoid opening tablets on failed disks

Currently Kudu servers can start up with failed disks. However, the
moment tablets attempt to bootstrap (i.e. open blocks, etc.), they will
attempt to read from the failed disk and fail. This can be avoided by
checking whether a tablet's disk group contains a failed disk before
attempting to open the tablet. If so, the tablet should be marked as
having an error so it can be reassigned.

The default behavior of the 'fs_target_data_dirs_per_tablet' flag is
updated to take into account disk state when assigning new directory
groups. This allows the tablet to be reassigned to a server without
being spread across a failed directory.

Testing is done by loading data into a cluster with multi-disk
servers, failing a single directory of one of the servers, and ensuring
that the tablets spread across the failed disk get replicated upon the
next startup.

Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/disk_failure-itest.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
9 files changed, 255 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/7766/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7766
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 


[kudu-CR] [java] fixed deadlock in client.Connection

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7765

Change subject: [java] fixed deadlock in client.Connection
..

[java] fixed deadlock in client.Connection

Due to the reverse order of acquiring the Connection.lock and
lower-level Netty locks, Connection.enqueueMesage() could deadlock if
a ChannelDisconnected/ChannelClosed event arrived in the middle.

This changelist addresses KUDU-1894.

Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
1 file changed, 11 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/7765/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 


[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2032 (part 2): propagate master hostnames into client
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7692
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] [java] Include Spark/Scala base version in kudu-spark-tools

2017-08-21 Thread Dan Burkert (Code Review)
Dan Burkert has submitted this change and it was merged.

Change subject: [java] Include Spark/Scala base version in kudu-spark-tools
..


[java] Include Spark/Scala base version in kudu-spark-tools

Renames the kudu-spark-tools artifact to include the
spark and scala base version. Including the scala
version is standard practice and helps users avoid binary
compatability issues. Adding the Spark base version matches
the pattern used in the kudu-spark module and prevents users
from trying to use kudu-spark-tools with Spark 1. This change
will also make upgrades to Scala 2.12 or Spark 3 in the future
more seamless.

Before: kudu-spark-tools
After:kudu-spark2-tools_2.11

Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d
Reviewed-on: http://gerrit.cloudera.org:8080/7723
Tested-by: Kudu Jenkins
Reviewed-by: Jean-Daniel Cryans 
---
M docs/release_notes.adoc
M java/kudu-spark-tools/build.gradle
M java/kudu-spark-tools/pom.xml
3 files changed, 10 insertions(+), 2 deletions(-)

Approvals:
  Jean-Daniel Cryans: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] [java] Include Spark/Scala base version in kudu-spark-tools

2017-08-21 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: [java] Include Spark/Scala base version in kudu-spark-tools
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR](branch-1.4.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7763

Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital 
letters
..

KUDU-1942. Kerberos fails to log in on hostnames with capital letters

This ensures that servers canonicalize their FQDNs to lower-case before
generating Kerberos principal names.

With this change I was able to set up a working cluster on my laptop
with a capitalized hostname, where before it would fail as described in
the JIRA.

I also verified that I was able to connect from both C++ and Java
clients.

Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627
Reviewed-on: http://gerrit.cloudera.org:8080/7693
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
(cherry picked from commit ecfcb312161a4bc6f8da0ebb837169a2acf08eef)
---
M src/kudu/security/init.cc
1 file changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/7763/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7763
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-Owner: Todd Lipcon 


[kudu-CR](branch-1.3.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7764

Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital 
letters
..

KUDU-1942. Kerberos fails to log in on hostnames with capital letters

This ensures that servers canonicalize their FQDNs to lower-case before
generating Kerberos principal names.

With this change I was able to set up a working cluster on my laptop
with a capitalized hostname, where before it would fail as described in
the JIRA.

I also verified that I was able to connect from both C++ and Java
clients.

Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627
Reviewed-on: http://gerrit.cloudera.org:8080/7693
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
(cherry picked from commit ecfcb312161a4bc6f8da0ebb837169a2acf08eef)
---
M src/kudu/security/init.cc
1 file changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/7764/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7764
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-Owner: Todd Lipcon 


[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client

2017-08-21 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7692

to look at the new patch set (#8).

Change subject: KUDU-2032 (part 2): propagate master hostnames into client
..

KUDU-2032 (part 2): propagate master hostnames into client

This changes the client code to remember the user-specified master
addresses and propagate them into the creation of master proxies.

It's not possible to reproduce the necessary DNS configurations in a
minicluster test, but with this patch I am now able to use 'kudu perf
loadgen' against a Kerberized cluster even when my local krb5.conf has
rdns=false.

Change-Id: Ie5838f22c96f757124112b505825a53740468ce1
---
M src/kudu/client/client-internal.cc
M src/kudu/client/client-internal.h
M src/kudu/client/master_rpc.cc
M src/kudu/client/master_rpc.h
M src/kudu/integration-tests/external_mini_cluster.cc
5 files changed, 56 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/7692/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7692
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus: Tablet copy should clear last-logged opid from 
superblock
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] consensus: Improve contract for API to fetch last-logged op id

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus: Improve contract for API to fetch last-logged op id
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] consensus: Improve contract for API to fetch last-logged op id

2017-08-21 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: consensus: Improve contract for API to fetch last-logged op id
..


Patch Set 4:

I had to manually rebase on top of Alexey's IWYU patch. The conflicts were very 
minimal.

-- 
To view, visit http://gerrit.cloudera.org:8080/7717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] consensus: Improve contract for API to fetch last-logged op id

2017-08-21 Thread Mike Percy (Code Review)
Hello Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7717

to look at the new patch set (#4).

Change subject: consensus: Improve contract for API to fetch last-logged op id
..

consensus: Improve contract for API to fetch last-logged op id

It's important that we differentiate between when we have a known
last-logged op and when we don't actually know what it is or whether we
have ever appended something to the local WAL. This applies both to the
TABLET_DATA_READY case, where this information is stored in the WAL, and
the TABLET_DATA_TOMBSTONED case, where this information is stored in the
superblock.

Cases where we are unable to determine the last-logged OpId from the WAL
when a replica is in TABLET_DATA_READY state:

* Early in the tablet replica lifecycle (before Raft is started).
* When a replica encounters an error during initialization.

Cases where we are unable to determine the last-logged OpId from the
TabletMetadata when a replica is in TABLET_DATA_TOMBSTONED state:

* If the replica was tombstoned while in a failed state.

Included in this patch are the following API improvements:

1. Delete Log::GetLatestEntryOpId(). Previously, this method would only
   return something other than MinimumOpId() if a log entry had been
   appended during the object's lifetime. It is abandoned in favor of
   RaftConsensus::GetLastOpId(RECEIVED_OPID) which delegates to
   PeerMessageQueue::GetLastOpIdInLog().
2. Merge PeerMessageQueue::Init() into the PeerMessageQueue constructor.
   This allows us to remove one lifecycle state and allows us to
   guarantee that, once the queue is constructed, we can always get a
   valid last-logged opid from it (see #1).
3. Make TabletMetadata::tombstone_last_logged_opid() return a
   boost::optional. We need to clearly differentiate between when
   we know the last-logged opid and when we don't. We also consider
   MinimumOpId() to be equal to boost::none at superblock load time,
   since previous versions of Kudu may have written (0,0) into the
   TabletMetadata 'tombstone_last_logged_opid' field.

Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/mt-log-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
19 files changed, 244 insertions(+), 242 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/7717/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-871. Support tombstoned voting

2017-08-21 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6960

to look at the new patch set (#10).

Change subject: KUDU-871. Support tombstoned voting
..

KUDU-871. Support tombstoned voting

This patch makes it possible for tombstoned tablet replicas to vote in
Raft elections.

Changes:

* Add Stop() method to TabletReplica + Consensus lifecycle.
  * Includes new STOPPED state.
  * Tombstoning a replica should call Stop().
  * Deleting a replica should call Shutdown().
* Add positive and negative tests for tombstoned voting.
* Add a stress test that induces lots of tombstoned voting
  while running TabletCopy, TabletBootstrap, and DeleteTablet.
* Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned
  voting changed its assumption that tombstoned tablets would not vote.
* Fix several tests that expected tombstoned tablets to be SHUTDOWN when
  now they are STOPPED.
* Get rid of tablet::FAILED state, instead go to SHUTDOWN with an error.

Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/delete_table-itest.cc
M src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
A src/kudu/integration-tests/tombstoned_voting-itest.cc
A src/kudu/integration-tests/tombstoned_voting-stress-test.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-ts-cli-test.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/tserver/tserver-path-handlers.cc
M src/kudu/util/make_shared.h
33 files changed, 1,439 insertions(+), 340 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus: Tablet copy should clear last-logged opid from 
superblock
..


Patch Set 4: Code-Review+2

Got it, sorry for the confusion.

-- 
To view, visit http://gerrit.cloudera.org:8080/7718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock

2017-08-21 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: consensus: Tablet copy should clear last-logged opid from 
superblock
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7718/3//COMMIT_MSG
Commit Message:

Line 9: Keeping around irrelevant information is bad hygiene.
> Mike and I discussed offline... it seems like this might not be a great ide
The patch as-written only clears the last-logged opid when we transition to 
TABLET_DATA_READY, which is only synced after tablet copy is successfully 
completed.

So the scenario summarized above wouldn't be affected negatively by this 
(correctness related) patch. I definitely agree we want to preserve the 
last-logged opid as much as possible (for availability reasons) and I am 
planning on ensuring that we do that in a follow-up patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/7718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] consensus: Improve contract for API to fetch last-logged op id

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus: Improve contract for API to fetch last-logged op id
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged.

Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
..


Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

This commit had some semantic conflict with the recent IWYU include
cleanup.

Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
Reviewed-on: http://gerrit.cloudera.org:8080/7762
Reviewed-by: Alexey Serbin 
Tested-by: Todd Lipcon 
---
M src/kudu/rpc/connection.h
M src/kudu/rpc/connection_id.cc
2 files changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Todd Lipcon: Verified
  Alexey Serbin: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1

2017-08-21 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has uploaded a new patch set (#10).

Change subject: Kudu Consistency Blog Post Pt1
..

Kudu Consistency Blog Post Pt1

This is the first part of multi-part blog post series about
consistency in Kudu.
It's hard to talk about consistency topics without context,
particularly in blog posts where we can't assume very much about
the readers. After struggling a bit with coming up with something
that could be self contained in a single post I ended up biting
the bullet and putting my name down to write a multi-part series.

The series will cover Kudu from a consistency standpoint. The
first post (this one) is about motivation and how the write
and read paths work in general. Follow up posts will cover
specific components, edge cases and guarantees. I'll finish
with a post about overall semantics and what users can expect
in the future.

Rendering available at:
http://24.7.81.123:4000/2017/08/21/kudu-consistency-pt1.html

Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
---
A _posts/2017-08-21-kudu-consistency-pt1.md
1 file changed, 172 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] [tools] Add summary-only mode to ksck

2017-08-21 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: [tools] Add summary-only mode to ksck
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7759/2/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

Line 57: "If set, ksck prints only summary information. Use with 
the --format flag to "
Does this description still make sense when also doing a checksum check?


-- 
To view, visit http://gerrit.cloudera.org:8080/7759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
..


Patch Set 2: Verified+1

Pushing without waiting for tests since this unbreaks build (ensured that build 
succeeded in debug jenkins build)

-- 
To view, visit http://gerrit.cloudera.org:8080/7762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] [tools] Add summary-only mode to ksck

2017-08-21 Thread Will Berkeley (Code Review)
Will Berkeley has uploaded a new patch set (#2).

Change subject: [tools] Add summary-only mode to ksck
..

[tools] Add summary-only mode to ksck

This patch add a -summary_only flag to ksck, which suppresses
all output except for the table summary at the end. When used in
combination with the -format flag supported by the DataTable
class, it allows users to get useful machine-readable ksck
output as csv, tsv, or json.

Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
3 files changed, 134 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/7759/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] [tools] Add summary-only mode to ksck

2017-08-21 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: [tools] Add summary-only mode to ksck
..


Patch Set 1: Verified+1

Build failure due to KUDU-1736 (certainly unrelated to this patch).

-- 
To view, visit http://gerrit.cloudera.org:8080/7759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] KUDU-1811: C++ client: use larger batches when fetching scan tokens

2017-08-21 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1811: C++ client: use larger batches when fetching scan 
tokens
..


Patch Set 5:

(2 comments)

Overall I think this is a good change, but I wonder if it wouldn't just be 
simpler to change the existing constant to 100 and call it good enough.  10 
does seem like a low number in any situation given the overhead of RPCs.

http://gerrit.cloudera.org:8080/#/c/7748/5/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

Line 812: int requested_batch_size) {
Might be good to name this 'max_returned_locations' to keep it consistent.


http://gerrit.cloudera.org:8080/#/c/7748/5/src/kudu/client/meta_cache.h
File src/kudu/client/meta_cache.h:

Line 390:   void LookupTabletsByKeyOrNext(const KuduTable* table,
Instead of creating a new method, it may be cleaner to add an optional argument 
to the existing 'LookupTabletByKeyOrNext' method, with the default value of 
kFetchTabletsPerPointLookup.  Then you could expose kFetchTabletsPerRangeLookup 
as a public constant, and use it from the scan token generator.


-- 
To view, visit http://gerrit.cloudera.org:8080/7748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79340dc9963944454770d82a2fbaba1b0c8a810e
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jun He 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jun He 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

2017-08-21 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7762

to look at the new patch set (#2).

Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
..

Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

This commit had some semantic conflict with the recent IWYU include
cleanup.

Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
---
M src/kudu/rpc/connection.h
M src/kudu/rpc/connection_id.cc
2 files changed, 2 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/7762/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager

2017-08-21 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1943: Add BlockTransaction to Block Manager
..


Patch Set 20:

(30 comments)

http://gerrit.cloudera.org:8080/#/c/7207/19//COMMIT_MSG
Commit Message:

PS19, Line 22: s-per
> Done
You still need 'localhost' though; that's not an optional parameter I think. I 
was just saying you could remove the port, which is optional.


http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager-stress-test.cc
File src/kudu/fs/block_manager-stress-test.cc:

Line 282: Slice seed_slice(reinterpret_cast(&seed), 
sizeof(seed));
Should be 


PS20, Line 297: WritableBlock *block = 
dirty_blocks[next_block_idx].get();
  : Random &rand = dirty_block_rands[next_block_idx];
You changed the pointer/ref style here.


Line 310:   // Close all dirty blocks.
Update.


Line 311:   for (const auto &dirty_block : dirty_blocks) {
const auto&


Line 318:   for (auto &dirty_block : dirty_blocks) {
auto&


Line 326:   for (const auto &block : all_dirty_blocks) {
const auto&


http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/block_manager.cc
File src/kudu/fs/block_manager.cc:

PS20, Line 44:   "Control when to flush a block, either at 
'finalize', 'close',"
 :   " or 'never'. 'finalize' indicates to flush when 
writing to "
 :   "the block is finished. 'close' indicates to defer 
the flushing "
 :   "to when the entire BlockTransaction, that the 
blocks belong "
 :   "to, is committed. And 'never' is not flush at 
all.");
Rewrite:

Controls when to flush a block. Valid values are 'finalize', 'close', or 
'never'. If 'finalize', blocks will be flushed when writing is finished. If 
'close', blocks will be flushed when their transaction is committed. If 
'never', blocks will never be flushed.


http://gerrit.cloudera.org:8080/#/c/7207/17/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 1410: }
> Updated. To clarify, I did this because we were saying 'Finalize indicates 
Okay, but why bother maintaining FlushDataAsync() as a separate method? It 
certainly shouldn't be part of the WritableBlock interface because there's no 
need for anyone outside the block manager to know about the distinction.

I suppose you could keep it as per-impl methods in 
LogWritableBlock/FileWritableBlock, but it's just one line of code, so the 
decomposition doesn't seem that useful.


http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

PS20, Line 260: finalize it if has not
finalizing it if it has not yet been finalized.


PS20, Line 260: And update
"Also updates"


PS20, Line 336: .
,


PS20, Line 337: th 
the


PS20, Line 434:   // Finalize a fully written block. It round up this 
container, truncate it if full
  :   // and mark the container as available.
  :   void FinalizeBlock(int64_t block_offset, int64_t 
block_length);
Nit: would prefer to keep BlockCreated and BlockDeleted next to each other 
since they're fairly symmetric.


PS20, Line 543:   //
  :   // Declared atomic because it is mutated from BlockDeleted().
Since all counters are now atomic, these comments are no longer interesting.


PS20, Line 878: container
"data" (to be consistent with "Syncing metadata file")


PS20, Line 882:  does so at any given point
Replace with "is already made durable".

Or rewrite as "Append metadata only after data is synced so that there's no 
chance of metadata landing on the disk before the data."


PS20, Line 883: const auto &
const auto*


PS20, Line 890: if (mode == SYNC) 
I don't think this one needs to be conditioned on SYNC.


Line 891: for (LogWritableBlock* block : blocks) {
Would be nice if we could DCHECK that if blocks.size() > 1, all of the blocks 
are FINALIZED. If they aren't, we have a big problem: we'll end up calling 
FinalizeBlock() on the same container multiple times from within DoClose() and 
could corrupt on-disk data via truncate calls.


PS20, Line 1040:   // Note that this take places _after_ the container has been 
synced to disk.
   :   // That's OK; truncation isn't needed for correctness, and 
in the event of a
   :   // crash or error, it will be retried at startup.
This needs to be updated.


Line 1068:   if (PREDICT_TRUE(new_next_block_offset >= next_block_offset())) {
Don't need the if statement anymore.


PS20, Line 1071:   total_bytes_ .IncrementBy(fs_aligned_length(block_offset, 
block_length));
   :   total_blocks_.Increment();
Let's defer this to BlockCreated(). Then RoundUpContainer can be renamed to the 
more clear UpdateNextBlockOffset.


PS20, Line 1206: {this}
Nit: { this }


Line 1216:   RETURN_NOT_OK(container_->DoCloseBlocks({this}, 
LogBlockContainer

[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1

2017-08-21 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has uploaded a new patch set (#9).

Change subject: Kudu Consistency Blog Post Pt1
..

Kudu Consistency Blog Post Pt1

This is the first part of multi-part blog post series about
consistency in Kudu.
It's hard to talk about consistency topics without context,
particularly in blog posts where we can't assume very much about
the readers. After struggling a bit with coming up with something
that could be self contained in a single post I ended up biting
the bullet and putting my name down to write a multi-part series.

The series will cover Kudu from a consistency standpoint. The
first post (this one) is about motivation and how the write
and read paths work in general. Follow up posts will cover
specific components, edge cases and guarantees. I'll finish
with a post about overall semantics and what users can expect
in the future.

Rendering available at:
http://dralves-server:4000/2017/08/21/kudu-consistency-pt1.html

Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
---
A _posts/2017-08-21-kudu-consistency-pt1.md
1 file changed, 172 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/9
-- 
To view, visit http://gerrit.cloudera.org:8080/7019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus_peers: replace bespoke Raft heartbeat logic with 
periodic timers
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 140:   std::bind([w]() {
> oh sure, if the task abstraction has a constructor for that (I haven't chec
I think std::function has an implicit constructor from any callable (serious 
template magic)


-- 
To view, visit http://gerrit.cloudera.org:8080/7734
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7762/1/src/kudu/rpc/connection_id.cc
File src/kudu/rpc/connection_id.cc:

PS1, Line 22: #include 
> I thought you decided we had to add ostream everywhere that we used logging
Yep, but that's only when there is a construct like

LOG(INFO) << "mega info";

in the file.  The  in that case is needed for the ostream& 
operator<<(ostream&).  Otherwise, IWYU will say the  should be removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/7762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7762/1/src/kudu/rpc/connection_id.cc
File src/kudu/rpc/connection_id.cc:

PS1, Line 22: #include 
> I'm not sure we need this for this version.  What is this for?
I thought you decided we had to add ostream everywhere that we used logging.h? 
At least it's included in 410 of our source files :)


-- 
To view, visit http://gerrit.cloudera.org:8080/7762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1811: C++ client: use larger batches when fetching scan tokens

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1811: C++ client: use larger batches when fetching scan 
tokens
..


Patch Set 5:

(2 comments)

just a couple nits. I also added Dan who knows this code a bit better and may 
have more substantial comments.

http://gerrit.cloudera.org:8080/#/c/7748/5/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

Line 76: /**
style nit: we use // even for block comments


Line 1048: string partition_key,
nit: alignment


-- 
To view, visit http://gerrit.cloudera.org:8080/7748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79340dc9963944454770d82a2fbaba1b0c8a810e
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jun He 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jun He 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers

2017-08-21 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: consensus_peers: replace bespoke Raft heartbeat logic with 
periodic timers
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 140:   std::bind([w]() {
> oh, good catch. shouldn't you be able to just pass the lambda directly anyw
oh sure, if the task abstraction has a constructor for that (I haven't checked).


-- 
To view, visit http://gerrit.cloudera.org:8080/7734
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7762/1/src/kudu/rpc/connection_id.cc
File src/kudu/rpc/connection_id.cc:

PS1, Line 22: #include 
I'm not sure we need this for this version.  What is this for?


-- 
To view, visit http://gerrit.cloudera.org:8080/7762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] consensus: use periodic timers for failure detection

2017-08-21 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: consensus: use periodic timers for failure detection
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 252:   std::bind([w]() {
same comment about no-arg bind


Line 319: rng_.Normal(GetFailureMonitorCheckMeanMs(),
This might result in a negative value.  It would probably be OK, but since it 
would very rarely occur in tests, it's perhaps best to bound it.

Bigger picture, it seems like overkill to keep around the 
leader_failure_monitor_check_mean_ms and leader_failure_monitor_check_stddev_ms 
flags just for this very specific case.  Perhaps remove them, and use the new 
failure detection timeperiod calculation (uniform distribution) instead?


http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

Line 27: #include 
This looks like it may be a bad interaction with Alexey's patch, but I don't 
think you should need both optional.hpp and none.hpp


Line 523:   void EnsureFailureDetectorEnabled(
I think these methods would be better named 'EnableFailureDetector' and 
'DisableFailureDetector' to match 'SnoozeFailureDetector'.  Feel free to ignore 
if you feel that's outside the scope of this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/7735
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client

2017-08-21 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7692

to look at the new patch set (#7).

Change subject: KUDU-2032 (part 2): propagate master hostnames into client
..

KUDU-2032 (part 2): propagate master hostnames into client

This changes the client code to remember the user-specified master
addresses and propagate them into the creation of master proxies.

It's not possible to reproduce the necessary DNS configurations in a
minicluster test, but with this patch I am now able to use 'kudu perf
loadgen' against a Kerberized cluster even when my local krb5.conf has
rdns=false.

Change-Id: Ie5838f22c96f757124112b505825a53740468ce1
---
M src/kudu/client/client-internal.cc
M src/kudu/client/client-internal.h
M src/kudu/client/master_rpc.cc
M src/kudu/client/master_rpc.h
M src/kudu/integration-tests/external_mini_cluster.cc
5 files changed, 57 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/7692/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7692
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] util: remove old failure detector and resettable heartbeater code

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: util: remove old failure detector and resettable heartbeater 
code
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7736
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

2017-08-21 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/7762

to review the following change.

Change subject: Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a
..

Unbreak build after efb5395bdf8b89077b403c8a01528e63b346c49a

This commit had some semantic conflict with the recent IWYU include
cleanup.

Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
---
M src/kudu/rpc/connection.h
M src/kudu/rpc/connection_id.cc
2 files changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/7762/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If54a615a491cc3ebf22385d5fdb5817a6c992448
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 


[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus_peers: replace bespoke Raft heartbeat logic with 
periodic timers
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 140:   std::bind([w]() {
> I think it would be more clear to create a std::function or whatever type d
oh, good catch. shouldn't you be able to just pass the lambda directly anyway, 
with no binding?


-- 
To view, visit http://gerrit.cloudera.org:8080/7734
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1

2017-08-21 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has uploaded a new patch set (#8).

Change subject: Kudu Consistency Blog Post Pt1
..

Kudu Consistency Blog Post Pt1

This is the first part of multi-part blog post series about
consistency in Kudu.
It's hard to talk about consistency topics without context,
particularly in blog posts where we can't assume very much about
the readers. After struggling a bit with coming up with something
that could be self contained in a single post I ended up biting
the bullet and putting my name down to write a multi-part series.

The series will cover Kudu from a consistency standpoint. The
first post (this one) is about motivation and how the write
and read paths work in general. Follow up posts will cover
specific components, edge cases and guarantees. I'll finish
with a post about overall semantics and what users can expect
in the future.

Rendering available at:
http://dralves-server:4000/2017/08/21/kudu-consistency-pt1.html

Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
---
A _posts/2017-08-21-kudu-consistency-pt1.md
1 file changed, 158 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2032 (part 2): propagate master hostnames into client
..


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7692
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1

2017-08-21 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has uploaded a new patch set (#7).

Change subject: Kudu Consistency Blog Post Pt1
..

Kudu Consistency Blog Post Pt1

This is the first part of multi-part blog post series about
consistency in Kudu.
It's hard to talk about consistency topics without context,
particularly in blog posts where we can't assume very much about
the readers. After struggling a bit with coming up with something
that could be self contained in a single post I ended up biting
the bullet and putting my name down to write a multi-part series.

The series will cover Kudu from a consistency standpoint. The
first post (this one) is about motivation and how the write
and read paths work in general. Follow up posts will cover
specific components, edge cases and guarantees. I'll finish
with a post about overall semantics and what users can expect
in the future.

Rendering available at:
http://dralves-server:4000/2017/08/21/kudu-consistency-pt1.html

Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
---
A _posts/2017-08-21-kudu-consistency-pt1.md
1 file changed, 157 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-2032 (part 1): pass pre-resolution hostname into RPC 
proxies
..


KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies

This modifies the constructor of RPC proxies (generated and otherwise)
to take the remote hostname in addition to the existing resolved
Sockaddr parameter. The hostname is then passed into the ConnectionId
object, and plumbed through to the SASL client in place of the IP
address that was used previously.

The patch changes all of the construction sites of Proxy to fit the new
interface. In most of the test cases, we don't have real hostnames, so
we just use the dotted-decimal string form of the remote Sockaddr, which
matches the existing behavior.

In the real call sites, we have actual host names typically specified by
the user, and in those cases we'll need to pass those into the proxy. In
a few cases, they were conveniently available in the same function that
creates the proxy. In others, they are relatively far away, so this
patch just uses the dotted-decimal string and leaves TODOs.

In the case that Kerberos is not configured, this change should have no
effect since the hostname is ignored by SASL "plain". In the case that
Kerberos is configured with 'rdns=true', they also have no effect,
because the krb5 library will resolve and reverse the hostname from the
IP as it did before. In the case that 'rdns=false', this moves us one
step closer to fixing KUDU-2032 by getting a hostname into the SASL
library.

I verified that, if I set 'rdns = false' on a Kerberized client, I'm now
able to run  'kudu master status ' successfully where it would not
before. This tool uses a direct proxy instantiation where the hostname
was easy to plumb in. 'kudu table list ' still does not work because
it uses the client, which wasn't convenient to plumb quite yet.

Given that this makes incremental improvement towards fixing the issue
without any regression, and is already a fairly wide patch, I hope to
commit this and then address the remaining plumbing in a separate patch.

Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98
Reviewed-on: http://gerrit.cloudera.org:8080/7687
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
---
M src/kudu/client/client-internal.cc
M src/kudu/client/master_rpc.cc
M src/kudu/client/meta_cache.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/exactly_once_writes-itest.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/internal_mini_cluster.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_cert_authority-itest.cc
M src/kudu/integration-tests/master_replication-itest.cc
M src/kudu/integration-tests/security-itest.cc
M src/kudu/integration-tests/table_locations-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/master-test.cc
M src/kudu/master/master.cc
M src/kudu/master/sys_catalog-test.cc
M src/kudu/master/ts_descriptor.cc
M src/kudu/master/ts_descriptor.h
M src/kudu/rpc/connection.h
M src/kudu/rpc/connection_id.cc
M src/kudu/rpc/connection_id.h
M src/kudu/rpc/exactly_once_rpc-test.cc
M src/kudu/rpc/mt-rpc-test.cc
M src/kudu/rpc/negotiation.cc
M src/kudu/rpc/protoc-gen-krpc.cc
M src/kudu/rpc/proxy.cc
M src/kudu/rpc/proxy.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/rpc-bench.cc
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/rpc_stub-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tserver/heartbeater.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_server_test_util.cc
M src/kudu/util/net/net_util-test.cc
M src/kudu/util/net/sockaddr.cc
M src/kudu/util/net/sockaddr.h
42 files changed, 220 insertions(+), 139 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7687
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [build-support] IWYU build configuration for Jenkins

2017-08-21 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [build-support] IWYU build configuration for Jenkins
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7750
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] consensus: use periodic timers for failure detection

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus: use periodic timers for failure detection
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 533:   "Failed to submit failure detected task");
can you work the log prefix into this?


http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS2, Line 529: unregistered
disabled


-- 
To view, visit http://gerrit.cloudera.org:8080/7735
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: Kudu Consistency Blog Post Pt1
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7019/6//COMMIT_MSG
Commit Message:

PS6, Line 25: dralves-server
nit: mind adding the domain name?  It would be nice to get the IP or FQDN here.


-- 
To view, visit http://gerrit.cloudera.org:8080/7019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] disk failure: add persistent disk states

2017-08-21 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7270

to look at the new patch set (#22).

Change subject: disk failure: add persistent disk states
..

disk failure: add persistent disk states

This patch adds a new version of the path set so failed disks are not
used when a server is run. A previously failed disk may continue to
produce IOErrors after restart, but we may still like to start the
server without the disk.

To accomplish this, disk states, path names, and a timestamp are added
to the path set. If any disks are failed when opening the initial FS
layout, an 'unhealthy' instance with no path set metadata will be
created instead of failing the startup process.

CheckIntegrity() is updated to accomodate this. Rather than comparing
all instances against an agreed-upon set of UUIDs, the single path set
with the largest timestamp is used as the main one to compare against
(if only old path sets are available, the first healthy one is used). If
no instances are healthy, CheckIntegrity() will fail, as all disks are
failed.

Testing is done in a new iteration of CheckIntegrity(). Further testing
is done in data_dirs-test, fs_manager-test, and log_block_manager-test
to ensure the FS layout can be loaded with a failed directory.

Some notes:
- Disk failures during FS layout creation are not tolerated. In these
  cases, there is presumably no data on the server anyway, so operators
  should easily be able to fix their cluster.
- In the case of a server restart where all healthy disks fail to start
  up and some known failed disks start working again, the server will
  successfully start up with the bad disks (may have partially-written
  data).
- If there are any unhealthy instances when upgrading the path sets (i.e.
  adding disk states, paths, timestamp), a complete mapping of UUIDs to
  paths will not be available, and CheckIntegrity() will fail.
- The main path set's disk states are updated to reflect the failure of
  the instances. Since data on a failed disk cannot be trusted, disks
  that are successfully read from but are already marked FAILED by the
  main path set are not marked HEALTHY.

This patch is a part of a series of patches to handle disk failures. To
see how this fits, see 2.6 in:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f
---
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs.proto
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
12 files changed, 1,187 insertions(+), 278 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/22
-- 
To view, visit http://gerrit.cloudera.org:8080/7270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f
Gerrit-PatchSet: 22
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers

2017-08-21 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: consensus_peers: replace bespoke Raft heartbeat logic with 
periodic timers
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7734/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 140:   std::bind([w]() {
I think it would be more clear to create a std::function or whatever type 
directly instead of an indirectly through a no arg bind.


-- 
To view, visit http://gerrit.cloudera.org:8080/7734
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2032 (part 1): pass pre-resolution hostname into RPC 
proxies
..


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7687
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: consensus_peers: replace bespoke Raft heartbeat logic with 
periodic timers
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7734/2//COMMIT_MSG
Commit Message:

Line 15: heartbeat load, I think it makes sense to do it.
perhaps worth re-running the experiment from 
https://docs.google.com/document/d/1066W63e2YUTNnecmfRwgAHghBPnL1Pte_gJYAaZ_Bjo/edit#
 to make sure the new jitter parameters don't regress the stabilization after a 
failure. I think it's fine to do that after commit, though, since you did the 
small-scale experiment that you commented on in this gerrit.


-- 
To view, visit http://gerrit.cloudera.org:8080/7734
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [build-support] IWYU build configuration for Jenkins

2017-08-21 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: [build-support] IWYU build configuration for Jenkins
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7750/4/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

Line 324: mkdir -p $TEST_LOGDIR # In case no tests are run (the directory won't 
be created)
> Don't need this anymore?
Good catch!


-- 
To view, visit http://gerrit.cloudera.org:8080/7750
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [build-support] IWYU build configuration for Jenkins

2017-08-21 Thread Alexey Serbin (Code Review)
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7750

to look at the new patch set (#5).

Change subject: [build-support] IWYU build configuration for Jenkins
..

[build-support] IWYU build configuration for Jenkins

Added provisions to run the include-what-you-use tool against the
updated files in the changelist.

Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
---
A build-support/iwyu/iwyu_tool.py
M build-support/jenkins/build-and-test.sh
2 files changed, 326 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7750/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7750
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] rpc: periodic timers

2017-08-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: rpc: periodic timers
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic-test.cc
File src/kudu/rpc/periodic-test.cc:

Line 77:   ASSERT_GT(counter_, 0);
is this going to be flaky? maybe an AssertEventually is more robust?


Line 88: TEST_P(PeriodicTimerTest, TestReset) {
have you looped this with tsan? I'm worried it will be flaky if this main 
thread gets descheduled for more than 20ms. maybe need a much larger period to 
avoid flakiness?


http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.cc
File src/kudu/rpc/periodic.cc:

PS2, Line 54: messenger, functor
I'm surprised clang-tidy dind't complain about these, I'd have expected passing 
them through with std::move


PS2, Line 133:  // We must schedule the next callback with the minimal period 
as the delay
 :   // so that if Reset() is called with a low value, the 
scheduled callback
 :   // can still honor it.
 :   
I don't quite follow this. If we called Reset() with a small delay (less than 
minimum period) wouldn't we just schedule a new callback precisely for the 
requested time?


http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.h
File src/kudu/rpc/periodic.h:

PS2, Line 40: Messenger::ScheduleOnReactor
I think it's worth a note here that, since the tasks are run on reactor 
threads, they are not allowed to block or otherwise do lengthy computation, etc.


PS2, Line 59: periodic_period_jitter_pct
hrm, I don't know about making this a general flag vs passing it in to the 
PeriodicTimer instance. I think we've been trying to avoid gflags being read 
too much from utility code that may affect multiple subsystems.

That said, if it's an experimental/unstable flag it's no big deal since we can 
always adjust later.


PS2, Line 83: period for the next
:   // task. 
not 100% clear here. Is this setting the period (overriding what's in the 
'period' set in the ctor) or is this more of an 'initial delay' after which it 
will revert to rescheduling based on the configured period/jitter? Also, will 
jitter be applied to the provided value or will this be an exact delay?

Maybe something like "delay for scheduling the next task" or "delay after the 
task will first be invoked, after which it will revert to the scheduling 
paramters provided in the constructor" or something


Line 92:   // If 'next_task_delta' is provided, it is used as the new period. 
Otherwise,
Similar to the above, not sure whether this means to reset the period for all 
subsequent schedulings, or just the subsequent call.

If the latter, maybe 'Snooze' is a better name than 'Reset' since reset sounds 
more like a permanent change?


Line 93:   // the period is generated in accordance with the timer's period and 
jitter
in the case that it is boost::none, this still will "snooze" one full period 
worth, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/7733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] rpc: periodic timers

2017-08-21 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: rpc: periodic timers
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7733/2/src/kudu/rpc/periodic.cc
File src/kudu/rpc/periodic.cc:

Line 38: DEFINE_double(periodic_period_jitter_pct, 0.25,
I don't think exposing this is a good idea.  It's not clear what it's actually 
controlling.  If we wanted to make this configurable, I'd expect it to be done 
on a per-use basis (e.g. heartbeat_period_jitter).


-- 
To view, visit http://gerrit.cloudera.org:8080/7733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1

2017-08-21 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has uploaded a new patch set (#6).

Change subject: Kudu Consistency Blog Post Pt1
..

Kudu Consistency Blog Post Pt1

This is the first part of multi-part blog post series about
consistency in Kudu.
It's hard to talk about consistency topics without context,
particularly in blog posts where we can't assume very much about
the readers. After struggling a bit with coming up with something
that could be self contained in a single post I ended up biting
the bullet and putting my name down to write a multi-part series.

The series will cover Kudu from a consistency standpoint. The
first post (this one) is about motivation and how the write
and read paths work in general. Follow up posts will cover
specific components, edge cases and guarantees. I'll finish
with a post about overall semantics and what users can expect
in the future.

Rendering available at:
http://dralves-server:4000/2017/08/21/kudu-consistency-pt1.html

Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
---
A _posts/2017-08-21-kudu-consistency-pt1.md
1 file changed, 152 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7019/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Will Berkeley 


  1   2   >