David Ribeiro Alves has posted comments on this change.

Change subject: async background flush provision for C++ client
......................................................................


Patch Set 2:

(62 comments)

overall I think this needs more tests. Since the flusher is a thread, likely 
the unit tests will also have to use threads to:
- make sure limits are not exceeded
- make sure that operations are in fact buffered up to a point
- Test the multiple triggers of a flush.

Beyond the the unit tests, I think this also needs an integration test that 
stresses it all.

http://gerrit.cloudera.org:8080/#/c/3668/2//COMMIT_MSG
Commit Message:

Line 7: async background flush provision for C++ client
nit: capitalize (Async)


PS2, Line 7: provision
nit: remove provision


PS2, Line 10: is
nit: avoid using the passive voice. i.e. "KuduSession starts a background 
flusher thread" not a thread is started


PS2, Line 13: virtual
what's a virtual buffer?


Line 16: The flush criteria is based on buffer size for not-yet-scheduled
I'm having trouble parsing this sentence.


PS2, Line 17: operatations
nit: type


PS2, Line 20: operatations
nit: typo


PS2, Line 23: KUDU
Mention this issue in the commit message title.


http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/batcher.cc
File src/kudu/client/batcher.cc:

Line 267: #ifndef NDEBUG
spurious change


Line 383: 
spurious change


Line 553:   //        since the other time it's called in KuduSession::Apply()
nit: rephrase. How about:
Optimize and call write_op->SizeInBuffer() only once. We're currently calling 
it twice, here and in KuduSession::Apply()


http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/batcher.h
File src/kudu/client/batcher.h:

Line 109:   // Return number of bytes used for batcher buffer
nit: "Returns the"
nit: period at the end os sentences/


Line 114:   // Compute in-buffer size for the given write operation
nit: Computes
nit: period


PS2, Line 216: //
no need for this comment


http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

Line 1991: // return an error with session running in any supported flush mode
nit: period


PS2, Line 1993: buffer_space_bytes_limit
style: kBufferSpaceBytesLimit or BUFFER_SPACE_BYTES_LIMIT


PS2, Line 1993: static
not need for static.


PS2, Line 1994: static
same


Line 1995:   static const KuduSession::FlushMode modes[] = {
same


Line 2001:   for (const auto mode: modes) {
isn't 'auto' already const ?


Line 2007:     ASSERT_TRUE(s.IsIncomplete()) << "got unexpected status: " << 
s.ToString();
nit: capitalize Got


Line 2014:   // Applying a bunch of small rows without a flush should not 
result in
this sentence is huge and hard to read. please break it down.


Line 2016:   // since there is a flow control which makes Session::Apply() 
block and wait
what: "a flow control"?


Line 2018:   {
no need for the extra { scope


Line 2023:       ASSERT_OK(ApplyInsertToSession(session.get(), client_table_, 
1, 1, "x"));
How this testing that Apply() is eventually blocking, or making sure that the 
buffer value is not surpassed for that matter?


Line 2029: // operations are put into the queue flushed after some time.
you mean : "into the queue and flushed" right?


Line 2030: // The exact timeout interval is implementation-dependent,
which timeout interval?


Line 2031: // but 100 ms is a good upper limit anyways.
s/anyways/anyway


Line 2033:   {
no need for the extra {


Line 2041:     SleepFor(MonoDelta::FromMilliseconds(100));
loop instead of wait, otherwise this will be brittle in jenkins


http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/client.cc
File src/kudu/client/client.cc:

Line 705:           "Cannot change flush mode when writes are buffered");
nit period


Line 707:   data_->SetFlushMode(m);
would it simplify things to now allow the flush mode to change if there is a 
flusher thread? if not would it make sense to have this stop the thread (if the 
new mode is different) and wait for it to finish before allowing to change the 
mode? seems like there is some complexity attached to the fact that we allow 
this to change at anytime. simplifying the possible states there would be good


Line 728:   return data_->SetBufferBytesLimit(size);
why not use the same method name?


http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 678:     // The Flush() call can be used to block until the latest batch 
is sent.
maybe add/replace: "a batch is sent and the buffer has available space again"


http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/session-internal.cc
File src/kudu/client/session-internal.cc:

Line 48:   // intentionally left empty
no ned for this, you can also join the brackets in the line above, i.e. {}


Line 53:     WARN_NOT_OK(Stop(), "unable to stop AUTO_FLUSH_BACKGROUND thread");
capitalize. Also a WARN? what happens if this fails?


Line 58:   CHECK(!thread_) << "has already started";
complete the sentence


Line 69:   while (true) {
Overall I think this logic can be simplified

You can have a simple_spinlock that protects the relevant fields (like the 
callback or is_shutdown_). When the callback is set, it means that an explicit 
flush was required (the point is to get rid of a couple of bools there, but 
would understand keeping one).

Finally you can replace the condition variables with a CountDownLatch (which 
cvs internally).

each time you get a condition that makes the loop progress or exit use 
latch_->CountDown()  and you can wait on the latch where relevant.


PS2, Line 75: ||
nit: style, move this to the line below, same elsewhere


Line 85:         // high watermark is 1/2 of the buffer space limit for 
non-flushed data
can you use a flag for the over_watermark that you set as a percentage of the 
buffer limit?


Line 98:         do_explicit_flush = true;
these two variables are confusing.


Line 114:       sp::shared_ptr<KuduSession> session(weak_session_.lock());
why not just increase the ref count?


Line 138:   is_explicit_flush_ = true;
could you get rid of this bool and just call the cb if it is set?


Line 149:       buffer_bytes_limit_(7 * 1024 * 1024),  // TODO: clarify on this
use a FLAG for this


Line 160: void KuduSession::Data::Init(const shared_ptr<KuduSession>& session) {
be consistent. sometimes you use shared_ptr, other time sp:shared_ptr and other 
times client::sp::shared_ptr, you should use the least amount possible of 
qualifiers


Line 167:   WARN_NOT_OK(flusher_->Start(),
shouldn't this return an error to the caller


Line 198:   // Both flush and flow control logic need to know on buffer limit 
change
nit: remove "on"
more importantly overall I think you should only allow to set a buffer limit 
before you start writing.


Line 206: Status KuduSession::Data::CheckAgainstBufferLimit(size_t 
required_size) const {
nit: remove "Against"


Line 222:     WARN_NOT_OK(flusher_->ForceFlush(cb), "failed to wake background 
flusher");
isn't this serious enough that the caller should get an error?


Line 335:     // because the background flusher could trigger spurios flush 
otherwise
typo


http://gerrit.cloudera.org:8080/#/c/3668/2/src/kudu/client/session-internal.h
File src/kudu/client/session-internal.h:

PS2, Line 53: is set on currently
I don;t think you need to mention what the caller will pass as a flush mode.


Line 58:   // Set limit on buffer space consumed by buffered write operations
nit period here and elsewhere


Line 65:   // calling the callback with the flush result
nit period


Line 105:   void LessBufferedBytes(int64_t bytes_used);
less seems to imply comparison. Can you rename to DecreaseBufferedBytes


Line 107:   // Check if the specified size fits buffer given its current size 
limit
period at the end of this sentence and all the other ones.


Line 110:   // Mutex for the condition variables below
this seems to be used for more than the ConditionVariables.


Line 129:   class BackgroundFlusher {
move the inner class above the other private methods and fields.


Line 137:     // initialize the object and start the thread
capitalize here and elsewhere


Line 151:     client::sp::weak_ptr<KuduSession> weak_session_;
why isn't this hanging on to the KuduSession. What happens if it gets deleted 
while this thread is working (remember the user controls the lifetime if you 
don't increase the ref count)


PS2, Line 155: consequitive
nit typo


Line 158:     // whether the thread is shutdown (guarded by cond_mutex_)
nit: capitalize here and elsewhere


Line 169:   }; // class KuduSession::Data::Flusher
we don't usually use class closing comments


-- 
To view, visit http://gerrit.cloudera.org:8080/3668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idfde05bef342db24990c6e3da3b0270c3bb37a9d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to