[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839662#comment-13839662 ]
stack commented on HBASE-8755:
------------------------------

Here is more review on the patch. Make the changes suggested below and I'll +1 it. (Discussion off-line w/ Feng on this issue helped me better understand this patch and put to rest any notion that there is an easier 'fix' than the one proposed here. That said, there is much room for improvement, but this can be done in a follow-on.)

Remove these asserts rather than comment them out, given they depend on a facility this patch removes. Leaving them in will only make the next reader of the code -- very likely lacking the context you have -- feel uneasy, thinking someone removed asserts just to get tests to pass.

{code}
- assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
+ //assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
{code}

On the below...

{code}
+import java.util.Random;
{code}

... using a Random for choosing an arbitrary thread from a list of 4 is heavyweight. Can you not take the last digit of the timestamp or nano timestamp, or some attribute of the edit, instead? Something more lightweight?

Please remove all mentions of AsyncFlush since it no longer exists:

{code}
// all writes pending on AsyncWrite/AsyncFlush thread with
{code}

Leaving it in will confuse readers when they can't find any such thread class.

Is this comment right?

{code}
// txid <= failedTxid will fail by throwing asyncIOE
{code}

Should it be >= failedTxid?

This should be volatile since it is set by AsyncSyncer and then used by the main FSHLog thread (you have an assert to check it is not null -- maybe you ran into an issue here already?):

{code}
+ private IOException asyncIOE = null;
{code}

bq. + private final Object bufferLock = new Object();

'bufferLock' is a very generic name. Could it be more descriptive? It is a lock held for a short while while AsyncWriter moves queued edits off the globally seen queue to a local queue, just before we send the edits to the WAL. You add a method named getPendingWrites that requires this lock be held.
Could we tie the method and the lock together better? Name it pendingWritesLock? (The name of the list that holds the pending writes is pendingWrites.)

bq. ...because the HDFS write-method is pretty heavyweight as far as locking is concerned.

I think the heavyweight referred to in the above is hbase locking, not hdfs locking as the comment would imply. If you agree (you know this code better than I), please adjust the comment.

Comments on what these threads do will help the next code reader. AsyncWriter does the adding of edits to HDFS. AsyncSyncer needs a comment because it is oxymoronic (though it makes sense in this context). In particular, a comment would draw out why we need so many instances of a syncer thread, because everyone's first thought here is going to be: why do we need this? Ditto on the AsyncNotifier. In the reviews above, folks have asked why we need this thread at all, and a code reader will likely think similarly on a first pass. Bottom line, your patch raised questions from reviewers; it would be cool if the questions were answered in code comments where possible so the questions do not come up again.

{code}
+ private final AsyncWriter asyncWriter;
+ private final AsyncSyncer[] asyncSyncers = new AsyncSyncer[5];
+ private final AsyncNotifier asyncNotifier;
{code}

You remove the LogSyncer facility in this patch. That is good (need to note this in release notes). Your patch should remove the optional flush config from hbase-default.xml too since it is no longer relevant.

{code}
- this.optionalFlushInterval =
-     conf.getLong("hbase.regionserver.optionallogflushinterval", 1 * 1000);
{code}

I see it here...

{code}
hbase-common/src/main/resources/hbase-default.xml: <name>hbase.regionserver.optionallogflushinterval</name>
{code}

A small nit is you might look at other threads in hbase and see how they are named...
{code}
+ asyncWriter = new AsyncWriter("AsyncHLogWriter");
{code}

Ditto here:

{code}
+ asyncSyncers[i] = new AsyncSyncer("AsyncHLogSyncer" + i);
{code}

Probably make the number of asyncSyncers a configuration (you don't have to put the option out in hbase-default.xml... just make it so that if someone is reading the code and trips over this issue, they can change it by adding to hbase-site.xml w/o having to change code -- let's not reproduce the hard-coded '80' that is in the head of dfsclient we discussed yesterday -- smile).

... and here:

{code}
asyncNotifier = new AsyncNotifier("AsyncHLogNotifier");
{code}

Not important, but check out how other threads are named in hbase. It might be good if these better align.

Maybe make a method for shutting down all these threads, or use the Threads#shutdown method in Threads.java?

bq. LOG.error("Exception while waiting for AsyncNotifier threads to die", e);

Do LOG.error("Exception while waiting for " + t.getName() + " thread to die", e); instead?

Call this method per AsyncSyncer rather than do as you have here:

{code}
+ try {
+   for (int i = 0; i < asyncSyncers.length; ++i) {
+     asyncSyncers[i].interrupt();
+     asyncSyncers[i].join();
+   }
+ } catch (InterruptedException e) {
+   LOG.error("Exception while waiting for AsyncSyncer threads to die", e);
+ }
{code}

where if we fail to join on the first AsyncSyncer, we will not shut down the rest. This is a nit. Not important since we are closing down anyway, and if an exception happens here, it is probably fatal. These threads are NOT daemon threads though, so a failure to shut them down will hold up the JVM's going down. Maybe they should all be daemon threads just in case?

I see this in close:

{code}
+ synchronized (bufferLock) {
+   doWrite(info, logKey, edits, htd);
+   txid = this.unflushedEntries.incrementAndGet();
+ }
{code}

... but we do not seem to be doing it on the other call to doWrite at around line #969 inside append.
Should we be holding the lock at this location (maybe we do and I am just not seeing it because I am just looking at FSHLog and the patch side-by-side)?

What is going on here?

{code}
+ // wake up (called by (write) handler thread) AsyncWriter thread
+ // to write buffered writes to HDFS
+ public void setPendingTxid(long txid) {
{code}

This method is only called at close time if I read the patch right. It is not called by '...(write) handler thread'? I am having trouble understanding how this signaling works.

Is this 'fatal'? Or is it an 'error' since we are going into recovery... trying to roll the log?

{code}
+ LOG.fatal("Error while AsyncWriter write, request close of hlog ", e);
{code}

Here we failed a write:

{code}
+ } catch (IOException e) {
+   LOG.fatal("Error while AsyncWriter write, request close of hlog ", e);
    requestLogRoll();
-   Threads.sleep(this.optionalFlushInterval);
+
+   asyncIOE = e;
+   failedTxid.set(this.txidToWrite);
  }
{code}

... and request a log roll, yet we carry on to try and sync, an op that will likely fail? Are we ok here? We updated the write txid but not the sync txid, so that should be fine.

Do we need this: if (!asyncSyncers[i].isSyncing())? DFSClient will allow us to call sync concurrently?

Yeah... could use the last digit of the txid rather than random? That should be enough?

{code}
+ int idx = rd.nextInt(asyncSyncers.length);
+ asyncSyncers[idx].setWrittenTxid(this.lastWrittenTxid);
{code}

In the above, what if the writtenTxid is > than what we are to sync? Will that cause a problem? What will the sync do in this case? (nvm... I asked this question already. It is answered later in the code. The sync threads are running all the time. I thought they fired once and completed. My misunderstanding. Ignore the above.)

Can these be static classes or do they need context from the hosting FSHLog?
{code}
private class AsyncSyncer extends HasThread {
{code}

This class definitely needs more comment, either here or above where it is set up as a data member:

{code}
+ // thread to request HDFS to sync the WALEdits written by AsyncWriter
+ // to make those WALEdits durable on HDFS side
{code}

These method names should not talk about 'flush'. They should be named 'sync' instead:

{code}
+ private long txidToFlush = 0;
+ private long lastFlushedTxid = 0;
{code}

Same for the flushLock.

Why AtomicBoolean and not just a volatile here?

{code}
private AtomicBoolean isSyncing = new AtomicBoolean(false);
{code}

You are not using any feature of Atomic other than get and set, so volatile should work.

bq. + LOG.info(getName() + " exiting");

The above is very important. All your threads do this? Nothing worse than a silently exiting thread (smile).

Here is that comment again:

{code}
+ // appends new writes to the pendingWrites. It is better to keep it in
+ // our own queue rather than writing it to the HDFS output stream because
+ // HDFSOutputStream.writeChunk is not lightweight at all.
+ // it's caller's responsibility to hold updateLock before call this method
{code}

It talks about writeChunk being expensive, but we are not doing anything to ameliorate dfsclient writes if I dig down into our log writer... we just write as we go. So either the comment is correct and we are ignoring it in implementation, or the comment is wrong and it should be removed. This is out of scope of your patch; you are just moving this comment, I'd say. If so, update the comment to add a TODO saying it needs investigation.

Patch is great. Let's get it in. The above remarks are mostly minor, just looking for clarification.
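On the point above about replacing java.util.Random when picking an AsyncSyncer: a minimal sketch of selecting by txid modulo the pool size, as the review suggests. The method name pickSyncer is illustrative, not from the patch:

```java
public class SyncerPick {
    // Pick a syncer from the pool by txid rather than java.util.Random:
    // no Random object to allocate or contend on, and consecutive txids
    // still rotate evenly across all syncers.
    static int pickSyncer(long txid, int numSyncers) {
        return (int) (txid % numSyncers);
    }

    public static void main(String[] args) {
        int numSyncers = 5;  // matches the hard-coded pool size in the patch
        for (long txid = 100; txid < 105; txid++) {
            System.out.println("txid " + txid + " -> syncer " + pickSyncer(txid, numSyncers));
        }
    }
}
```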
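On the shutdown nit: a sketch of interrupting and joining each syncer inside its own try/catch, so a failed join on one thread does not skip the rest. Plain Thread stands in for HasThread here, and System.err for LOG; the dummy worker bodies are purely illustrative:

```java
public class SyncerShutdownSketch {
    // Interrupt and join each syncer independently: if join() on one thread
    // is interrupted, we log it and still attempt to stop the others.
    static void shutdownAll(Thread[] syncers) {
        for (Thread syncer : syncers) {
            syncer.interrupt();
            try {
                syncer.join();
            } catch (InterruptedException e) {
                System.err.println("Exception while waiting for " + syncer.getName()
                        + " thread to die: " + e);
            }
        }
    }

    public static void main(String[] args) {
        Thread[] syncers = new Thread[3];
        for (int i = 0; i < syncers.length; i++) {
            syncers[i] = new Thread(() -> {
                try {
                    Thread.sleep(60_000);  // stands in for a syncer blocked waiting for work
                } catch (InterruptedException e) {
                    // interrupt is our shutdown signal; fall through and exit
                }
                // log on exit -- per the review, never exit a thread silently
                System.out.println(Thread.currentThread().getName() + " exiting");
            }, "AsyncHLogSyncer" + i);
            syncers[i].start();
        }
        shutdownAll(syncers);
    }
}
```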
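On making the syncer count configurable: a sketch of the lookup, using plain java.util.Properties in place of HBase's Configuration so it runs standalone. The key name hbase.regionserver.hlog.asyncer.number is an assumption for illustration; the point is just a site-overridable value whose code default stays at 5, with no entry added to hbase-default.xml:

```java
import java.util.Properties;

public class SyncerCountConfig {
    // Assumed key name -- documented only here, not shipped in hbase-default.xml.
    static final String KEY = "hbase.regionserver.hlog.asyncer.number";

    // Default stays at 5 (the current hard-coded pool size); an operator who
    // trips over this can override it in hbase-site.xml without a code change.
    static int getSyncerCount(Properties conf) {
        String v = conf.getProperty(KEY);
        return v == null ? 5 : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        System.out.println(getSyncerCount(new Properties()));  // no override -> default

        Properties site = new Properties();
        site.setProperty(KEY, "8");                            // hbase-site.xml override
        System.out.println(getSyncerCount(site));
    }
}
```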
Thanks [~fenghh]

> A new write thread model for HLog to improve the overall HBase write throughput
> -------------------------------------------------------------------------------
>
>          Key: HBASE-8755
>          URL: https://issues.apache.org/jira/browse/HBASE-8755
>      Project: HBase
>   Issue Type: Improvement
>   Components: Performance, wal
>     Reporter: Feng Honghua
>     Assignee: stack
>     Priority: Critical
>  Attachments: 8755trunkV2.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch
>
>
> In the current write model, each write handler thread (executing put()) individually goes through a full 'append (hlog local buffer) => HLog writer append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy contention on updateLock and flushLock.
> The only existing optimization -- checking whether the current syncTillHere > txid, in the expectation that another thread has already written/synced our txid to hdfs so the write/sync can be omitted -- actually helps much less than expected.
> Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing hdfs sequence files, and the prototype implementation shows a 4X improvement in throughput (from 17000 to 70000+).
> I applied this new write thread model to HLog, and the performance test in our test cluster shows about 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row-size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested.
> The change for the new write thread model is as below:
> 1> All put handler threads append their edits to HLog's local pending buffer (notifying the AsyncWriter thread that there are new edits in the local buffer);
> 2> All put handler threads wait in the HLog.syncer() function for the underlying threads to finish the sync that contains their txid;
> 3> A single AsyncWriter thread is responsible for retrieving all the buffered edits in HLog's local pending buffer and writing them to hdfs (hlog.writer.append), notifying the AsyncFlusher thread that there are new writes to hdfs that need a sync;
> 4> A single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist the writes made by AsyncWriter, notifying the AsyncNotifier thread that the sync watermark has increased;
> 5> A single AsyncNotifier thread is responsible for notifying all pending put handler threads that are waiting in the HLog.syncer() function;
> 6> No LogSyncer thread any more (since the AsyncWriter/AsyncFlusher threads are always doing the job it did).

--
This message was sent by Atlassian JIRA
(v6.1#6144)
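The six-step model quoted above can be sketched, very roughly, as a single-writer/single-syncer wait/notify pipeline. This is a toy illustration, not the patch's code: the AsyncFlusher in the description corresponds to the patch's AsyncSyncer, the patch runs a pool of syncers rather than one, and all names below are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Toy pipeline: handlers append to a pending buffer and wait; a writer
// thread drains the buffer; a syncer thread advances the durable watermark
// and wakes the waiting handlers.
public class WalPipelineSketch {
    private final Object lock = new Object();
    private final List<String> pendingWrites = new ArrayList<>();
    private long unflushedTxid = 0;  // step 1: bumped by handlers on append
    private long writtenTxid = 0;    // step 3: advanced by the writer thread
    private long syncedTxid = 0;     // step 4: advanced by the syncer thread
    private boolean closed = false;

    // Step 1: handler buffers the edit locally and wakes the writer.
    long append(String edit) {
        synchronized (lock) {
            pendingWrites.add(edit);
            long txid = ++unflushedTxid;
            lock.notifyAll();
            return txid;
        }
    }

    // Steps 2 and 5: handler blocks until its txid is durable.
    void syncer(long txid) throws InterruptedException {
        synchronized (lock) {
            while (syncedTxid < txid) lock.wait();
        }
    }

    // Step 3: drain all buffered edits and "write" them in one batch.
    // (The real patch holds the lock only to move edits to a local list,
    // doing the actual HDFS write outside it.)
    void writerLoop() throws InterruptedException {
        while (true) {
            synchronized (lock) {
                while (pendingWrites.isEmpty() && !closed) lock.wait();
                if (closed) return;
                pendingWrites.clear();       // hlog.writer.append() would go here
                writtenTxid = unflushedTxid;
                lock.notifyAll();            // wake the syncer
            }
        }
    }

    // Step 4: "sync" everything written so far, then notify waiters.
    void syncerLoop() throws InterruptedException {
        while (true) {
            synchronized (lock) {
                while (syncedTxid >= writtenTxid && !closed) lock.wait();
                if (closed) return;
                syncedTxid = writtenTxid;    // writer.sync() would go here
                lock.notifyAll();            // wake handlers waiting in syncer()
            }
        }
    }

    void close() {
        synchronized (lock) { closed = true; lock.notifyAll(); }
    }

    public static void main(String[] args) throws Exception {
        WalPipelineSketch wal = new WalPipelineSketch();
        Thread writer = new Thread(() -> { try { wal.writerLoop(); } catch (InterruptedException ignored) {} });
        Thread syncer = new Thread(() -> { try { wal.syncerLoop(); } catch (InterruptedException ignored) {} });
        writer.start();
        syncer.start();
        long txid = wal.append("edit-1");
        wal.syncer(txid);  // blocks until the pipeline has "synced" edit-1
        System.out.println("synced up to txid " + txid);
        wal.close();
        writer.join();
        syncer.join();
    }
}
```

The sketch shows why the model reduces contention: handlers only touch the in-memory buffer under the lock, while the expensive append and sync run on dedicated threads that batch work for many txids at once.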