Replies to yours inline.

Jim Kellerman (JIRA) wrote:
[ https://issues.apache.org/jira/browse/HBASE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642525#action_12642525 ]
Jim Kellerman commented on HBASE-728:
-------------------------------------

From: Michael Stack [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 22, 2008 8:53 PM
To: [email protected]
Subject: Re: svn commit: r707247 - in /hadoop/hbase/trunk: ./ conf/ src/java/org/apache/hadoop/hbase/regionserver/

How does the new feature affect hbase throughput?  Does it make it slower?
Faster?  Any measurements done?

I measured PerformanceEvaluation random write 1 with one region server
before and after the appends patch.

I would say that throughput is either the same or a little faster.

I ran the pre-appends code only once; that test completed
in 2 minutes 31 seconds.

In fixing up a couple of bugs in appends, I have run this test 5 times.
The slowest was 2 minutes 33 seconds, but the other times were all faster:
2:24, 2:20, 2:21 and 2:21.

I would suggest running at much higher rates to see if it breaks; for example, many clients writing into the one regionserver.

I was thinking that the size of the log file is a better measure of when to rotate, given that there can be a wide divergence in WAL log file sizes, but maybe not, given that flush sequenceids are pegged against a particular edit.

This could be done either way and I have no preference. With the default
settings, running PerformanceEvaluation random write 1 with one region
server, the HLogs were about 160MB. It might be nice to use the file size
so we can get closer to a multiple of HDFS block size. Doing so might
be better in the general case, which is any application except
PerformanceEvaluation. In some cases, we might put more updates into a
log (if keys and values are small), and in others we might put fewer
(when keys and values are large). Being close to a multiple of HDFS block
size is probably a good thing, so I am kind of leaning toward log size
instead of number of updates. What do others think?

I think it's better to have the roll based on edit counts rather than size, at least at first. While there may be some mild performance benefit to our coming close to blocksize, we'll never hit it spot on, and logs are let go based on whether they contain edits that are older than a sequenceid -- i.e. a particular edit, not an edit's size.
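
To make the two options concrete, here is a rough sketch of the two checks side by side (the names -- RollPolicy, maxLogEntries, maxLogSize -- are made up for the illustration; they are not the actual HLog fields or config keys):

    import java.util.concurrent.atomic.AtomicLong;

    // Illustrative sketch only; not the actual HLog code.
    class RollPolicy {
      private final AtomicLong numEntries = new AtomicLong(); // edits in current log
      private final AtomicLong logSize = new AtomicLong();    // bytes in current log
      private final long maxLogEntries;                       // e.g. 30000 edits
      private final long maxLogSize;                          // e.g. near an HDFS block

      RollPolicy(long maxLogEntries, long maxLogSize) {
        this.maxLogEntries = maxLogEntries;
        this.maxLogSize = maxLogSize;
      }

      void recordEdit(long bytesWritten) {
        numEntries.incrementAndGet();
        logSize.addAndGet(bytesWritten);
      }

      // Count-based roll: lines up with how logs are retired, since a log is
      // let go once all of its edits are older than the last flushed sequenceid.
      boolean shouldRollOnEditCount() {
        return numEntries.get() >= maxLogEntries;
      }

      // Size-based roll: aims the file at (a multiple of) the HDFS block size,
      // regardless of how many edits that turns out to be.
      boolean shouldRollOnSize() {
        return logSize.get() >= maxLogSize;
      }
    }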

We have a convention for naming threads. It's the name of the server -- master/regionserver host and port -- followed by what the thread does (this used to be hlog? Or log?). It makes sorting them out in a thread dump easy.

Currently the thread is named HLog. Would it be preferable to name it
<servername>.HLog? Log entries only appear in one region server's log.
Does it matter?
It's minor, but if there are multiple regionservers in the one JVM, as in unit tests, it'll help. Mostly, though, I'm after this new thread's name aligning with how all other threads in hbase are named.
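
For example, a thread named per that convention would look something like the following (the server name value and the no-op Runnable are placeholders for the sketch, not real code from HRS):

    // Illustrative only: name the thread "<servername>.<what it does>" so
    // entries in a thread dump sort by server first.
    public class ThreadNamingSketch {
      public static void main(String[] args) {
        String serverName = "regionserver-host:60020";   // host and port of the server
        Runnable hlogWork = new Runnable() {
          public void run() { /* log-rolling work would go here */ }
        };
        Thread hlogThread = new Thread(hlogWork, serverName + ".hlog");
        hlogThread.setDaemon(true);
        hlogThread.start();
      }
    }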

Should this Log thread inherit from Chore?

Currently only the root, meta scanners and CleanOldTransactions (in
regionserver.transactional) extend Chore. This change was made a while
back, but I can't remember why. Should all the threads in HRS and HMaster
extend Chore? We would need to add the "interrupt politely" method,
but I can't think of a reason we shouldn't do this (as a separate Jira).

Agreed. Separate, low-priority JIRA.
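
For reference, the rough shape such a thread would take -- this is only a sketch of a Chore-like periodic loop with an "interrupt politely" hook, not the actual Chore class:

    import java.util.concurrent.atomic.AtomicBoolean;

    // Sketch only; not org.apache.hadoop.hbase.Chore.
    abstract class ChoreSketch extends Thread {
      private final long periodMillis;
      private final AtomicBoolean stopRequested = new AtomicBoolean(false);

      ChoreSketch(String name, long periodMillis) {
        super(name);
        this.periodMillis = periodMillis;
      }

      /** The periodic work. */
      protected abstract void chore();

      /** The "interrupt politely" part: set the flag, then wake the thread. */
      void requestStop() {
        stopRequested.set(true);
        interrupt();
      }

      @Override
      public void run() {
        while (!stopRequested.get()) {
          chore();
          try {
            Thread.sleep(periodMillis);
          } catch (InterruptedException e) {
            // requestStop() interrupts to wake us; the loop re-checks the flag.
          }
        }
      }
    }
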
There is a place in HRS where all service threads are started. Now that
HLog is a Thread, should it be moved in there? Into startServiceThreads?

Currently, the HLog thread is started by HRS.setupHLog. Since it is called
from multiple locations, moving the thread start to startServiceThreads
would involve extra synchronization.
It looks like it's called from two places: on init and when MSG_CALL_SERVER_STARTUP. If we get MSG_CALL_SERVER_STARTUP, why won't we end up with two HLog threads running?
However, I note that the HLog thread is not set to be a daemon thread, which
should probably be fixed.


Yes.
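
For illustration, the daemon fix plus a guard against starting the roller twice might look something like this (the class and method names here are made up for the sketch; this is not the actual HRS/HLog code):

    // Illustrative only; names are assumptions, not the actual HRegionServer code.
    class HLogThreadStarter {
      private Thread hlogThread;   // the log-rolling thread, once started

      synchronized void startOnce(Runnable hlogWork, String serverName) {
        // Guard against a second start, e.g. if setup runs again on
        // MSG_CALL_SERVER_STARTUP.
        if (hlogThread != null && hlogThread.isAlive()) {
          return;
        }
        hlogThread = new Thread(hlogWork, serverName + ".hlog");
        hlogThread.setDaemon(true);   // do not keep the JVM alive at shutdown
        hlogThread.start();
      }
    }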

St.Ack
