Re: HBASE-138: Under load, regions become extremely large and eventually cause region servers to become unresponsive

stack Fri, 08 Feb 2008 10:08:28 -0800

Marc Harris wrote:

I have created a JIRA issue for this, HBASE-428.


Yes, things are improved a bit (it takes about 10 times as many rows to
reach the problem state), but not much. I have put some of
the exceptions in the bug. On Sunday I should be able to run the load
again with debug logging on (if I find out how to). Probably not worth
sending you my regionserver log until then.


http://wiki.apache.org/hadoop/Hbase/FAQ#4

Yeah, DEBUG will help. It has stuff like how long flushes and compactions are taking and the count of Store files that are being compacted at any one time. Will help figure out what's going on.

At the moment the functionality that I am trying to re-architect runs
happily on 1 server, so it would be a hard sell to say that we need 4
servers for it. Anyway, as I understand the bug, wouldn't that just reduce
the probability of a problematic region by a factor of 4? So the problem
will just take 4 times as long to appear which is not much help. It's
not like the nodes in a cluster can actually compensate for each other.
But I don't really understand fully what the issue is.

Are you using an RDBMS now in your current solution? How close to your current solution does HBase have to come, Marc? (And what are you looking for? 1M/10M/100M into a single server in N hours?)


Thanks for persevering with the testing.

St.Ack

- Marc


On Thu, 2008-02-07 at 20:38 -0800, stack wrote:

Marc Harris wrote:
I have installed 0.16.0 rc 1 which I believe contains a fix for this
issue, but I still see the same problem.

- I am using a single node.
- The client application runs in a single thread, loading data into a
single table.
- I get good throughput of about 200 rows/sec to start with, with
occasional significant drops due to NotServingRegionExceptions that are
recoverable on client retry (internal to hbase).
- After 54 minutes, and about 500,000 rows, I start to see
WrongRegionExceptions in the client application, i.e. real failures.
Are things improved at all? Were you able to do 500k rows with previous hbase versions?
Send us over some of those WREs. We'd thought we'd fixed those.
- Throughput rapidly drops to only a few rows per minute, plus a few rows
that had errors.

Should I be adding these comments to the JIRA issue? I did not see a way
to reopen the issue; perhaps I just don't have the permission necessary.
Yeah, make a JIRA. Describe roughly the data type, sizes, and schema. Want to send me your regionserver log? Do you have DEBUG enabled? That'd help. (I have still to look at the log you sent me previously -- I'll get to it). Is it critical that this all work on one server only? For example, would it be an option to run 4 servers?
Thanks Marc,
St.Ack
Thanks,
- Marc
