Marc Harris wrote:
I have created a JIRA issue for this, HBASE-428.
Yes, things are improved a bit (it takes about 10 times as many rows to
reach the problem state), but not much. I have put some of
the exceptions in the bug. On Sunday I should be able to run the load
again with debug logging on (if I find out how to). Probably not worth
sending you my regionserver log until then.
http://wiki.apache.org/hadoop/Hbase/FAQ#4
Yeah, DEBUG will help. It has stuff like how long flushes and
compactions are taking and the count of Store files that are being
compacted at any one time. Will help figure out what's going on.
At the moment the functionality that I am trying to re-architect runs
happily on 1 server, so it would be a hard sell to say that we need 4
servers for it. Anyway, as I understand the bug, wouldn't that just reduce
the probability of a problematic region by a factor of 4? So the problem
will just take 4 times as long to appear, which is not much help. It's
not like the nodes in a cluster can actually compensate for each other.
But I don't really understand fully what the issue is.
Are you using an RDBMS now in your current solution? How close to your
current solution does HBase have to come, Marc? (And what are you looking
for? 1M/10M/100M into a single server in N hours?).
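For a rough sense of scale, those row counts can be turned into load times. This is only a back-of-the-envelope sketch: it assumes the roughly 200 rows/sec initial throughput Marc reported earlier in the thread holds for the whole load, which is exactly what the bug prevents. The function name `hours_to_load` is made up for illustration.

```python
def hours_to_load(rows, rows_per_sec):
    """Hours needed to load `rows` rows at a constant rate (an idealized assumption)."""
    return rows / rows_per_sec / 3600.0


# 200 rows/sec is the initial rate reported in the thread, assumed constant here.
RATE = 200

for rows in (1_000_000, 10_000_000, 100_000_000):
    print(f"{rows:>11,} rows: {hours_to_load(rows, RATE):7.1f} hours")
```

At that idealized rate, 1M rows would take about 1.4 hours, 10M about 14 hours, and 100M nearly 6 days.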
Thanks for persevering with the testing.
St.Ack
- Marc
On Thu, 2008-02-07 at 20:38 -0800, stack wrote:
Marc Harris wrote:
I have installed 0.16.0 rc 1 which I believe contains a fix for this
issue, but I still see the same problem.
- I am using a single node.
- The client application runs in a single thread, loading data into a
single table.
- I get good throughput of about 200 rows/sec to start with, with
occasional significant drops due to NotServingRegionExceptions that are
recoverable on client retry (internal to hbase).
- After 54 minutes and about 500,000 rows, I start to see
WrongRegionExceptions in the client application, i.e. real failures.
Are things improved at all? Were you able to do 500k rows with previous
hbase versions?
Send us over some of those WREs. We'd thought we'd fixed those.
- Throughput rapidly drops to only a few rows per minute, plus a few rows
that had errors.
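The difference between the two kinds of exception in the list above (a NotServingRegionException that clears on retry versus a WrongRegionException that surfaces as a real failure) can be sketched roughly as follows. This is not HBase's client code: the exception classes, `put_with_retry`, `make_flaky_put`, and the retry count and pause are all stand-ins invented for illustration.

```python
import time


class NotServingRegionError(Exception):
    """Stand-in for a transient error: the region is briefly unavailable."""


class WrongRegionError(Exception):
    """Stand-in for a non-recoverable error: the row went to the wrong region."""


def put_with_retry(put, row, retries=5, pause=0.0):
    """Call put(row), retrying only on the transient error.

    `retries` and `pause` are hypothetical defaults, not HBase settings.
    Any other exception (e.g. WrongRegionError) propagates immediately.
    """
    for attempt in range(1, retries + 1):
        try:
            return put(row)
        except NotServingRegionError:
            if attempt == retries:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(pause * attempt)  # simple linear backoff


def make_flaky_put(failures):
    """Build a fake put() that fails `failures` times, then succeeds."""
    state = {"left": failures, "calls": 0}

    def put(row):
        state["calls"] += 1
        if state["left"] > 0:
            state["left"] -= 1
            raise NotServingRegionError(row)
        return row

    return put, state
```

In a loop like this every retry costs time, so a burst of transient errors would show up to the caller as a drop in throughput rather than as a failure, while the non-recoverable error reaches the application straight away.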
Should I be adding these comments to the JIRA issue? I did not see a way
to reopen the issue; perhaps I just don't have the permission necessary.
Yeah, make a JIRA. Describe roughly the data type, sizes, and schema.
Want to send me your regionserver log? Do you have DEBUG enabled?
That'd help. (I have still to look at the log you sent me previously --
I'll get to it). Is it critical that this work all on one server only?
For example, would it be an option to run 4 servers?
Thanks Marc,
St.Ack
Thanks,
- Marc