Re: HBASE-138: Under load, regions become extremely large and eventually cause region servers to become unresponsive

stack Mon, 11 Feb 2008 11:30:14 -0800

Marc Harris wrote:

Logs sent via yousendit.com.

Thanks for the logs. I took a quick look. The upload seems to be going along fine until we start getting the WrongRegionException. In issue HBASE-428, you say your client is single-threaded. Is it thick-headed too (smile), in that it unrelentingly keeps trying the same row over and over? (The log seems to show problems with the same row over and over again.)

Guessing at what is up: either the client's cache of regions is messed up, or the .META. table has somehow become corrupt and doesn't have a list of all regions (perhaps it didn't get a split update or some such).

If the former, I wonder what would happen if you took your load off, killed the client, then resumed at the problematic row? If things started to work again, that would seem to point at a client-side issue.
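A client can also tell these two guesses apart without a restart. When the server rejects a put, the client drops its cached location for that row, re-reads .META., and gives up after a few attempts instead of retrying the same row forever. The sketch below simulates that in plain Python. `RegionCache`, `put_with_retries`, `WrongRegionError` and the region names are all hypothetical and are not the HBase client API. If a fresh lookup fixes the put, the stale cache was the culprit; if it keeps failing, .META. itself is the likelier suspect.

```python
import bisect


class WrongRegionError(Exception):
    """Hypothetical stand-in for HBase's WrongRegionException."""


class RegionCache:
    """Hypothetical client-side cache: region start key -> region name."""

    def __init__(self, lookup_meta):
        # lookup_meta(row) -> (start_key, region_name); stands in for a .META. read
        self._lookup_meta = lookup_meta
        self._starts = []  # sorted region start keys
        self._names = {}   # start key -> region name

    def remember(self, start, name):
        if start not in self._names:
            bisect.insort(self._starts, start)
        self._names[start] = name

    def locate(self, row):
        # Use the cached region whose start key is the greatest one <= row.
        i = bisect.bisect_right(self._starts, row) - 1
        if i >= 0:
            return self._names[self._starts[i]]
        start, name = self._lookup_meta(row)
        self.remember(start, name)
        return name

    def invalidate(self, row):
        # Forget the cached entry covering this row so the next locate() re-reads .META.
        i = bisect.bisect_right(self._starts, row) - 1
        if i >= 0:
            del self._names[self._starts.pop(i)]


def put_with_retries(cache, send, row, max_attempts=3):
    """Return the attempt number that succeeded; raise after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        region = cache.locate(row)
        try:
            send(region, row)
            return attempt
        except WrongRegionError:
            cache.invalidate(row)
    raise RuntimeError(f"row {row!r} still misrouted after {max_attempts} attempts")


# Simulated .META. after region "t,,1" split at row "m" into two daughters.
META = [("", "t,,2"), ("m", "t,m,2")]


def lookup_meta(row):
    return [entry for entry in META if entry[0] <= row][-1]


def send(region, row):
    # The "server" rejects a put addressed to a region that doesn't hold the row.
    if region != lookup_meta(row)[1]:
        raise WrongRegionError(region)


cache = RegionCache(lookup_meta)
cache.remember("", "t,,1")  # stale entry cached before the split
print(put_with_retries(cache, send, "q"))  # 2: first try hits the stale parent region
```

Suppose instead that .META. never got the split, so every lookup still returns the parent region. In that case the same call raises after three attempts instead of spinning on the row.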

Maybe "re-architect" was not an accurate representation of what I am
doing. We currently do not have a solution that allows us to add rows to
our system in arbitrary order and then analyze them, either in order or
using map-reduce. A year or so ago we tried an RDBMS, and based on that
experience, and some comments from Doug Cutting, decided that an RDBMS
had no chance of being able to support this kind of functionality.

In terms of performance parameters, the 200 rows/sec that was achieved
for the first 500K rows was quite sufficient. I don't have a good answer
because after all these rows get loaded there will be numerous
map/reduce jobs that execute on them. I would guess that some vague
parameters are:

- In 3 days, load 100 GB of data representing 10M "units" split over 3
tables, each of which is split over 3 column families. Some fraction of
these "units" will be replacements for existing ones (same key); some
will be new.
- Several map-reduce jobs that mostly involve reading the data for each
"unit" and then writing a few small pieces of data (a few bytes) for
each "unit". Probably some more interesting maps too, but I don't know
yet.
- At least 2 map-reduce jobs that delete units.
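As a rough check on whether the load target is within reach, it can be turned into average rates. The sketch makes two assumptions. First, "100Gb" means 100 gigabytes (10^9 bytes each). Second, every unit becomes one row in each of the 3 tables. `required_rates` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_rates(total_bytes, units, days, tables=3):
    """Average rates needed to load `units` items totalling `total_bytes` in `days`.

    Assumes (hypothetically) that each unit is written as one row in each of `tables` tables.
    """
    seconds = days * SECONDS_PER_DAY
    return {
        "units_per_sec": units / seconds,
        "rows_per_sec": units * tables / seconds,
        "bytes_per_sec": total_bytes / seconds,
        "bytes_per_unit": total_bytes / units,
    }


rates = required_rates(total_bytes=100 * 10**9, units=10_000_000, days=3)
for name, value in rates.items():
    print(f"{name}: {value:,.1f}")
```

Under those assumptions the load averages roughly 39 units (about 116 rows) per second, at about 10 KB per unit. That is below the 200 rows/sec seen for the first 500K rows, provided that rate could be sustained for the whole run.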


These numbers look reasonable to me. Let's try and make it work.

Am I correct when I say that using 4 region servers will just delay the
problem by a factor of 4, or have I misunderstood the underlying cause?

Yes.

The factor might be > 4, but effectively, if there is an issue using a single server, then the same issue will arise with N nodes.
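That answer can be put in rough numbers. As a hypothetical, suppose a single region server runs into trouble after about 500K rows, the point in this run where the fast load gave way to WrongRegionExceptions. Suppose also that the load spreads evenly across servers, so the failure point grows at best linearly. The helper names below are illustrative only.

```python
import math

ROWS_PER_SERVER_LIMIT = 500_000  # hypothetical: where this single-server run started failing


def rows_before_trouble(servers, limit=ROWS_PER_SERVER_LIMIT):
    # Best case: rows spread evenly, so the failure point scales linearly.
    return servers * limit


def servers_to_postpone(target_rows, limit=ROWS_PER_SERVER_LIMIT):
    # Smallest server count whose combined limit covers target_rows.
    return math.ceil(target_rows / limit)


print(rows_before_trouble(4))            # 2000000
print(servers_to_postpone(10_000_000))   # 20
```

Even in this best case, 4 servers only move the failure point to about 2M rows, well short of a 10M-unit load. Covering that load would take around 20 servers, more if each unit spans several rows, and the underlying bug would still be there.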


St.Ack
