Marc Harris wrote:
Logs sent via yousendit.com.

Thanks for the logs. I took a quick look. Upload seems to be going along fine until we start getting the WrongRegionException. In issue HBASE-428, you say your client is single-threaded. Is it thick-headed too (smile) in that it unrelentingly keeps trying the same row over and over? (The log seems to show a problem with the same row over and over again.)

Guessing as to what is up, either the client cache of regions is messed up or the .META. table has become corrupt somehow -- it doesn't have a list of all regions (perhaps it didn't get a split update or some such).

If the former, I wonder what would happen if you took your load off, killed the client, then resumed at the problematic row? If things started to work again, that would seem to point at a client-side issue.

Maybe "re-architect" was not an accurate representation of what I am
doing. We currently do not have a solution that allows us to add rows to
our system in arbitrary order and then analyze them, either in order or
using map-reduce. A year or so ago we tried an RDBMS, and based on that
experience, and some comments from Doug Cutting, decided that an RDBMS
had no chance of being able to support this kind of functionality.

In terms of performance parameters, the 200 rows/sec that was achieved
for the first 500K rows was quite sufficient. I don't have a good answer
because after all these rows get loaded there will be numerous
map/reduce jobs that execute on them. I would guess that some vague
parameters are:

- In 3 days, load 100Gb of data representing 10M "units" split over 3
tables each of which is split over 3 column families. Some fraction of
these "units" will be replacements for existing ones (same key); some
will be new.
- Several map-reduce jobs that mostly involve reading the data for each
"unit" and then writing a few small pieces of data (a few bytes) for
each "unit". Probably some more interesting maps too, but I don't know
yet.
- At least 2 map-reduce jobs that delete units.

These numbers look reasonable to me. Let's try and make it work.
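
As a rough back-of-envelope check of the load target above, the sketch below converts "100Gb and 10M units in 3 days" into sustained rates. It rests on assumptions that are not stated in the thread: that "100Gb" means 100 decimal gigabytes, and that every unit writes one row into each of the 3 tables (a worst case). The function name `required_rates` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_rates(total_bytes, units, days, tables=1):
    """Return the sustained rates needed to finish the load in `days`."""
    seconds = days * SECONDS_PER_DAY
    return {
        "units_per_sec": units / seconds,
        # Assumption: each unit produces one row in every table.
        "rows_per_sec": units * tables / seconds,
        "bytes_per_sec": total_bytes / seconds,
    }


# Assumption: "100Gb" is read as 100 * 10**9 bytes.
rates = required_rates(total_bytes=100 * 10**9, units=10_000_000, days=3, tables=3)

print(f"units/sec: {rates['units_per_sec']:.1f}")        # 38.6
print(f"rows/sec:  {rates['rows_per_sec']:.1f}")         # 115.7
print(f"KB/sec:    {rates['bytes_per_sec'] / 1000:.0f}")  # 386
```

Under those assumptions the target works out to roughly 116 rows/sec, which is below the 200 rows/sec seen for the first 500K rows. It does not account for the map-reduce jobs that run afterwards.
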
Am I correct when I say that using 4 region servers will just delay the
problem by a factor of 4, or have I misunderstood the underlying cause?

Yes.

The factor might be > 4, but effectively, if there is an issue with a single server, the same issue will arise with N nodes.

St.Ack
