The client does not try to upload the same row again and again. The HBase client retries a few times internally, but if the exception escapes to the client application, it is logged and the application moves on. The client application's log (store.log) actually shows some successes mixed in among the failures.
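For context, the retry behaviour described here (a bounded number of internal attempts, after which the exception escapes to the application, which logs it and continues) can be sketched roughly as below. This is an illustrative model, not HBase's actual client code; the function names, the retry count of 5, and the backoff are assumptions:

```python
import time

def put_with_retries(do_put, row, max_retries=5, backoff_secs=0.01):
    """Attempt a single-row put, retrying a bounded number of times.

    `do_put` stands in for whatever actually commits the row. After
    max_retries failed attempts, the last exception escapes to the
    caller, which can log it and move on to the next row.
    """
    last_exc = None
    for attempt in range(max_retries):
        try:
            return do_put(row)
        except Exception as exc:  # e.g. a WrongRegionException surfacing
            last_exc = exc
            time.sleep(backoff_secs * (2 ** attempt))  # simple backoff
    raise last_exc

def upload(rows, do_put, log, **retry_kw):
    """Upload rows one at a time; log failures and keep going."""
    for row in rows:
        try:
            put_with_retries(do_put, row, **retry_kw)
        except Exception as exc:
            log.append((row, exc))  # logged, then the loop continues
```

With this shape, a persistently failing row shows up as a burst of retries in the log, followed by successes for subsequent rows, which matches what store.log shows.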
My reading of the log file is not the same as yours. It looks to me as if each row is tried 5 times, throwing WREs each time, before moving on to another row. All the errors do seem to concern the same region though (pagefetch,http://fun.twilightwap.com/rate.asp?joke_id=183&rating=0 wap2 20080102055026,1202660655358, startKey='http://fun.twilightwap.com/rate.asp?joke_id=183&rating=0 wap2 20080102055026', getEndKey()='http://fun.twilightwap.com/rate.asp?joke_id=183&rating=0 wap2 20080102055026').

I tried stopping the client application and restarting it at the point where it failed, with no success. I tried restarting the region server and master server too, also without success.

- Marc

P.S. Should this discussion be happening in JIRA, or here, or both?

On Mon, 2008-02-11 at 11:27 -0800, stack wrote:
> Marc Harris wrote:
> > Logs sent via yousendit.com.
> >
> Thanks for the logs. I took a quick look. Upload seems to be going
> along fine until we start getting the WrongRegionException. In issue
> HBASE-428, you say your client is single-threaded. Is it thick-headed
> too (smile) in that it unrelentingly keeps trying the same row over and
> over? (The log seems to have prob. w/ the same row over and over again.)
>
> Guessing as to what is up, either the client cache of regions is messed
> up or the .META. table has become corrupt somehow -- it doesn't have a
> list of all regions (perhaps it didn't get a split update or some such).
>
> If the former, I wonder what would happen if you took your load off,
> killed the client, then resumed at the problematic row? If things
> started to work again, that would seem to point at a client-side issue.
>
> > Maybe "re-architect" was not an accurate representation of what I am
> > doing. We currently do not have a solution that allows us to add rows
> > to our system in arbitrary order and then analyze them, either in
> > order or using map-reduce.
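For what it's worth, the "client cache of regions" mentioned in the quoted message is essentially a sorted map from region start keys to region locations: the client routes a row to the cached region with the greatest start key <= the row key, and the server rejects the write if that region no longer actually covers the key. The toy model below (invented names, not the real HBase client code) illustrates how a cache left stale by a split can keep routing rows at a region that no longer contains them:

```python
import bisect

class WrongRegionException(Exception):
    pass

class RegionCache:
    """Toy client-side cache of (start_key, end_key, name) per region.

    end_key == "" means the region is unbounded on the right.
    """
    def __init__(self, regions):
        self.regions = sorted(regions)
        self._starts = [r[0] for r in self.regions]

    def locate(self, row_key):
        # Greatest start key <= row_key.
        i = bisect.bisect_right(self._starts, row_key) - 1
        if i < 0:
            raise KeyError(row_key)
        return self.regions[i]

def serve_put(region, row_key):
    """Server-side range check before accepting a write."""
    start, end, name = region
    if row_key < start or (end != "" and row_key >= end):
        raise WrongRegionException(f"{row_key!r} not in region {name}")
```

In this model, if region ['m', '') splits into ['m', 't') and ['t', '') but the client's cached entry predates the split, rows at or beyond 't' keep being sent to the first daughter region and rejected on every attempt, which would look exactly like the repeated WREs for a single region in the log.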
> > A year or so ago we tried an RDBMS, and based on that experience, and
> > some comments from Doug Cutting, decided that an RDBMS had no chance
> > of being able to support this kind of functionality.
> >
> > In terms of performance parameters, the 200 rows/sec that was achieved
> > for the first 500K rows was quite sufficient. I don't have a good
> > answer because after all these rows get loaded there will be numerous
> > map/reduce jobs that execute on them. I would guess that some vague
> > parameters are:
> >
> > - In 3 days, load 100Gb of data representing 10M "units" split over 3
> >   tables, each of which is split over 3 column families. Some fraction
> >   of these "units" will be replacements for existing ones (same key);
> >   some will be new.
> > - Several map-reduce jobs that mostly involve reading the data for
> >   each "unit" and then writing a few small pieces of data (a few
> >   bytes) for each "unit". Probably some more interesting maps too, but
> >   I don't know yet.
> > - At least 2 map-reduce jobs that delete units.
> >
> These numbers look reasonable to me. Let's try and make it work.
>
> > Am I correct when I say that using 4 region servers will just delay
> > the problem by a factor of 4, or have I misunderstood the underlying
> > cause?
> >
> Yes. The factor might be > 4 but effectively, if there is an issue
> using a single server, then the same issue will arise with N nodes.
>
> St.Ack
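As a back-of-the-envelope check on the load quoted above (not a figure from the thread itself): 10M units over 3 days needs a sustained rate of only about 39 units/sec, comfortably under the 200 rows/sec Marc saw for the first 500K rows, and 100GB over the same window is under half a megabyte per second (assuming "100Gb" means gigabytes):

```python
units = 10_000_000
seconds = 3 * 24 * 3600              # 3 days = 259,200 seconds
required_rate = units / seconds      # ~38.6 units/sec sustained

bytes_total = 100 * 1024**3          # assuming "100Gb" means 100 GiB
required_bw = bytes_total / seconds  # bytes/sec, roughly 0.4 MiB/sec
```

So the numbers are modest; the problem is the WrongRegionException stall, not raw throughput.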
