I will rerun the tests on our perf cluster (20 nodes, 32 CPUs and 96 GB each). It runs HBase 0.94 (CDH 4.3); I will let you know the results. The preliminary run revealed similar issues - I need to make sure our config and cluster setup are correct before I kick off the real 500M-row test.
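For the DFSIO check Lars asks about below, a run along these lines should do it (just a sketch: the tests jar name and location vary by Hadoop build, the CDH4 path here is an assumption, and the file count/size are placeholders):

    # Write, then read back, 10 files of ~1 GB each and report throughput
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
        TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
        TestDFSIO -read -nrFiles 10 -fileSize 1000

If the reported throughput or IO rate std deviation looks bad across repeated runs, that points at HDFS (or the instances themselves) rather than HBase.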
On Thu, Jan 16, 2014 at 12:58 PM, lars hofhansl <[email protected]> wrote:

> In any case, though, I would not expect HBase to have any issue with that,
> unless there are some server issues at the HDFS layer.
>
> @Vladimir, what happens when you run HDFS' DFSIO?
>
> -- Lars
>
>
> ----- Original Message -----
> From: Bryan Beaudreault <[email protected]>
> To: [email protected]
> Cc:
> Sent: Thursday, January 16, 2014 10:33 AM
> Subject: Re: HBase 0.94.15: writes stalls periodically even under moderate
> steady load (AWS EC2)
>
> This might be better on the user list? Anyway..
>
> How many IPC handlers are you giving? m1.xlarge is very low cpu. Not only
> does it have only 4 cores (more cores allow more concurrent threads with
> less context switching), but those cores are severely underpowered. I
> would recommend at least c1.xlarge, which is only a bit more expensive. If
> you happen to be doing heavy GC, with 1-2 compactions running, and with
> many writes incoming, you are quickly using up quite a bit of CPU. What is
> the load and CPU usage on 10.38.106.234:50010?
>
> Did you see anything about blocking updates in the hbase logs? How much
> memstore are you giving?
>
>
> On Thu, Jan 16, 2014 at 1:17 PM, Andrew Purtell <[email protected]>
> wrote:
>
> > On Wed, Jan 15, 2014 at 5:32 PM,
> > Vladimir Rodionov <[email protected]> wrote:
> >
> > > Yes, I am using ephemeral (local) storage. I found that iostat is most
> > > of the time idle on 3K load with periodic bursts up to 10% iowait.
> >
> > Ok, sounds like the problem is higher up the stack.
> >
> > I see in later emails on this thread a log snippet that shows an issue
> > with the WAL writer pipeline; one of the datanodes is slow, sick, or
> > partially unreachable. If you have uneven point-to-point ping times
> > among your cluster instances, or periodic loss, it might still be AWS's
> > fault; otherwise I wonder why the DFSClient says a datanode is sick.
> >
> > --
> > Best regards,
> >
> > - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
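On Bryan's questions: handler count is set via hbase.regionserver.handler.count, and on 0.94 the stalls he describes surface as "Blocking updates" messages from the region server when memstores hit the limits set by hbase.hregion.memstore.flush.size, hbase.hregion.memstore.block.multiplier, and hbase.regionserver.global.memstore.upperLimit. A quick way to check for that on each region server (the log path is a guess, adjust for your install):

    # Any hits mean writers were stalled waiting on memstore flushes
    grep -i "blocking updates" /var/log/hbase/*regionserver*.log*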
