JvmPauseMonitor is not in the 0.94 branch. Maybe HBASE-9630 should be
backported to 0.94?

Cheers
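For context, what HBASE-9630 adds is a small background thread that sleeps a
fixed interval and logs whenever wall-clock time drifts far past it, which
usually means a long GC or a host-level stall; those log entries are what
Nick suggests grepping for below. A minimal sketch of the idea only, not the
actual class - the interval, threshold, and message text here are
illustrative:

public class PauseMonitorSketch {
  private static final long SLEEP_MS = 500;            // illustrative
  private static final long WARN_THRESHOLD_MS = 1000;  // illustrative

  public static void main(String[] args) throws InterruptedException {
    // In HBase this loop runs in a daemon thread inside the region server;
    // run it standalone here and stop it with Ctrl-C.
    while (true) {
      long start = System.currentTimeMillis();
      Thread.sleep(SLEEP_MS);
      long extra = System.currentTimeMillis() - start - SLEEP_MS;
      if (extra > WARN_THRESHOLD_MS) {
        // The real class logs through a logger named after JvmPauseMonitor,
        // which is the string to grep region server logs for.
        System.err.println("Detected pause of approximately " + extra + "ms");
      }
    }
  }
}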
On Fri, Jan 17, 2014 at 10:54 AM, Nick Dimiduk <[email protected]> wrote:

> You can also grep your logs for entries from JvmPauseMonitor. When you're
> trying to track down an anomaly, it can help you locate the ballpark of at
> least when it happened. Correlate it back into your Ganglia, OpenTSDB,
> CloudWatch, &c systems and look for more info. I think it makes a statement
> about "this was probably long GC," but in EC2, it's just as often something
> else. Use that as evidence for when you return to your AWS support rep and
> argue for a refund.
>
> -n
>
> On Fri, Jan 17, 2014 at 7:03 AM, Michael Segel <[email protected]> wrote:
>
> > I need to apologize and clarify this statement…
> >
> > First, running benchmarks on AWS is ok, if you’re attempting to get a
> > rough idea of how HBase will perform on a certain class of machines and
> > you’re comparing m1.large to m1.xlarge or m3.xlarge … so that you can
> > get a rough scale on sizing.
> >
> > However, in this thread, you’re talking about trying to figure out why
> > a certain mechanism isn’t working.
> >
> > You’re trying to track down why writes stall when you’re working in a
> > virtualized environment where not only do you not have control over the
> > machines, but also the network and your storage.
> >
> > Also when you run the OS on a virtual machine, there are going to be
> > ‘anomalies’ that you can’t explain because the OS is running within a VM
> > and can only report what it sees, and not what could be happening
> > underneath in the VM’s OS.
> >
> > So you may see a problem, but will never be able to find the cause.
> >
> >
> > On Jan 17, 2014, at 5:55 AM, Michael Segel <[email protected]> wrote:
> >
> > > Guys,
> > >
> > > Trying to benchmark on AWS is a waste of time. You end up chasing
> > > ghosts. You want to benchmark, you need to isolate your systems to
> > > reduce extraneous factors.
> > >
> > > You need real hardware, real network in a controlled environment.
> > >
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > >> On Jan 16, 2014, at 12:34 PM, "Bryan Beaudreault" <[email protected]> wrote:
> > >>
> > >> This might be better on the user list? Anyway..
> > >>
> > >> How many IPC handlers are you giving? m1.xlarge is very low cpu. Not
> > >> only does it have only 4 cores (more cores allow more concurrent
> > >> threads with less context switching), but those cores are severely
> > >> underpowered. I would recommend at least c1.xlarge, which is only a
> > >> bit more expensive. If you happen to be doing heavy GC, with 1-2
> > >> compactions running, and with many writes incoming, you are quickly
> > >> using up quite a bit of CPU. What is the load and CPU usage on
> > >> 10.38.106.234:50010?
> > >>
> > >> Did you see anything about blocking updates in the hbase logs? How
> > >> much memstore are you giving?
> > >>
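The settings Bryan asks about above are ordinary hbase-site.xml properties,
and one quick way to see what a cluster is actually running with is to dump
them from a client that has the cluster's configuration on its classpath. A
rough sketch, assuming the standard 0.94-era property names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WriteTuningDump {
  public static void main(String[] args) {
    // Loads hbase-default.xml plus any hbase-site.xml found on the classpath.
    Configuration conf = HBaseConfiguration.create();
    String[] keys = {
        "hbase.regionserver.handler.count",             // the "IPC handlers"
        "hbase.hregion.memstore.flush.size",            // per-region flush trigger
        "hbase.hregion.memstore.block.multiplier",      // blocking threshold factor
        "hbase.regionserver.global.memstore.upperLimit" // global memstore bound
    };
    for (String key : keys) {
      System.out.println(key + " = " + conf.get(key));
    }
  }
}

A region whose memstore reaches flush.size times the block multiplier is
what typically produces the "blocking updates" messages Bryan mentions, so
those two values plus the handler count are the first things to check when
writes stall.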
> > >>> On Thu, Jan 16, 2014 at 1:17 PM, Andrew Purtell <[email protected]> wrote:
> > >>>
> > >>> On Wed, Jan 15, 2014 at 5:32 PM,
> > >>> Vladimir Rodionov <[email protected]> wrote:
> > >>>
> > >>>> Yes, I am using ephemeral (local) storage. I found that iostat is
> > >>>> most of the time idle on 3K load with periodic bursts up to 10%
> > >>>> iowait.
> > >>>
> > >>> Ok, sounds like the problem is higher up the stack.
> > >>>
> > >>> I see in later emails on this thread a log snippet that shows an
> > >>> issue with the WAL writer pipeline, one of the datanodes is slow,
> > >>> sick, or partially unreachable. If you have uneven point to point
> > >>> ping times among your cluster instances, or periodic loss, it might
> > >>> still be AWS's fault, otherwise I wonder why the DFSClient says a
> > >>> datanode is sick.
> > >>>
> > >>> --
> > >>> Best regards,
> > >>>
> > >>>    - Andy
> > >>>
> > >>> Problems worthy of attack prove their worth by hitting back. - Piet
> > >>> Hein (via Tom White)
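On Andrew's last point, one cheap sanity check for uneven point-to-point
latency or a partially unreachable datanode is to time repeated TCP connects
to the DataNode's data-transfer port from each RegionServer host and compare
the numbers. A rough sketch only; the default host and port below are the
ones mentioned earlier in this thread, and this is no substitute for real
network monitoring:

import java.net.InetSocketAddress;
import java.net.Socket;

public class DataNodeConnectProbe {
  public static void main(String[] args) throws Exception {
    String host = args.length > 0 ? args[0] : "10.38.106.234";
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;
    for (int i = 1; i <= 10; i++) {
      long start = System.nanoTime();
      Socket s = new Socket();
      try {
        // 2 second connect timeout; a healthy same-AZ hop is normally
        // well under a few milliseconds.
        s.connect(new InetSocketAddress(host, port), 2000);
        System.out.println("connect " + i + ": "
            + (System.nanoTime() - start) / 1000000 + " ms");
      } catch (Exception e) {
        System.out.println("connect " + i + ": failed (" + e + ")");
      } finally {
        s.close();
      }
      Thread.sleep(1000);
    }
  }
}

Large swings or outright failures from one host but not the others would
point at the instance or the network rather than the WAL code itself.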
