JvmPauseMonitor is not in the 0.94 branch. Maybe HBASE-9630 should be
backported to 0.94?

Cheers
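For context, what HBASE-9630 adds is a small background thread that sleeps a
fixed interval and logs whenever wall-clock time drifts far past it, which
usually means a long GC or a host-level stall; those log entries are what
Nick suggests grepping for below. A minimal sketch of the idea only, not the
actual class - the interval, threshold, and message text here are
illustrative:

public class PauseMonitorSketch {
  private static final long SLEEP_MS = 500;            // illustrative
  private static final long WARN_THRESHOLD_MS = 1000;  // illustrative

  public static void main(String[] args) throws InterruptedException {
    // In HBase this loop runs in a daemon thread inside the region server;
    // run it standalone here and stop it with Ctrl-C.
    while (true) {
      long start = System.currentTimeMillis();
      Thread.sleep(SLEEP_MS);
      long extra = System.currentTimeMillis() - start - SLEEP_MS;
      if (extra > WARN_THRESHOLD_MS) {
        // The real class logs through a logger named after JvmPauseMonitor,
        // which is the string to grep region server logs for.
        System.err.println("Detected pause of approximately " + extra + "ms");
      }
    }
  }
}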
On Fri, Jan 17, 2014 at 10:54 AM, Nick Dimiduk <[email protected]> wrote:

> You can also grep your logs for entries from JvmPauseMonitor. When you're
> trying to track down an anomaly, it can help you locate the ballpark of at
> least when it happened. Correlate it back into your Ganglia, OpenTSDB,
> CloudWatch, &c systems and look for more info. I think it makes a statement
> about "this was probably long GC," but in EC2, it's just as often something
> else. Use that as evidence for when you return to your AWS support rep and
> argue for a refund.
>
> -n
>
> On Fri, Jan 17, 2014 at 7:03 AM, Michael Segel <[email protected]> wrote:
>
> > I need to apologize and clarify this statement…
> >
> > First, running benchmarks on AWS is ok, if you’re attempting to get a
> > rough idea of how HBase will perform on a certain class of machines and
> > you’re comparing m1.large to m1.xlarge or m3.xlarge … so that you can
> > get a rough scale on sizing.
> >
> > However, in this thread, you’re talking about trying to figure out why
> > a certain mechanism isn’t working.
> >
> > You’re trying to track down why writes stall when you’re working in a
> > virtualized environment where not only do you not have control over the
> > machines, but also the network and your storage.
> >
> > Also when you run the OS on a virtual machine, there are going to be
> > ‘anomalies’ that you can’t explain because the OS is running within a VM
> > and can only report what it sees, and not what could be happening
> > underneath in the VM’s OS.
> >
> > So you may see a problem, but will never be able to find the cause.
> >
> >
> > On Jan 17, 2014, at 5:55 AM, Michael Segel <[email protected]> wrote:
> >
> > > Guys,
> > >
> > > Trying to benchmark on AWS is a waste of time. You end up chasing
> > > ghosts. You want to benchmark, you need to isolate your systems to
> > > reduce extraneous factors.
> > >
> > > You need real hardware, real network in a controlled environment.
> > >
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > >> On Jan 16, 2014, at 12:34 PM, "Bryan Beaudreault" <[email protected]> wrote:
> > >>
> > >> This might be better on the user list? Anyway..
> > >>
> > >> How many IPC handlers are you giving? m1.xlarge is very low cpu. Not
> > >> only does it have only 4 cores (more cores allow more concurrent
> > >> threads with less context switching), but those cores are severely
> > >> underpowered. I would recommend at least c1.xlarge, which is only a
> > >> bit more expensive. If you happen to be doing heavy GC, with 1-2
> > >> compactions running, and with many writes incoming, you are quickly
> > >> using up quite a bit of CPU. What is the load and CPU usage on
> > >> 10.38.106.234:50010?
> > >>
> > >> Did you see anything about blocking updates in the hbase logs? How
> > >> much memstore are you giving?
> > >>
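The settings Bryan asks about above are ordinary hbase-site.xml properties,
and one quick way to see what a cluster is actually running with is to dump
them from a client that has the cluster's configuration on its classpath. A
rough sketch, assuming the standard 0.94-era property names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WriteTuningDump {
  public static void main(String[] args) {
    // Loads hbase-default.xml plus any hbase-site.xml found on the classpath.
    Configuration conf = HBaseConfiguration.create();
    String[] keys = {
        "hbase.regionserver.handler.count",             // the "IPC handlers"
        "hbase.hregion.memstore.flush.size",            // per-region flush trigger
        "hbase.hregion.memstore.block.multiplier",      // blocking threshold factor
        "hbase.regionserver.global.memstore.upperLimit" // global memstore bound
    };
    for (String key : keys) {
      System.out.println(key + " = " + conf.get(key));
    }
  }
}

A region whose memstore reaches flush.size times the block multiplier is
what typically produces the "blocking updates" messages Bryan mentions, so
those two values plus the handler count are the first things to check when
writes stall.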
> > >>> On Thu, Jan 16, 2014 at 1:17 PM, Andrew Purtell <[email protected]> wrote:
> > >>>
> > >>> On Wed, Jan 15, 2014 at 5:32 PM,
> > >>> Vladimir Rodionov <[email protected]> wrote:
> > >>>
> > >>>> Yes, I am using ephemeral (local) storage. I found that iostat is
> > >>>> most of the time idle on 3K load with periodic bursts up to 10%
> > >>>> iowait.
> > >>>
> > >>> Ok, sounds like the problem is higher up the stack.
> > >>>
> > >>> I see in later emails on this thread a log snippet that shows an
> > >>> issue with the WAL writer pipeline, one of the datanodes is slow,
> > >>> sick, or partially unreachable. If you have uneven point to point
> > >>> ping times among your cluster instances, or periodic loss, it might
> > >>> still be AWS's fault, otherwise I wonder why the DFSClient says a
> > >>> datanode is sick.
> > >>>
> > >>> --
> > >>> Best regards,
> > >>>
> > >>>    - Andy
> > >>>
> > >>> Problems worthy of attack prove their worth by hitting back. - Piet
> > >>> Hein (via Tom White)
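On Andrew's last point, one cheap sanity check for uneven point-to-point
latency or a partially unreachable datanode is to time repeated TCP connects
to the DataNode's data-transfer port from each RegionServer host and compare
the numbers. A rough sketch only; the default host and port below are the
ones mentioned earlier in this thread, and this is no substitute for real
network monitoring:

import java.net.InetSocketAddress;
import java.net.Socket;

public class DataNodeConnectProbe {
  public static void main(String[] args) throws Exception {
    String host = args.length > 0 ? args[0] : "10.38.106.234";
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;
    for (int i = 1; i <= 10; i++) {
      long start = System.nanoTime();
      Socket s = new Socket();
      try {
        // 2 second connect timeout; a healthy same-AZ hop is normally
        // well under a few milliseconds.
        s.connect(new InetSocketAddress(host, port), 2000);
        System.out.println("connect " + i + ": "
            + (System.nanoTime() - start) / 1000000 + " ms");
      } catch (Exception e) {
        System.out.println("connect " + i + ": failed (" + e + ")");
      } finally {
        s.close();
      }
      Thread.sleep(1000);
    }
  }
}

Large swings or outright failures from one host but not the others would
point at the instance or the network rather than the WAL code itself.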
