I need to apologize and clarify this statement… First, running benchmarks on AWS is OK if you're attempting to get a rough idea of how HBase will perform on a certain class of machines and you're comparing, say, m1.large to m1.xlarge or m3.xlarge, so that you can get a rough sense of sizing.
However, in this thread, you're talking about trying to figure out why a certain mechanism isn't working. You're trying to track down why writes stall, while working in a virtualized environment where you have no control over the machines, the network, or the storage. Also, when the OS runs inside a virtual machine there are going to be 'anomalies' that you can't explain, because the guest OS can only report what it sees, not what may be happening underneath it on the host. So you may see a problem, but you will never be able to find the cause.

On Jan 17, 2014, at 5:55 AM, Michael Segel <msegel_had...@hotmail.com> wrote:

> Guys,
>
> Trying to benchmark on AWS is a waste of time. You end up chasing ghosts.
> You want to benchmark, you need to isolate your systems to reduce extraneous factors.
>
> You need real hardware and a real network in a controlled environment.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
>> On Jan 16, 2014, at 12:34 PM, "Bryan Beaudreault" <bbeaudrea...@hubspot.com> wrote:
>>
>> This might be better on the user list? Anyway...
>>
>> How many IPC handlers are you giving? m1.xlarge is very low on CPU. Not only does it have just 4 cores (more cores allow more concurrent threads with less context switching), but those cores are severely underpowered. I would recommend at least c1.xlarge, which is only a bit more expensive. If you happen to be doing heavy GC, with 1-2 compactions running, and with many writes incoming, you quickly use up quite a bit of CPU. What are the load and CPU usage on 10.38.106.234:50010?
>>
>> Did you see anything about blocking updates in the HBase logs? How much memstore are you giving?
>>
>>
>>> On Thu, Jan 16, 2014 at 1:17 PM, Andrew Purtell <apurt...@apache.org> wrote:
>>>
>>> On Wed, Jan 15, 2014 at 5:32 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>>>
>>>> Yes, I am using ephemeral (local) storage. I found that iostat is idle most of the time at 3K load, with periodic bursts up to 10% iowait.
>>>
>>> OK, sounds like the problem is higher up the stack.
>>>
>>> I see in later emails on this thread a log snippet that shows an issue with the WAL writer pipeline: one of the datanodes is slow, sick, or partially unreachable. If you have uneven point-to-point ping times among your cluster instances, or periodic loss, it might still be AWS's fault; otherwise I wonder why the DFSClient says a datanode is sick.
>>>
>>> --
>>> Best regards,
>>>
>>>   - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
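
A concrete footnote on Bryan's questions: the write-path knobs he asks about live in hbase-site.xml. Below is a minimal sketch; the property names are the real 0.94/0.96-era keys, but the values are illustrative assumptions for a small-heap regionserver, not tuning advice.

    <configuration>
      <!-- RPC handler threads per regionserver. More handlers admit more
           concurrent writes, but each busy handler costs CPU, which is
           scarce on a 4-core m1.xlarge. -->
      <property>
        <name>hbase.regionserver.handler.count</name>
        <value>30</value>
      </property>
      <!-- Ceiling on total memstore usage, as a fraction of the heap. -->
      <property>
        <name>hbase.regionserver.global.memstore.upperLimit</name>
        <value>0.4</value>
      </property>
      <!-- A region blocks updates once its memstore reaches
           flush.size * block.multiplier; this is the "blocking updates"
           condition Bryan suggests checking the logs for. -->
      <property>
        <name>hbase.hregion.memstore.flush.size</name>
        <value>134217728</value>
      </property>
      <property>
        <name>hbase.hregion.memstore.block.multiplier</name>
        <value>2</value>
      </property>
    </configuration>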
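
Likewise, the diagnostics mentioned in the thread are quick to run from a shell on the regionserver and datanode hosts. A sketch follows; the log path is a typical location, assumed here rather than taken from the poster's setup.

    # CPU and iowait on the suspect datanode host (extended stats, 5s interval);
    # watch %iowait and per-device %util/await for the bursts Vladimir describes.
    iostat -x 5

    # Point-to-point latency and loss between cluster instances; uneven times
    # or periodic loss would support the "blame the AWS network" theory.
    ping -c 20 10.38.106.234

    # Write stalls: HRegion logs "Blocking updates" when a memstore hits
    # flush.size * block.multiplier.
    grep "Blocking updates" /var/log/hbase/hbase-*-regionserver-*.log

    # WAL pipeline trouble: DFSClient warnings about a bad datanode during
    # block recovery show up in the same regionserver log.
    grep -i "bad datanode" /var/log/hbase/hbase-*-regionserver-*.log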