I need to apologize for my earlier statement, and clarify it… 

First, running benchmarks on AWS is fine if you’re attempting to get a rough 
idea of how HBase will perform on a certain class of machine, e.g. comparing 
m1.large against m1.xlarge or m3.xlarge so that you can get a rough sense of 
sizing. 

However, in this thread, you’re talking about trying to figure out why a 
certain mechanism isn’t working.

You’re trying to track down why writes stall while working in a virtualized 
environment where you control neither the machines, nor the network, nor 
your storage. 

Also, when you run the OS on a virtual machine, there are going to be 
‘anomalies’ that you can’t explain, because the guest OS can only report 
what it sees, not what may be happening underneath it in the hypervisor or 
host OS. 
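
About the only hypervisor-level signal the guest does get is CPU steal time. 
As a quick sanity check with the standard sysstat tools (nothing 
HBase-specific, just a rough first look):

   # '%steal' is CPU time the hypervisor handed to someone else;
   # sustained or spiky values mean a noisy neighbor you can't see
   iostat -c 5
   mpstat -P ALL 5

Even that only covers CPU. Contention on the host’s disks and network stays 
invisible to the guest.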

So you may see a problem, but will never be able to find the cause. 


On Jan 17, 2014, at 5:55 AM, Michael Segel <msegel_had...@hotmail.com> wrote:

> Guys,
> 
> Trying to benchmark on AWS is a waste of time. You end up chasing ghosts.
> If you want to benchmark, you need to isolate your systems to reduce 
> extraneous factors.
> 
> You need real hardware, real network in a controlled environment.
> 
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel
> 
>> On Jan 16, 2014, at 12:34 PM, "Bryan Beaudreault" <bbeaudrea...@hubspot.com> 
>> wrote:
>> 
>> This might be better on the user list? Anyway...
>> 
>> How many IPC handlers are you giving it?  The m1.xlarge is very low on
>> CPU.  Not only does it have just 4 cores (more cores allow more concurrent
>> threads with less context switching), but those cores are severely
>> underpowered.  I would recommend at least c1.xlarge, which is only a bit
>> more expensive.  If you happen to be doing heavy GC, with 1-2 compactions
>> running, and with many writes incoming, you can quickly use up quite a bit
>> of CPU.  What is the load and CPU usage on 10.38.106.234:50010?
>> 
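For reference, the IPC handler count Bryan is asking about is 
hbase.regionserver.handler.count in hbase-site.xml. A sketch of what that 
looks like, with an illustrative starting value rather than a tuned one:

   <property>
     <name>hbase.regionserver.handler.count</name>
     <value>30</value>
   </property>

More handlers only help if the cores can actually service them; on a 4-core 
m1.xlarge, piling them on mostly buys you context switching.
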
>> Did you see anything about blocking updates in the hbase logs?  How much
>> memstore are you giving?
>> 
>> 
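The blocking-updates check is easy: grep the region server logs for 
"Blocking updates". The memstore knobs in play look roughly like this; the 
values shown are the 0.94-era defaults, illustrative rather than a 
recommendation:

   <!-- fraction of heap all memstores may use before writes block -->
   <property>
     <name>hbase.regionserver.global.memstore.upperLimit</name>
     <value>0.4</value>
   </property>
   <!-- per-region flush threshold (128 MB) -->
   <property>
     <name>hbase.hregion.memstore.flush.size</name>
     <value>134217728</value>
   </property>
   <!-- a region blocks updates at multiplier * flush size -->
   <property>
     <name>hbase.hregion.memstore.block.multiplier</name>
     <value>2</value>
   </property>

If those lines show up, the region servers are flushing slower than the 
writes are arriving, which loops right back to CPU and disk.
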
>>> On Thu, Jan 16, 2014 at 1:17 PM, Andrew Purtell <apurt...@apache.org> wrote:
>>> 
>>> On Wed, Jan 15, 2014 at 5:32 PM,
>>> Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>>> 
>>>> Yes, I am using ephemeral (local) storage. I found that iostat shows the
>>>> disks mostly idle under a 3K load, with periodic bursts of up to 10% iowait.
>>> 
>>> Ok, sounds like the problem is higher up the stack.
>>> 
>>> I see in later emails on this thread a log snippet that shows an issue with
>>> the WAL writer pipeline, one of the datanodes is slow, sick, or partially
>>> unreachable. If you have uneven point-to-point ping times among your
>>> cluster instances, or periodic loss, it might still be AWS's fault,
>>> otherwise I wonder why the DFSClient says a datanode is sick.
>>> 
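If you want to test that theory from inside the cluster, something as simple 
as pairwise pings will expose uneven latency or loss. 10.38.106.234 is the 
datanode called out earlier in this thread; the <peer-ip> entries are 
placeholders for your other nodes:

   # run from each node against its peers; compare avg/mdev and packet loss
   for h in 10.38.106.234 <peer-ip-1> <peer-ip-2>; do
     ping -c 20 -q $h
   done

High mdev or any packet loss between instances points at the AWS network 
rather than at HBase itself.
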
>>> --
>>> Best regards,
>>> 
>>>  - Andy
>>> 
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>> 
> 
