Hi folks,
I am thinking of building a testing environment for an HBase cluster on EC2,
for the following reasons:
1) To get reference throughput and read-latency numbers for HBase clusters of
different sizes.
2) To test various schema designs and their performance implications for scans
and M/R operations.
-- After getting results from 1 and 2, we can decide how to build the actual
physical cluster. The reason we don't want to build a physical cluster in the
first place is that, as I understand it, a 4-node cluster does not make much
sense for a real load test (we do have a rough estimate of how big our data
will be).
-- At the same time, I hope to learn enough about high-availability solutions
while experimenting with 1 and 2.
Having explained my motivation for this experiment, I'd like to ask several
questions:
a) After reading http://aws.amazon.com/ec2/instance-types/, I believe I should
select "Standard Instances: Extra Large Instance" as my instance type. It might
seem that I should pick the "High-Memory Instances" family, since we are
talking about a memory-hungry application, but "High-Memory Instances" probably
do not fit my testing environment -- the disk space they offer does not look
sufficient.
Note: after testing in this environment, I will need to use the benchmark
numbers as a reference for building my actual cluster.
b) I understand Cloudera provides an AMI, but can I build my own? If so, can
someone give me a pointer? I have successfully built an HBase cluster on 4
machines. How much further effort (please give me an estimate if you can) would
I need to put in to achieve this goal?
c) Here is my testing environment:
-- I build an HBase cluster for serving
-- then I build several clients for issuing workload ops
How can I learn the high-availability lessons around this setup? (I know most
of the high-level ideas, but as we all know, the subtle issues come from
implementation details, especially in a distributed system.)
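To make the client side of c) a bit more concrete, here is a rough Python
sketch of the kind of measurement I have in mind. It is only an illustration:
read_row is a hypothetical placeholder that sleeps instead of talking to HBase,
and the function names, thread counts, and key format are my own assumptions,
not part of any HBase API.

```python
import math
import random
import time
from concurrent.futures import ThreadPoolExecutor


def read_row(row_key):
    """Hypothetical stand-in for one HBase read; swap in a real client call."""
    time.sleep(random.uniform(0.001, 0.003))  # simulated delay, not real data
    return row_key


def timed_read(row_key):
    """Return the wall-clock latency of one read, in seconds."""
    start = time.perf_counter()
    read_row(row_key)
    return time.perf_counter() - start


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def run_load(num_clients, ops_per_client):
    """Issue num_clients * ops_per_client reads from a pool of num_clients
    threads and summarize throughput and read latency."""
    keys = ["row-%d" % i for i in range(num_clients * ops_per_client)]
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        latencies = sorted(pool.map(timed_read, keys))
    elapsed = time.perf_counter() - wall_start
    return {
        "ops": len(latencies),
        "throughput_ops_per_sec": len(latencies) / elapsed,
        "p50_sec": percentile(latencies, 0.50),
        "p99_sec": percentile(latencies, 0.99),
    }


if __name__ == "__main__":
    print(run_load(num_clients=4, ops_per_client=50))
```

Running the same script against clusters of different sizes, with read_row
replaced by a real read, would give the per-size reference numbers from 1).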
Thanks for any suggestions!