Hi folks,
I am thinking of building a test environment for an HBase cluster on EC2, for 
the following reasons:
1) To get reference throughput and read-latency numbers for HBase clusters of 
different sizes.
2) To test various schema designs and their performance implications for scans 
and M/R operations.
-- With the results from 1 and 2, we can decide how to build the actual physical 
cluster. We don't want to build a physical cluster first because, as I 
understand it, a 4-node cluster does not make much sense for a real load test 
(we do have a rough estimate of how big our data will be).
-- At the same time, I hope to learn enough about high-availability solutions 
while experimenting with 1 and 2.
Having explained my motivation for this experiment, I'd like to ask several 
questions:
a) After reading http://aws.amazon.com/ec2/instance-types/, I believe I should 
select "Standard Instances: Extra Large Instance" as my instance type. It might 
seem that I should pick the "High-Memory Instances" family, since we are 
talking about a memory-hungry application, but "High-Memory Instances" probably 
do not fit my testing environment: the disk space they offer does not look 
sufficient. Note: after testing in this environment, I will need to use the 
benchmark numbers as a reference for building my actual cluster.

b) I understand Cloudera provides an AMI, but can I build my own? If so, can 
someone give me a pointer? I have successfully set up HBase on a 4-machine 
cluster; how much further effort (please give me an estimate if you can) would 
I need to put in to achieve this goal?
c) Here is my testing environment:
   -- I build an HBase cluster for serving
   -- then I build several clients for issuing workload operations
How can I learn the high-availability lessons around this? (I know most of the 
high-level ideas, but as we all know, the subtle issues come from 
implementation details, especially in a distributed system.)


Thanks for any suggestions!



                                          