Hello,

I was wondering what a minimal setup, in terms of number of servers, might be for HBase. Here is what I think is needed:
1 or 2 HBase Master servers -- 1 or 2 dedicated boxes?
1 or more RegionServers -- 1 or more dedicated boxes?
1 or more ZooKeeper servers -- 1 or more dedicated boxes?

If running on HDFS, add:
1 or 2 NameNodes -- can these run on the same box(es) as the HBase Master?
1 or more DataNodes -- can DNs run on the same box(es) as the RegionServers?

If you want to run MR jobs on data in HBase, add:
1 or more JobTrackers -- can this run on the same box as the HBase Master and NN?
1 or more TaskTrackers -- can these run on the same box as a RegionServer + DN?

So, my main questions are:

* Is it OK for the HBase Master and NameNode (+ JobTracker) to run on the same server? The NN needs memory. What does the HBase Master need most?

* Is it OK for a RegionServer and DataNode (+ TaskTracker) to run on the same server? (I think this is actually advised, so that data is local?) I believe the RegionServer is a memory-hungry process (because of the Memcache)? I believe TTs need CPU to run the MR jobs, and DNs need disk I/O, of course.

* Finally, is the following correct?

Non-HA system, with local disk:
1 HB Master/NN/JT + 1 RegionServer/TT/DN + 1 ZK = 3 boxes

HA HBase cluster with HDFS:
2 HB Masters/NNs/JTs + 2 RegionServers/TTs/DNs + 2 ZKs = 6 boxes

Thanks,
Otis
