I have been a fan of HBase for a while, but until now I haven't had any spare hardware to set up and run an instance. Now I'm trying to decide what the best setup would be.
I have a 64-node Hadoop/Hive setup; each node has dual quad-core processors, 32 GB of RAM, and 4 TB of storage. My options are to run a 64-way HBase setup on those nodes, or possibly to run HBase on a separate set of up to 16 machines of the same type that would be used only for HBase. I'm leaning toward running HBase on the 64-way cluster alongside Hadoop, because I'm going to be using HBase in some MapReduce jobs, and because of the size.

What I'm planning on doing with the cluster:

- Migrate some large Berkeley DBs to HBase (15 - 20 billion records)
- Mix some live data from HBase with some batch processing in Hive (small amount of data)
- Build a large graph DB on top of HBase (size unknown, billions at least)
- Probably a lot more things as time goes along

Thoughts and opinions welcome.

Thanks!
Aaron