On Thu, Mar 4, 2010 at 6:02 PM, Eran Kutner <e...@gigya.com> wrote:
> Hi,
> I'm evaluating HBase as a NoSQL DB for a large-scale, interactive web
> service with very high uptime requirements, and have a few questions for
> the community.
>
> 1. I assume you've seen the benchmark by Yahoo
> (http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf and
> http://www.brianfrankcooper.net/pubs/ycsb.pdf). They show three main
> problems: latency goes up quite significantly when doing more operations,
> operations/sec are capped at about half of the other tested platforms, and
> adding new nodes interrupts the normal operation of the cluster for a
> while. Do you consider these results a problem, and if so, are there any
> plans to address them?
> 2. While running our tests (most were done using 0.20.2) we've had a few
> incidents where a table went into "transition" without ever coming out of
> it. We had to restart the cluster to release the stuck tables. Is this a
> common issue?
> 3. If I understand correctly, any major upgrade requires completely
> shutting down the cluster while doing the upgrade, as well as deploying a
> new version of the application compiled against the new client version.
> Did I get that right? Is there any strategy for upgrading while the
> cluster is still running?
> 4. This is more a bug report than a question, but it seems that in 0.20.3
> the master server doesn't stop cleanly and has to be killed manually. Is
> anyone else seeing this too?
> 5. Are there any performance benchmarks for the Thrift gateway? Do you
> have an estimate of the performance penalty of using the gateway compared
> to using the native API?
I also have concerns about Thrift gateway performance. Maybe I misunderstand, but after a glance at the code of the HBase Thrift gateway, it appears there is only one Thrift server running. All requests are sent to the Thrift server first and then forwarded to the region servers. Isn't this a single point of failure and a potential performance bottleneck?

> 6. Right now, my biggest concern about HBase is its administration
> complexity and cost. If anyone can share their experience that would be a
> huge help. How many servers do you have in the cluster? How much ongoing
> effort does it take to administer it? What uptime levels are you seeing
> (including upgrades)? Do you have any good strategy for running one
> cluster across two data centers, or for replicating between two clusters
> in two different DCs? Did you have any serious problems/crashes/downtime
> with HBase?
>
> Thanks a lot,
> Eran Kutner

Best,
- Hua