> Unfortunately that is also our Achilles' heel; we are far from being Java
> experts and it will probably take us a lot of time to become experts so we
> can debug and fix problems like you do.
Well, we run into most of the problems ourselves first; unless you use HBase in
ways very different from ours (like 10x the machines and load), you are unlikely
to hit issues we haven't already seen. But we are very helpful and usually fix
issues in a very timely fashion, and we are always available in our IRC channel,
#hbase on freenode.

> My thinking was to build two independent clusters with cyclic replication so
> if one crashes we can switch to the other one while we figure out how to fix
> the first. However, doing that requires solid replication capabilities. Can I
> understand from your description that you have cyclic, selective replication
> working in production already? I see that it's scheduled to be released in
> 0.21; is it possible to get it to work on 0.20?

It will be in 0.21, but it's not production ready yet (we are using it on our
processing cluster which, as you can guess, runs on Hadoop 0.21 and HBase
trunk... both unreleased). For 0.20... we'll see; it depends on other things
going on currently. The cyclic support is still missing some parts (nothing too
bad); selective replication is done at the column-family level and is currently
working.

>
> As for the issue with shutting down the master node, what I see is that
> running "hbase-daemon.sh stop master" continues printing dots forever.
> Looking at the code for that script, it is trying to run "./hbase master
> stop". If I run that command manually, it seems to ignore the stop parameter
> and tries to load another instance of the server, which fails in my case
> because the server is already running and the JMX port is busy. There is
> nothing in the log, and the out file only has the exception thrown by JMX
> trying to bind to the busy socket.

That is the problem: the bind happens as soon as the JVM boots and makes it
exit, so it never actually stops the master. 0.20.3 has a fix for that.

>
> Thanks again, I really appreciate the information.
>
> -eran
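For the family-level selective replication discussed above, the setup that the 0.21/0.90-era code aims at looks roughly like the sketch below. The table name, family name, and peer ZooKeeper address are hypothetical, and the exact shell commands may differ on trunk; treat this as a sketch of the intended configuration, not a released procedure.

    # hbase-site.xml on both clusters must enable replication (assumed property name):
    #   <property>
    #     <name>hbase.replication</name>
    #     <value>true</value>
    #   </property>

    # In the HBase shell, register the other cluster as a peer by its ZooKeeper
    # quorum, port, and znode parent. Names below are placeholders.
    hbase shell <<'EOF'
    add_peer '1', 'other-cluster-zk1,other-cluster-zk2:2181:/hbase'
    # Selective replication: only families with REPLICATION_SCOPE => 1 are shipped.
    disable 'mytable'
    alter 'mytable', {NAME => 'replicated_cf', REPLICATION_SCOPE => 1}
    enable 'mytable'
    EOF

Running the same `add_peer` on the second cluster, pointing back at the first, is what would close the cycle; that cyclic piece is the part noted above as still incomplete.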
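On the master-shutdown symptom: the dots come from the daemon script polling the master's pid file while it waits for the process to exit. The following is a paraphrased sketch of that stop path, not the exact 0.20 script; the pid-file location and variable names assume the default layout. It shows why a stop JVM that dies on the JMX bind leaves the script printing dots forever.

    # Paraphrased sketch of hbase-daemon.sh's "stop master" behaviour, assuming the
    # default pid-file layout (HBASE_PID_DIR defaults to /tmp, file named after the user).
    pid="${HBASE_PID_DIR:-/tmp}/hbase-${USER}-master.pid"

    # The script asks the running master to shut down by launching a second,
    # short-lived JVM. In the setup described above, that JVM inherits the same
    # JMX options, fails to bind the already-used port, and exits before it can
    # deliver the stop request.
    "${HBASE_HOME}/bin/hbase" master stop > /dev/null 2>&1

    # It then waits for the real master's pid to disappear; since the stop request
    # never arrived, this loop prints dots indefinitely.
    while kill -0 "$(cat "$pid")" > /dev/null 2>&1; do
      echo -n "."
      sleep 1
    done
    echo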