Hi folks, last week I ran into a problem where my cluster was replicating to itself. The problem was discussed and described here: http://mail-archives.apache.org/mod_mbox/hbase-dev/201311.mbox/%3CCAOEq2C5%2ByqsewgzpkPO%3D8FRvQVv6tqOoo5wrdxDVN%2BzRzV1iWw%40mail.gmail.com%3E
We eventually found that the problem was introduced when someone copied ./zookeeper/conf/zoo.cfg to ./hbase/conf. The problem is fixed now, but we are still wondering why the replication code picks up zoo.cfg. Can someone shed some light on this?

To briefly sum up how to recreate the problem and what goes wrong (btw, I am using hbase 0.94.9):

1) To recreate: copy zoo.cfg into hbase/conf, and that is all.

2) The regionserver log (of the source cluster) will show something like:

2013-11-04 14:07:30,364 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating *d59d578a-3276-474a-9f15-17fc08a4415e* -> *d59d578a-3276-474a-9f15-17fc08a4415e*

The first value is correct (the hbase.id of the source), but the second should be the hbase.id of the peer. It turns out the replication logic didn't get the correct one.

I did some investigation and was able to trace it back to ReplicationPeer.zkw.quorum being set incorrectly (when zoo.cfg is under ./hbase/conf).

As odd as this is, I feel this may be a bug in the replication logic. We should get the peer's zkw.quorum from what 'add_peer' provides instead of retrieving it from another place, even though in this case zoo.cfg should not have been copied under ./hbase/conf.

Any ideas?

Many thanks,
Demai
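
P.S. For anyone who wants to check whether a stray zoo.cfg is visible to their region servers, here is a minimal sketch. It assumes (I have not confirmed this against the 0.94.9 source) that zoo.cfg gets picked up as a classpath resource, which would fit with hbase/conf being on the classpath. The class name ZooCfgCheck and the method locate are made up for illustration; only plain JDK calls are used, nothing from HBase itself.

```java
import java.net.URL;

// Hypothetical helper, not part of HBase: reports whether a resource
// such as zoo.cfg can be found on the current classpath.
public class ZooCfgCheck {

    // Returns the URL of the named classpath resource, or null if it is absent.
    public static URL locate(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the system class loader's view of the classpath.
            return ClassLoader.getSystemResource(resourceName);
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        // Default to zoo.cfg; another resource name can be passed as an argument.
        String name = args.length > 0 ? args[0] : "zoo.cfg";
        URL found = locate(name);
        if (found == null) {
            System.out.println(name + " is not on the classpath");
        } else {
            System.out.println(name + " found at " + found);
        }
    }
}
```

To be meaningful it has to run with the same classpath as the regionserver, for example (assuming your bin/hbase supports the classpath command) something like java -cp "$(hbase classpath):." ZooCfgCheck. If it reports a zoo.cfg under hbase/conf, that is the file I would suspect.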