Not sure I follow. Is this our making use of multi against a zk ensemble that doesn't support it?

On Mar 13, 2013 6:22 PM, "lars hofhansl" <la...@apache.org> wrote:
> I suppose the problem could be in
> zkHelper.copyQueuesFromRSUsingMulti(rsZnode) as called from
> ReplicationSourceManager.NodeFailoverWorker.run().
> copyQueuesFromRSUsingMulti will return the queues it read even when the
> multi operation failed (because another RS managed to execute it first).
>
> -- Lars
>
> ________________________________
> From: lars hofhansl <la...@apache.org>
> To: hbase-dev <dev@hbase.apache.org>
> Sent: Wednesday, March 13, 2013 6:12 PM
> Subject: Replication hosed after simple cluster restart
>
> We just ran into an interesting scenario. We restarted a cluster that was
> set up as a replication source.
> The stop went cleanly.
>
> Upon restart *all* regionservers aborted within a few seconds with
> variations of these errors:
> http://pastebin.com/3iQVuBqS
>
> This is scary!
>
> -- Lars
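
For anyone following the thread, the race suspected in the quoted message could be sketched roughly as below. This is not HBase code: the class, the in-memory map standing in for ZooKeeper, and every method name here are hypothetical, and the real copyQueuesFromRSUsingMulti may differ. It only illustrates why returning the queues that were read, regardless of whether the multi succeeded, would let two regionservers both believe they own a dead server's queues.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the HBase implementation.
class FailoverSketch {
    // In-memory stand-in for ZooKeeper: dead-RS znode -> its replication queues.
    static final Map<String, List<String>> znodes = new HashMap<>();

    // Stand-in for an atomic multi(): succeeds only if the znode still exists.
    static synchronized boolean multiMove(String deadRs) {
        return znodes.remove(deadRs) != null;
    }

    // Each surviving RS first reads the dead server's queues.
    static List<String> read(String deadRs) {
        List<String> queues = znodes.get(deadRs);
        return queues == null ? Collections.<String>emptyList() : new ArrayList<>(queues);
    }

    // Suspected behavior: the snapshot is returned even when the multi failed.
    static List<String> copyIgnoringFailure(List<String> snapshot, String deadRs) {
        multiMove(deadRs); // result ignored
        return snapshot;
    }

    // Alternative: hand back nothing when another RS won the race.
    static List<String> copyCheckingFailure(List<String> snapshot, String deadRs) {
        return multiMove(deadRs) ? snapshot : Collections.<String>emptyList();
    }

    // Runs the race once and counts how many RSs think they own the queues.
    static int claimants(boolean checkResult) {
        znodes.clear();
        znodes.put("deadRs", new ArrayList<>(List.of("queue1", "queue2")));
        // Both survivors read before either one executes its multi.
        List<String> snapA = read("deadRs");
        List<String> snapB = read("deadRs");
        List<String> gotA = checkResult ? copyCheckingFailure(snapA, "deadRs")
                                        : copyIgnoringFailure(snapA, "deadRs");
        List<String> gotB = checkResult ? copyCheckingFailure(snapB, "deadRs")
                                        : copyIgnoringFailure(snapB, "deadRs");
        return (gotA.isEmpty() ? 0 : 1) + (gotB.isEmpty() ? 0 : 1);
    }

    public static void main(String[] args) {
        System.out.println("claimants when failure is ignored: " + claimants(false));
        System.out.println("claimants when failure is checked: " + claimants(true));
    }
}
```

In this toy model both survivors claim the queues when the multi result is ignored, while only the winner claims them when the result is checked. Whether that matches the aborts in the pastebin is still a hypothesis.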