Not sure I follow. Is this our making use of multi against a zk ensemble that doesn't support it?

On Mar 13, 2013 6:22 PM, "lars hofhansl" <[email protected]> wrote:
> I suppose the problem could be in
> zkHelper.copyQueuesFromRSUsingMulti(rsZnode) as called from
> ReplicationSourceManager.NodeFailoverWorker.run().
> copyQueuesFromRSUsingMulti will return the queues it read even when the
> multi operation failed (because another RS managed to execute it first).
>
> -- Lars
>
>
> ________________________________
> From: lars hofhansl <[email protected]>
> To: hbase-dev <[email protected]>
> Sent: Wednesday, March 13, 2013 6:12 PM
> Subject: Replication hosed after simple cluster restart
>
> We just ran into an interesting scenario. We restarted a cluster that was
> setup as a replication source.
> The stop went cleanly.
>
> Upon restart *all* regionservers aborted within a few seconds with
> variations of these errors:
> http://pastebin.com/3iQVuBqS
>
> This is scary!
>
> -- Lars
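
For illustration only, here is a minimal sketch of the failure mode suggested in the quoted message: a copy routine that returns the queues it read even when the atomic move lost the race to another region server. This is not the HBase code. The class FailoverSketch, its in-memory map, and the methods tryClaim, copyIgnoringFailure and copyCheckingFailure are all hypothetical stand-ins, and the "multi" here is simulated with a flag rather than a real ZooKeeper call.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class FailoverSketch {
    // Hypothetical stand-in for a dead region server's replication queues in ZooKeeper.
    private final SortedMap<String, String> queues = new TreeMap<String, String>();
    // Set once some caller has "won" the simulated atomic move.
    private boolean claimed = false;

    public FailoverSketch() {
        queues.put("peer1", "hlog.0001");
    }

    // Simulated atomic multi: only the first caller succeeds, later callers fail.
    private synchronized boolean tryClaim() {
        if (claimed) {
            return false;
        }
        claimed = true;
        return true;
    }

    // Shape described in the thread: the queues that were read are returned
    // whether or not the move succeeded.
    public SortedMap<String, String> copyIgnoringFailure() {
        SortedMap<String, String> read = new TreeMap<String, String>(queues);
        tryClaim(); // result ignored
        return read;
    }

    // One possible guard (an assumption, not the actual fix): return nothing
    // when another caller already completed the move.
    public SortedMap<String, String> copyCheckingFailure() {
        SortedMap<String, String> read = new TreeMap<String, String>(queues);
        return tryClaim() ? read : new TreeMap<String, String>();
    }

    public static void main(String[] args) {
        FailoverSketch unguarded = new FailoverSketch();
        System.out.println(unguarded.copyIgnoringFailure().size()); // 1 (winner)
        System.out.println(unguarded.copyIgnoringFailure().size()); // 1 (loser also gets the queues)

        FailoverSketch guarded = new FailoverSketch();
        System.out.println(guarded.copyCheckingFailure().size()); // 1 (winner)
        System.out.println(guarded.copyCheckingFailure().size()); // 0 (loser gets nothing)
    }
}
```

In the unguarded variant both callers believe they own the same queues, which is the kind of double claim the quoted message suspects. Whether that is what actually caused the aborts in the pastebin is not established in this thread.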
