This was the JIRA that introduced copyQueuesFromRSUsingMulti():
HBASE-2611 Handle RS that fails while processing the failure of another one (Himanshu Vashishtha)
It went into 0.94.5, and the feature is off by default:
<name>hbase.zookeeper.useMulti</name>
<value>false</value>

The fact that Lars was the first to report the following problem suggests that no other user has tried this feature.
Hence I think 0.94.6 RC1 doesn't need to be sunk.

Cheers

On Wed, Mar 13, 2013 at 6:45 PM, lars hofhansl <la...@apache.org> wrote:

> Hey no problem. It's cool that we found it in a test env. It's probably
> quite hard to reproduce.
> This is in 0.94.5 but this feature is off by default.
>
> What's the general thought here, should I kill the current 0.94.6 rc for
> this?
> My gut says: Yes.
>
>
> I'm also a bit worried about these:
> 2013-03-14 01:42:42,271 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication shared-dnds1-12-sfm.ops.sfdc.net%2C60020%2C1363220608780.1363220609572 at 0
> 2013-03-14 01:42:42,358 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 1 Got:
> java.io.EOFException
>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
>     at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:728)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:67)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:507)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:313)
> 2013-03-14 01:42:42,358 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Waited too long for this file, considering dumping
> 2013-03-14 01:42:42,358 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable to open a reader, sleeping 1000 times 10
>
> This happens after bouncing the cluster a 2nd time and these messages
> repeat every 10s (for hours now). This is a separate problem I think.
>
> -- Lars
>
> ------------------------------
> *From:* Himanshu Vashishtha <hvash...@cs.ualberta.ca>
> *To:* dev@hbase.apache.org; lars hofhansl <la...@apache.org>
> *Cc:* Ted Yu <yuzhih...@gmail.com>
> *Sent:* Wednesday, March 13, 2013 6:38 PM
> *Subject:* Re: Replication hosed after simple cluster restart
>
> This is bad. Yes, copyQueuesFromRSUsingMulti returns a list which it
> might not be able to move later on, resulting in bogus znodes.
> I'll fix this asap. Weird it didn't happen in my testing earlier.
> Sorry about this.
>
>
> On Wed, Mar 13, 2013 at 6:27 PM, lars hofhansl <la...@apache.org> wrote:
>
> > Sorry 0.94.6RC1
> > (I complain about folks not reporting the version all the time, and then
> > I do it too)
> >
> >
> > ________________________________
> > From: Ted Yu <yuzhih...@gmail.com>
> > To: dev@hbase.apache.org; lars hofhansl <la...@apache.org>
> > Sent: Wednesday, March 13, 2013 6:17 PM
> > Subject: Re: Replication hosed after simple cluster restart
> >
> >
> > Did this happen on 0.94.5 ?
> >
> > Thanks
> >
> >
> > On Wed, Mar 13, 2013 at 6:12 PM, lars hofhansl <la...@apache.org> wrote:
> >
> > > We just ran into an interesting scenario. We restarted a cluster that
> > > was setup as a replication source.
> > > The stop went cleanly.
> > >
> > > Upon restart *all* regionservers aborted within a few seconds with
> > > variations of these errors:
> > > http://pastebin.com/3iQVuBqS
> > >
> > > This is scary!
> > >
> > > -- Lars
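Lars's log excerpt ends with "Unable to open a reader, sleeping 1000 times 10", and he notes that the messages repeat every 10s. That pattern is consistent with a capped linear backoff: a base sleep multiplied by a retry multiplier that stops growing at 10, giving 1000 ms x 10 = 10 s per retry once the cap is reached. The Java sketch that follows models only that arithmetic, assuming the base and cap are the two numbers printed in the log line; the names BASE_SLEEP_MS, MAX_MULTIPLIER, sleepMillis and nextMultiplier are hypothetical and are not taken from ReplicationSource.

```java
// Hypothetical sketch: models the capped linear backoff that the log line
// "Unable to open a reader, sleeping 1000 times 10" appears to describe.
// BASE_SLEEP_MS and MAX_MULTIPLIER are assumed values read off that log line,
// not constants copied from HBase source.
class ReplicationBackoffSketch {
    static final long BASE_SLEEP_MS = 1000; // assumed base sleep ("sleeping 1000")
    static final int MAX_MULTIPLIER = 10;   // assumed cap on the multiplier ("times 10")

    // Sleep duration in milliseconds for a given retry multiplier.
    static long sleepMillis(int multiplier) {
        return BASE_SLEEP_MS * multiplier;
    }

    // The multiplier grows by one per failed attempt until it reaches the cap;
    // after that, every retry waits the same amount of time.
    static int nextMultiplier(int multiplier) {
        return multiplier < MAX_MULTIPLIER ? multiplier + 1 : multiplier;
    }

    public static void main(String[] args) {
        int multiplier = 1;
        for (int attempt = 1; attempt <= 12; attempt++) {
            System.out.println("attempt " + attempt + ": sleeping " + BASE_SLEEP_MS
                    + " times " + multiplier + " = " + sleepMillis(multiplier) + " ms");
            multiplier = nextMultiplier(multiplier);
        }
    }
}
```

From the tenth attempt onward the multiplier stays at 10, so each failed attempt to open the same unreadable log waits 10,000 ms. Under this assumption, a log file that never becomes readable would produce the same set of messages every 10s indefinitely, matching what Lars observed for hours.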