Raul, I'd need some time to dig into ZK-832, but from the description I don't think it is a blocker. As I understand it, this happens because the server lost its persistent state, which isn't a common scenario in a replicated deployment. I'm fine with downgrading it from blocker to major/critical.
As for ZK-1029, if we know what the problem is, would it be difficult to provide a patch?

-Flavio

> On 22 Oct 2015, at 21:57, Raúl Gutiérrez Segalés <[email protected]> wrote:
>
> On 5 October 2015 at 11:01, Raúl Gutiérrez Segalés <[email protected]
> <mailto:[email protected]>>
> wrote:
>
>> On 8 September 2015 at 23:15, Raúl Gutiérrez Segalés <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> On 23 August 2015 at 14:51, Raúl Gutiérrez Segalés <[email protected]>
>>> wrote:
>>>
>>>> On 23 August 2015 at 14:44, Raúl Gutiérrez Segalés <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> sorry about dropping the ball here. So going over the unresolved
>>>>> issues, I think these ones would be nice to tackle before cutting an RC:
>>>>>
>>>>> * ZOOKEEPER-1833: fix windows build (one sub-task still opened:
>>>>> ZOOKEEPER-1868)
>>>>> * ZOOKEEPER-1029: C client bug in zookeeper_init (if bad hostname is
>>>>> given)
>>>>> (no one has this assigned, I'll try to get a patch out by tomorrow)
>>>>> * ZOOKEEPER-832: Invalid session id causes infinite loop during
>>>>> automatic reconnect
>>>>> (I've asked Rakesh if can wrap it up, if anyone else can help that
>>>>> would be great)
>>>>> * ZOOKEEPER-2033: zookeeper follower fails to start after a restart
>>>>> immediately following a new epoch
>>>>> (pinged Flavio to get some feedback)
>>>>>
>>>>> Everything else can probably be punted for 3.4.8, unless anyone
>>>>> disagrees.
>>>>>
>>>>
>>>> One more, which needs to be back-ported from trunk:
>>>>
>>>> ZOOKEEPER-1506: Re-try DNS hostname -> IP resolution if node connection
>>>> fails
>>>>
>>>
>>> There's been some movement in the bug tracker, but ZOOKEEPER-1506 and
>>> ZOOKEEPER-832
>>> still need reviews (hopefully tomorrow, unless someone can beat me to it)
>>> and I still need to get to ZOOKEEPER-1029.
>>>
>>
>> So ZOOKEEPER-1506 is done. Still waiting on ZOOKEEPER-832 and I am hoping
>> to finally get to ZOOKEEPER-1029 this week (unless someone beats me to it,
>> which would be much appreciated).
>>
>
>
> Circling back, it turns out that ZOOKEEPER-1029 is actually not the cause
> for MESOS-2186. The fact that we are not properly checking if the locks
> have been initialized before trying to get the locks is still wrong, but
> ignoring the return codes from pthread_cond_broadcast and
> pthread_mutex_lock (EINVAL) is not causing the reported crashers.
>
> I propose we punt ZOOKEEPER-1029 and ZOOKEEPER-832 for 3.4.8, so that we
> can keep moving with the release candidate.
>
> Any objections?
>
>
> -rgs
