Before this improvement is re-merged I’d like to see:

1) A test that characterizes the current behavior (e.g. doesn’t wait 2 min when there’s a port conflict)
2) A test that demonstrates how the current logic is insufficient

Anthony

> On Sep 5, 2018, at 10:20 AM, Nabarun Nag <n...@apache.org> wrote:
>
> GEODE-5591 has been reverted in develop
> ref: 901da27f227a8ce2b7d6b681619782a1accd9330
>
> Regards
> Nabarun Nag
>
> On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon <rmcma...@pivotal.io> wrote:
>
>> +1 for reverting in both places.
>>
>> I see that there is already an isGatewayReceiver flag in the AcceptorImpl
>> constructor. It's not ideal, but could we use this flag to prevent the 2
>> minute retry logic from happening if this flag is true?
>>
>> Ryan
>>
>> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <lhughesgodf...@pivotal.io> wrote:
>>
>>> +1 for reverting in both places.
>>>
>>> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith <dsm...@pivotal.io> wrote:
>>>
>>>> +1 for reverting in both places. The current fix is not better, that's why
>>>> we are reverting it on the release branch!
>>>>
>>>> -Dan
>>>>
>>>> On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett <jbarr...@pivotal.io> wrote:
>>>>
>>>>> I’m not ok with reverting in develop. Revert in 1.7 and modify in develop.
>>>>> We shouldn’t go backwards in develop. The current fix is better than the
>>>>> bug it fixes.
>>>>>
>>>>>> On Sep 5, 2018, at 9:40 AM, Nabarun Nag <n...@apache.org> wrote:
>>>>>>
>>>>>> If everyone is okay with it, I will revert that change in develop and then
>>>>>> cherry pick it to release/1.7.0 branch.
>>>>>> Please do comment.
>>>>>>
>>>>>> Regards
>>>>>> Nabarun Nag
>>>>>>
>>>>>>
>>>>>>> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith <dsm...@pivotal.io> wrote:
>>>>>>>
>>>>>>> +1 to yank it and rework the fix.
>>>>>>>
>>>>>>> Gester's change helps, but it just means that you will sometimes randomly
>>>>>>> have a 2 minute delay starting up a gateway receiver. I don't think that is
>>>>>>> a great user experience either.
>>>>>>>
>>>>>>> -Dan
>>>>>>>
>>>>>>> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <bschucha...@pivotal.io> wrote:
>>>>>>>
>>>>>>>> Let's yank it
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 9/4/18 5:04 PM, Sean Goller wrote:
>>>>>>>>>
>>>>>>>>> If it's to get the release out, I'm fine with reverting. I don't like it,
>>>>>>>>> but I'm not willing to die on that hill. :)
>>>>>>>>>
>>>>>>>>> -S.
>>>>>>>>>
>>>>>>>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith <dsm...@pivotal.io> wrote:
>>>>>>>>>
>>>>>>>>>> Splitting this into a separate thread.
>>>>>>>>>>
>>>>>>>>>> I see the issue. The two minute timeout is the constructor for
>>>>>>>>>> AcceptorImpl, where it retries to bind for 2 minutes.
>>>>>>>>>>
>>>>>>>>>> That behavior makes sense for CacheServer.start.
>>>>>>>>>>
>>>>>>>>>> But it doesn't make sense for the new logic in GatewayReceiver.start() from
>>>>>>>>>> GEODE-5591. That code is trying to use CacheServer.start to scan for an
>>>>>>>>>> available port, trying each port in a range. That free port finding logic
>>>>>>>>>> really doesn't want to have two minutes of retries for each port. It seems
>>>>>>>>>> like we need to rework the fix for GEODE-5591.
>>>>>>>>>>
>>>>>>>>>> Does it make sense to hold up the release to rework this fix, or should we
>>>>>>>>>> just revert it? Have we switched concourse over to using alpine linux,
>>>>>>>>>> which I think was the original motivation for this fix?
>>>>>>>>>>
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsm...@pivotal.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Why is it waiting at all in this case? Where is this 2 minute timeout
>>>>>>>>>>> coming from?
>>>>>>>>>>>
>>>>>>>>>>> -Dan
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <sai.boorlaga...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> So the issue is that it takes longer to start than previous releases?
>>>>>>>>>>>> Also, is this wait time only when using Gfsh to create
>>>>>>>>>>>> gateway-receiver?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <n...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Currently we have a minor issue in the release branch as pointed out by
>>>>>>>>>>>>> Barry O.
>>>>>>>>>>>>> We will wait till a resolution is figured out for this issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Steps:
>>>>>>>>>>>>> 1. create locator
>>>>>>>>>>>>> 2. start server --name=server1 --server-port=40404
>>>>>>>>>>>>> 3. start server --name=server2 --server-port=40405
>>>>>>>>>>>>> 4. create gateway-receiver --member=server1
>>>>>>>>>>>>> 5. create gateway-receiver --member=server2 `This gets stuck for 2 minutes`
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is the 2 minute wait time acceptable? Should we document it? When we revert
>>>>>>>>>>>>> GEODE-5591, this issue does not happen.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Nabarun Nag
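
For reference, the free-port scan discussed in this thread can be sketched in plain Java. This is only an illustration under stated assumptions, not Geode code: `PortScanSketch` and `bindFirstFree` are hypothetical names, and the sketch uses `java.net.ServerSocket` directly rather than `AcceptorImpl` or `CacheServer.start`. The point it shows is that a scan over a port range wants to try each port exactly once and move on as soon as a bind fails, instead of retrying the same port for two minutes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch of a fail-fast port scan; not part of Geode.
final class PortScanSketch {

  // Tries each port in [startPort, endPort] exactly once and returns the first
  // socket that binds. A failed bind moves on to the next port immediately.
  static ServerSocket bindFirstFree(int startPort, int endPort) throws IOException {
    for (int port = startPort; port <= endPort; port++) {
      ServerSocket socket = new ServerSocket();
      try {
        socket.bind(new InetSocketAddress(port));
        return socket;
      } catch (IOException e) {
        // Port is in use (or otherwise unavailable): close and try the next one.
        socket.close();
      }
    }
    throw new IOException("No free port in range " + startPort + "-" + endPort);
  }

  public static void main(String[] args) throws IOException {
    // Occupy one port, then start the scan at that port to force a conflict.
    try (ServerSocket held = new ServerSocket(0)) {
      int taken = held.getLocalPort();
      long start = System.nanoTime();
      try (ServerSocket found = bindFirstFree(taken, Math.min(taken + 50, 65535))) {
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("skipped " + taken + ", bound " + found.getLocalPort()
            + " in " + elapsedMs + " ms");
      }
    }
  }
}
```

A test of the kind requested at the top of the thread could plausibly take the same shape: hold a port, start the receiver with a range that begins at that port, and assert both that a later port is chosen and that the elapsed time stays far below two minutes.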