The previous fix did not improve anything on 2-miniute-timeout.

On Wed, Sep 5, 2018 at 10:52 AM, Anthony Baker <aba...@pivotal.io> wrote:

> Gester,
>
> Clearly the prior implementation had some problems, but except in
> pathological cases it provided the behavior users expected.  That’s why I
> think we need a characterization test(s) to show exactly what we want the
> behavior to be.  Merging in changes that make the user experience worse in
> the more common scenarios isn’t a good tradeoff IMO.  I see this work as
> integral to GEODE-5591 and shouldn’t be deferred to a separate ticket.
>
> Anthony
>
>
> > On Sep 5, 2018, at 10:43 AM, Xiaojian Zhou <gz...@pivotal.io> wrote:
> >
> > The fix intend to resolve 2 issues:
> > 1) change the exception handling (for a linux version).
> > 2) prevent random picking port number to loop forever. In old code, for
> > example, if the range only contains one port, random will always pick the
> > same port and it will loop forever. The fix will stop after all available
> > ports in the range are tried. There's a test
> >
> > test_ValidateGatewayReceiverAttributes_WrongBindAddress
> >
> >
> > For 2-minute-wait, it's still possible. The fix did not resolve it
> > (when random() happened to return same port for different receiver in
> > the same member), but I did not make things worse either.
> >
> >
> > There's discussion on if we can reduce the 2-minute-timeout to a few
> > second. This is definitely another ticket.
> >
> > Regards
> >
> > Gester
> >
> >
> > On Wed, Sep 5, 2018 at 10:35 AM, Anthony Baker <aba...@pivotal.io>
> wrote:
> >
> >> Before this improvement is re-merged I’d like to see:
> >>
> >> 1) A test that characterizes the current behavior (e.g. doesn’t wait 2
> min
> >> when there’s a port conflict)
> >> 2) A test that demonstrates how the current logic is insufficient
> >>
> >> Anthony
> >>
> >>
> >>> On Sep 5, 2018, at 10:20 AM, Nabarun Nag <n...@apache.org> wrote:
> >>>
> >>> GEODE-5591 has been reverted in develop
> >>> ref: 901da27f227a8ce2b7d6b681619782a1accd9330
> >>>
> >>> Regards
> >>> Nabarun Nag
> >>>
> >>> On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon <rmcma...@pivotal.io>
> >> wrote:
> >>>
> >>>> +1 for reverting in both places.
> >>>>
> >>>> I see that there is already an isGatewayReceiver flag in the
> >> AcceptorImpl
> >>>> constructor.  It's not ideal, but could we use this flag to prevent
> the
> >> 2
> >>>> minute retry logic for happening if this flag is true?
> >>>>
> >>>> Ryan
> >>>>
> >>>> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <
> >>>> lhughesgodf...@pivotal.io> wrote:
> >>>>
> >>>>> +1 for reverting in both places.
> >>>>>
> >>>>> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith <dsm...@pivotal.io> wrote:
> >>>>>
> >>>>>> +1 for reverting in both places. The current fix is not better,
> that's
> >>>>> why
> >>>>>> we are reverting it on the release branch!
> >>>>>>
> >>>>>> -Dan
> >>>>>>
> >>>>>> On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett <jbarr...@pivotal.io>
> >>>>> wrote:
> >>>>>>
> >>>>>>> I’m not ok with reverting in develop. Revert in 1.7 and modify in
> >>>>>> develop.
> >>>>>>> We shouldn’t go backwards in develop. The current fix is better
> than
> >>>>> the
> >>>>>>> bug it fixes.
> >>>>>>>
> >>>>>>>> On Sep 5, 2018, at 9:40 AM, Nabarun Nag <n...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>> If everyone is okay with it, I will revert that change in develop
> >>>> and
> >>>>>>> then
> >>>>>>>> cherry pick it to release/1.7.0 branch.
> >>>>>>>> Please do comment.
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Nabarun Nag
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith <dsm...@pivotal.io>
> >>>> wrote:
> >>>>>>>>>
> >>>>>>>>> +1 to yank it and rework the fix.
> >>>>>>>>>
> >>>>>>>>> Gester's change helps, but it just means that you will sometimes
> >>>>>>> randomly
> >>>>>>>>> have a 2 minute delay starting up a gateway receiver. I don't
> >>>> think
> >>>>>>> that is
> >>>>>>>>> a great user experience either.
> >>>>>>>>>
> >>>>>>>>> -Dan
> >>>>>>>>>
> >>>>>>>>> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <
> >>>>>>> bschucha...@pivotal.io>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Let's yank it
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 9/4/18 5:04 PM, Sean Goller wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> If it's to get the release out, I'm fine with reverting. I
> don't
> >>>>>> like
> >>>>>>>>> it,
> >>>>>>>>>>> but I'm not willing to die on that hill. :)
> >>>>>>>>>>>
> >>>>>>>>>>> -S.
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith <dsm...@pivotal.io>
> >>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Spitting this into a separate thread.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I see the issue. The two minute timeout is the constructor for
> >>>>>>>>>>>> AcceptorImpl, where it retries to bind for 2 minutes.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That behavior makes sense for CacheServer.start.
> >>>>>>>>>>>>
> >>>>>>>>>>>> But it doesn't make sense for the new logic in
> >>>>>>> GatewayReceiver.start()
> >>>>>>>>>>>> from
> >>>>>>>>>>>> GEODE-5591. That code is trying to use CacheServer.start to
> >>>> scan
> >>>>>> for
> >>>>>>> an
> >>>>>>>>>>>> available port, trying each port in a range. That free port
> >>>>> finding
> >>>>>>>>> logic
> >>>>>>>>>>>> really doesn't want to have two minutes of retries for each
> >>>> port.
> >>>>>> It
> >>>>>>>>>>>> seems
> >>>>>>>>>>>> like we need to rework the fix for GEODE-5591.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Does it make sense to hold up the release to rework this fix,
> >>>> or
> >>>>>>> should
> >>>>>>>>>>>> we
> >>>>>>>>>>>> just revert it? Have we switched concourse over to using
> alpine
> >>>>>>> linux,
> >>>>>>>>>>>> which I think was the original motivation for this fix?
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Dan
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsm...@pivotal.io>
> >>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Why is it waiting at all in this case? Where is this 2 minute
> >>>>>> timeout
> >>>>>>>>>>>>> coming from?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Dan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <
> >>>>>>>>>>>>>
> >>>>>>>>>>>> sai.boorlaga...@gmail.com
> >>>>>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> So the issue is that it takes longer to start than previous
> >>>>>>> releases?
> >>>>>>>>>>>>>> Also, is this wait time only when using Gfsh to create
> >>>>>>>>>>>>>> gateway-receiver?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <n...@apache.org
> >
> >>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Currently we have a minor issue in the release branch as
> >>>>> pointed
> >>>>>>> out
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> by
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Barry O.
> >>>>>>>>>>>>>>> We will wait till a resolution is figured out for this
> >>>> issue.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Steps:
> >>>>>>>>>>>>>>> 1. create locator
> >>>>>>>>>>>>>>> 2. start server --name=server1 --server-port=40404
> >>>>>>>>>>>>>>> 3. start server --name=server2 --server-port=40405
> >>>>>>>>>>>>>>> 4. create gateway-receiver --member=server1
> >>>>>>>>>>>>>>> 5. create gateway-receiver --member=server2 `This gets
> stuck
> >>>>>> for 2
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> minutes`
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Is the 2 minute wait time acceptable? Should we document
> it?
> >>>>>> When
> >>>>>>> we
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> revert
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> GEODE-5591, this issue does not happen.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Regards
> >>>>>>>>>>>>>>> Nabarun Nag
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Reply via email to