It looks like Python does not have the API to set wait_for_ready :-(

On Friday, August 17, 2018 at 5:17:43 PM UTC-7, Srini Polavarapu wrote:
>
> Hi Alysha,
>
> How did you confirm that the client is going into backoff and that it is indeed receiving a RST when nginx goes away? Have you looked at the logs gRPC generates when this happens? One possibility is that nginx doesn't send a RST and the client doesn't know that the connection is broken until the TCP timeout occurs. Using keepalive will help in this case.
>
> You can try using wait_for_ready=false <https://github.com/grpc/grpc/blob/5098508d2d41a116113f7e333c516cd9ef34a943/doc/wait-for-ready.md> so the call fails immediately and you can retry.
>
> A recent PR allows you to reset the backoff period: https://github.com/grpc/grpc/pull/16225. It is experimental and doesn't have a Python or Ruby API, so it can't be of immediate help.
>
> On Friday, August 17, 2018 at 12:58:12 PM UTC-7, alysha....@shopify.com wrote:
>>
>> Hey Carl,
>>
>> This is with L7 nginx balancing. The reason we moved to nginx from L4 balancers was so we could do per-call balancing (instead of per-connection with L4).
>>
>> > In an ideal world, nginx would send a GOAWAY frame to both the client and the server, and allow all the RPCs to complete before tearing down the connection.
>>
>> I agree a GOAWAY would be better, but it seems like nginx doesn't do that (at least not yet); it just RSTs the connection :(
>>
>> > The client knows how to reschedule an unstarted RPC onto a different connection, without returning an UNAVAILABLE.
>>
>> Even when we were using L4, it seemed like a GOAWAY from the Go server would put the Core clients in a backoff state instead of retrying immediately. The only solution that worked was a round-robin over multiple connections and a slow-enough rolling restart so the connections could re-establish before the next one died.
>>
>> > When you say multiple connections to a single IP, does that mean multiple nginx instances listening on different ports?
>>
>> No, it's a pool of ~20 ingress nginx instances with an L4 load balancer, so traffic looks like client -> L4 LB -> nginx L7 -> backend GRPC pod. The problem is the L4 LB in front of nginx has a single public IP.
>>
>> > I'm most familiar with Java, which can actually do what you want. The normal way is to create a custom NameResolver that returns multiple addresses for a single address, which a RoundRobin load balancer will use
>>
>> Yeah, I considered writing something similar in Core, but I was worried it wouldn't be adopted upstream because of the move to external LBs? It's very tough (impossible?) to add new resolvers to Ruby or Python without rebuilding the whole extension, and we're pretty worried about maintaining a fork of the C++ implementation. It's nice to hear the approach has some merits; I might experiment with it.
>>
>> Thanks,
>> Alysha
>>
>> On Friday, August 17, 2018 at 3:42:31 PM UTC-4, Carl Mastrangelo wrote:
>>>
>>> Hi Alysha,
>>>
>>> Do you know if nginx is balancing at L4 or L7? In an ideal world, nginx would send a GOAWAY frame to both the client and the server, and allow all the RPCs to complete before tearing down the connection. The client knows how to reschedule an unstarted RPC onto a different connection, without returning an UNAVAILABLE.
>>>
>>> When you say multiple connections to a single IP, does that mean multiple nginx instances listening on different ports?
>>>
>>> I'm most familiar with Java, which can actually do what you want. The normal way is to create a custom NameResolver that returns multiple addresses for a single address, which a RoundRobin load balancer will use. It sounds like you aren't using Java, but since the implementations are all similar there may be a way to do so.
>>>
>>> On Friday, August 17, 2018 at 8:46:49 AM UTC-7, alysha....@shopify.com wrote:
>>>>
>>>> Hi grpc people!
>>>>
>>>> We have a setup where we're running a grpc service (written in Go) on GKE, and we're accepting traffic from outside the cluster through nginx ingresses. Our clients are all using Core GRPC libraries (mostly Ruby) to make calls to the nginx ingress, which load-balances per-call to our backend pods.
>>>>
>>>> The problem we have with this setup is that whenever the nginx ingresses reload they drop all client connections, which results in spikes of Unavailable errors from our grpc clients. There are many nginx ingresses but they all share a single IP: the incoming TCP connections are routed through a Google Cloud L4 load balancer. Whenever an nginx instance closes a client's TCP connection, the GRPC subchannel treats the backend as unavailable, even though there are many more nginx pods that may be available immediately to serve traffic, and it goes into backoff logic. My understanding is that with multiple subchannels, even if one nginx ingress is restarted the others can continue to serve requests and we shouldn't see Unavailable errors.
>>>>
>>>> My question is: what is the best way to make GRPC Core establish multiple connections to a single IP, so we can have long-lived connections to multiple nginx ingresses?
>>>>
>>>> Possibilities we've considered:
>>>>
>>>> - DNS round-robin with multiple public IPs on a single A record - we've tested this and it works, but it requires us to manually administer the DNS records and run multiple L4 LBs
>>>>
>>>> - DNS SRV records - it seems like we could have multiple SRV records with the same hostname, but in my testing this requires us to add a look-aside load-balancer as well, and enable ares DNS which doesn't seem to be production-ready
>>>>
>>>> - Host a look-aside load-balancer - we could host our own LB service, but it's not clear to me how we would overcome this issue for the LB service? The LB would be behind the same nginx ingresses. I haven't found great documentation on how to set this up either.
>>>>
>>>> - Connection pooling in the client - wrapping the Ruby GRPC channels in a library that explicitly establishes multiple channels, each with one sub-channel. I've tried to write this but it's tricky to implement at a high level. I couldn't get it to perform as well during failures as the DNS round-robin approach.
>>>>
>>>> Are there options I missed? Is there any supported pattern for this? Has anyone deployed a similar architecture (many clients connecting through nginx on a single public IP)?
>>>>
>>>> Thanks,
>>>> Alysha
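
For anyone reading this thread later: the "connection pooling in the client" option mentioned above could be sketched roughly as follows in Python. This is only an illustration of the idea, not something from gRPC itself. `ChannelPool`, `make_channel`, `call` and `retryable` are hypothetical names, and `ConnectionError` merely stands in for whatever Unavailable-style error the real client raises. The pool round-robins over several independently created channels and, when a call fails with a retryable error, replaces that channel and moves on to the next one instead of waiting out a backoff.

```python
import threading
from typing import Callable, Generic, List, Optional, Type, TypeVar

C = TypeVar("C")  # channel type (hypothetical; could be a gRPC channel object)
R = TypeVar("R")  # result type of the call


class ChannelPool(Generic[C]):
    """Round-robin over several independently created channels (sketch)."""

    def __init__(self, make_channel: Callable[[], C], size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        self._make_channel = make_channel
        self._channels: List[C] = [make_channel() for _ in range(size)]
        self._next = 0
        self._lock = threading.Lock()

    def _pick(self) -> int:
        # Hand out slot indexes in round-robin order, safely across threads.
        with self._lock:
            index = self._next
            self._next = (self._next + 1) % len(self._channels)
            return index

    def call(
        self,
        rpc: Callable[[C], R],
        retryable: Type[BaseException] = ConnectionError,
    ) -> R:
        """Run rpc(channel). On a retryable error, replace that channel
        and try the next slot; give up after one attempt per slot."""
        last_error: Optional[BaseException] = None
        for _ in range(len(self._channels)):
            index = self._pick()
            try:
                return rpc(self._channels[index])
            except retryable as error:
                last_error = error
                # Assumption: the old channel is simply dropped here; a real
                # client would probably want to close it explicitly.
                with self._lock:
                    self._channels[index] = self._make_channel()
        assert last_error is not None  # the loop ran at least once
        raise last_error
```

With grpcio, `make_channel` might be something like `lambda: grpc.insecure_channel(target)` and `rpc` a function that builds a stub on the channel and invokes a method. Whether separate channels actually end up on separate TCP connections (and therefore on different nginx instances) depends on how the library shares subchannels, which this sketch does not address.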