In general, the connection times are fairly randomly distributed, but not always. One case I have seen is an update of the server: we spin up a new replica, and it takes all of the connections. As a result, another replica spins up shortly after due to autoscaling, and this new replica takes no connections for 30m.
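For reference, the 30m connection close we use (described in my original message below) is roughly what gRPC's server-side MaxConnectionAge option does. A minimal grpc-go sketch, assuming a Go server; the durations and grace period here are illustrative rather than the exact values we run:

    package main

    import (
        "log"
        "net"
        "time"

        "google.golang.org/grpc"
        "google.golang.org/grpc/keepalive"
    )

    func main() {
        lis, err := net.Listen("tcp", ":50051")
        if err != nil {
            log.Fatalf("failed to listen: %v", err)
        }

        // Force each connection closed after ~30m so long-lived streams are
        // periodically pushed back through load balancing and can land on
        // newer replicas.
        srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
            MaxConnectionAge:      30 * time.Minute, // send GOAWAY after ~30m
            MaxConnectionAgeGrace: time.Minute,      // let in-flight RPCs drain
        }))

        // Register services here (pb.RegisterFooServer is a placeholder for
        // whatever service you actually serve).

        if err := srv.Serve(lis); err != nil {
            log.Fatalf("serve: %v", err)
        }
    }

If I remember correctly, grpc-go also adds some random jitter to MaxConnectionAge, which should help spread the reconnects out rather than having every client reconnect at the same moment.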
As far as tuning the autoscaling, that is a good point, I will look into that.

On Wednesday, October 16, 2019 at 10:55:20 AM UTC-7, Mark D. Roth wrote:
> Do your clients all start up at the same time? If not, it's not clear to me
> why your setup wouldn't work. If the clients' start times are randomly
> distributed, then if the server closes each connection after 30m, the
> connection close times should be just as randomly distributed as the client
> start times, which means that as soon as the new server comes up, clients
> should start trickling into it. It may take 30m for the load to fully
> balance, but the new server should start getting new load immediately, and
> the load should increase slowly over that 30m period.
>
> I don't know anything about the Kubernetes autoscaling side of things, but
> maybe there are parameters you can tune there to give the new server more
> time to accumulate load before Kubernetes kills it?
>
> In general, it's not clear to me that there's a better approach than the
> one you're already taking. There's always a bit of tension between load
> balancing and streaming RPCs, because the whole point of a streaming RPC is
> that it doesn't go through load balancing for each individual message,
> which means that all of the messages go to the same backend.
>
> I hope this information is helpful.
>
> On Mon, Oct 7, 2019 at 3:54 PM howardjohn via grpc.io
> <grp...@googlegroups.com> wrote:
>
>> We have a case where we have many clients and few servers, typically a
>> 1000:1 ratio. The traffic is a single bidirectional stream per client.
>>
>> The problem we are seeing is that when a new server comes up, it will
>> have no clients connected, as they maintain their connection to the other
>> servers.
>>
>> This is made worse by Kubernetes autoscaling: as this new server will
>> have 0 load, it will scale down and we flip-flop between n and n+1
>> replicas. This graph shows this behavior pretty well:
>> https://snapshot.raintank.io/dashboard/snapshot/SceOCrNpdOr4qmTUk1UHF20xMiNqGk6K?panelId=4&fullscreen&orgId=2
>>
>> As a mitigation against this, we have the server close the connections
>> every 30m. This is not great, because it takes at least 30m to balance,
>> and due to the above issue this generally doesn't ever work.
>>
>> I am wondering if there are any best practices for handling this type of
>> problem?
>>
>> One possible idea we have is the server sharing load information and
>> shedding load if they have more than their "fair share" of connections,
>> but this is pretty complex.
>
> --
> Mark D. Roth <ro...@google.com>
> Software Engineer
> Google, Inc.