I am still confused. Is the scenario you want to use as an example (a) grpc clients --> TCP LB --> grpc servers, or (b) grpc clients --> grpc servers?
On Mon, Dec 5, 2016 at 2:49 PM, Arya Asemanfar <arya.aseman...@mixpanel.com> wrote:

> Sorry, I meant grpc server. Yes, you are right that if the TCP load
> balancer restarts there is no problem, so my scenario only applies if the
> grpc server restarts.
>
> On Mon, Dec 5, 2016 at 2:17 PM Qi Zhao <zh...@google.com> wrote:
>
>> On Mon, Dec 5, 2016 at 12:13 PM, Arya Asemanfar <arya.aseman...@mixpanel.com> wrote:
>>
>> Thanks for the feedback. Good idea re metadata for getting the Balancer
>> to treat the connections as different. Will take a look at that.
>>
>> Some clarifications/questions inline:
>>
>> On Mon, Dec 5, 2016 at 11:11 AM, 'Qi Zhao' via grpc.io <grpc-io@googlegroups.com> wrote:
>>
>> Thanks for the info. My comments are inline.
>>
>> On Sun, Dec 4, 2016 at 7:27 PM, Arya Asemanfar <arya.aseman...@mixpanel.com> wrote:
>>
>> Hey all,
>>
>> We're considering implementing some patches to the golang grpc
>> implementation. These are things we think would fit better inside grpc
>> than we could achieve from outside. Before we go through the effort, we'd
>> like to gauge whether these features would be welcome (assuming we'll work
>> with the owners to get a quality implementation). Some of these ideas are
>> not fully fleshed out or may not be the best solution to the problem they
>> aim to solve. I also try to state the problem, so if you have ideas on
>> better ways to address these problems, please share :)
>>
>> *Add DialOption MaxConnectionLifetime*
>> Currently, once a connection is established, it lives until there is a
>> transport error or the client proactively closes the connection. These
>> long-lived connections are problematic when using a TCP load balancer,
>> such as the ones provided by Google Container Engine and Google Compute
>> Engine. At a clean start, clients will be somewhat evenly distributed
>> among the servers behind the load balancer, but if the servers go through
>> a rolling restart, they will become unbalanced: clients will have a higher
>> likelihood of being connected to the first server that restarts, with the
>> most recently restarted server having close to zero clients.
>>
>> I do not think long-lived connections are problematic as long as there
>> is live traffic on them. We do plan to add idle shutdown to actively close
>> TCP connections that have been alive for a long time with no traffic.
>> Which server to choose really depends on the load balancing policy you
>> choose -- I do not see why your description could happen if you use a
>> round-robin load balancing policy.
>>
>> We have a single IP address that we give to gRPC (since the IP address
>> is Google Cloud's TCP load balancer). The client establishes one
>> connection and has no reason to disconnect under normal conditions.
>>
>> Here's an example scenario that results in uneven load:
>> - 100 clients connected evenly to 10 servers
>> - each of the 10 servers has about 10 connections
>> - each of the clients sends about an equal amount of traffic to the
>> server it is connected to
>> - one of the servers restarts
>> - the 10 clients that were connected to that 1 server re-establish
>> connections
>> - the new server, assuming it came up in time, has on average 1
>> connection, with each of the other 9 having 1 additional connection
>> - now we have 10 servers, one with 1 client and 9 with 11 clients, so
>> the load is unevenly distributed
>>
>> What "server" do you mean here? My understanding is that all these 100
>> clients connect to the TCP load balancer.
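For what it's worth, a rough user-level approximation of the MaxConnectionLifetime idea above is to periodically replace the *grpc.ClientConn so connections get re-spread across the servers behind the TCP LB. This is only a sketch: the package, type, and function names below are made up, nothing like this exists in grpc-go, and closing the old connection this way can cut off in-flight RPCs instead of draining them the way the proposal intends.

package rotating

import (
	"log"
	"sync"
	"time"

	"google.golang.org/grpc"
)

// Conn holds the currently active client connection.
type Conn struct {
	mu   sync.RWMutex
	conn *grpc.ClientConn
}

// Current returns the connection to use for the next RPC; callers would
// create stubs from Current() instead of holding one ClientConn forever.
func (c *Conn) Current() *grpc.ClientConn {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.conn
}

// Dial dials target and swaps in a fresh connection every maxLifetime.
func Dial(target string, maxLifetime time.Duration, opts ...grpc.DialOption) (*Conn, error) {
	cc, err := grpc.Dial(target, opts...)
	if err != nil {
		return nil, err
	}
	c := &Conn{conn: cc}
	go func() {
		for range time.Tick(maxLifetime) {
			fresh, err := grpc.Dial(target, opts...)
			if err != nil {
				log.Printf("redial %s: %v", target, err)
				continue
			}
			c.mu.Lock()
			old := c.conn
			c.conn = fresh
			c.mu.Unlock()
			old.Close() // abrupt close, not the graceful drain the proposal asks for
		}
	}()
	return c, nil
}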
>>
>> Is there another workaround for this problem other than adding another
>> intermediate load balancer? Even then, the load to the load balancers
>> would be uneven, assuming we'd still need a TCP-level LB given we're using
>> Kubernetes in GKE.
>>
>> We propose fixing this by adding a MaxConnectionLifetime, which will
>> force clients to disconnect after some period of time. We'll use the same
>> mechanism as when an address is removed from a balancer (i.e., drain the
>> connection rather than abruptly throwing errors).
>>
>> This should be achieved by a GRPCLB load balancer, which can sense the
>> workload of all the servers and send a refreshed backend list when needed.
>> I am not convinced MaxConnectionLifetime is a must.
>>
>> *Add DialOption NumConnectionsPerServer*
>> This is related to the problem above. When a client is provided with a
>> single address that points to a TCP load balancer, it's sometimes
>> beneficial for the client to have multiple connections, since their
>> underlying performance might vary.
>>
>> I am not clear on what you plan to do here. Do you want to create
>> multiple connections to a single endpoint (e.g., the TCP load balancer)?
>> If yes, you can customize your load balancer impl to do that already
>> (endpoints with the same address but different metadata are treated as
>> different ones in grpc internals).
>>
>> Will try this out. Thanks for the suggestion.
>>
>> *Add ServerOption MaxConcurrentGlobalStreams*
>> Currently there is only a way to limit the number of streams per client,
>> but it'd be useful to do this globally. This could be achieved via an
>> interceptor that returns StreamRefused, but we thought it might be useful
>> to have in grpc.
>>
>> This is similar to something we plan to add for flow control purposes.
>> gRPC servers will have some knobs (e.g., ServerOption) to throttle the
>> resource usage (e.g., memory) of the entire server.
>>
>> Cool, good to hear.
>>
>> *Add facility for retries*
>> Currently, retries must happen in user-level code, but it'd be
>> beneficial for performance and robustness to have a way to do this within
>> gRPC. Today, if the server refuses a request with StreamRefused, the
>> client doesn't have a way to retry on a different server; it can only
>> re-issue the request and hope it gets a different server. It also forces
>> the client to reserialize the request, which is unnecessary, and given the
>> cost of proto serialization, it'd be nice to avoid.
>>
>> This is also something on our roadmap.
>>
>> *Change behavior of Dial to not block on the balancer's initial list*
>> Currently, when you construct a *grpc.ClientConn with a balancer, the
>> call to Dial blocks until the initial set of servers is returned from the
>> balancer, and errors if the balancer returns an empty list. This is
>> inconsistent with the behavior of the client when the balancer produces an
>> empty list later in the life of the client.
>>
>> We propose changing the behavior such that Dial does not wait for the
>> response of the balancer and thus also can't return an error when the list
>> is empty. This not only makes the behavior consistent, it has the added
>> benefit that callers don't need their own retries around Dial.
>>
>> If my memory serves, this discussion happened before. The name "Dial"
>> indicates the dial operation needs to have been triggered by the time it
>> returns. We can probably add another public surface, such as
>> "NewClientConn", to achieve what you want here.
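On the MaxConcurrentGlobalStreams point, here is a sketch of the interceptor workaround mentioned above: a counting semaphore shared by every RPC on the server. The limit value and all names below are made up for illustration, and ResourceExhausted is used as a stand-in for the "StreamRefused" behaviour, since an error returned from an interceptor reaches the client as a gRPC status.

package limit

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

const maxGlobalStreams = 1000 // illustrative limit

// sem is a counting semaphore shared by all RPCs on the server.
var sem = make(chan struct{}, maxGlobalStreams)

func acquire() bool {
	select {
	case sem <- struct{}{}:
		return true
	default:
		return false
	}
}

func release() { <-sem }

// Unary rejects unary RPCs once maxGlobalStreams handlers are already running.
func Unary(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	if !acquire() {
		return nil, status.Errorf(codes.ResourceExhausted, "too many concurrent streams")
	}
	defer release()
	return handler(ctx, req)
}

// Stream does the same for streaming RPCs.
func Stream(srv interface{}, ss grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
	if !acquire() {
		return status.Errorf(codes.ResourceExhausted, "too many concurrent streams")
	}
	defer release()
	return handler(srv, ss)
}

// NewServer wires both interceptors into a grpc.Server.
func NewServer() *grpc.Server {
	return grpc.NewServer(grpc.UnaryInterceptor(Unary), grpc.StreamInterceptor(Stream))
}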
>>
>> Ah I see, that's why it waits. That makes sense. NewClientConn would be
>> great.
>>
>> To reiterate, these are just rough ideas, and we're also in search of
>> other solutions to these problems if you have ideas.
>>
>> Thanks!

--
Thanks,
-Qi
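Lastly, on the retry facility discussed earlier in the thread, a user-level workaround can be sketched as a client interceptor with grpc.WithUnaryInterceptor. The names below are made up, the set of retryable codes is just a guess, and it still pays the reserialization cost (and likely hits the same connection through the TCP LB) that a built-in retry would avoid; it only shows where such a hook could sit.

package retry

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

const maxAttempts = 3 // illustrative

var retryableCodes = map[codes.Code]bool{
	codes.Unavailable:       true,
	codes.ResourceExhausted: true, // closest stand-in for a refused stream
}

// Unary retries a failed unary RPC up to maxAttempts times when the status
// code looks retryable.
func Unary(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = invoker(ctx, method, req, reply, cc, opts...); err == nil {
			return nil
		}
		if !retryableCodes[status.Code(err)] {
			return err
		}
	}
	return err
}

// Dial installs the interceptor on a new client connection.
func Dial(target string, opts ...grpc.DialOption) (*grpc.ClientConn, error) {
	return grpc.Dial(target, append(opts, grpc.WithUnaryInterceptor(Unary))...)
}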