I am still confused. Which scenario do you want to serve as the example:
a) grpc clients --> TCP LB --> grpc servers;
or
b) grpc clients --> grpc servers?

On Mon, Dec 5, 2016 at 2:49 PM, Arya Asemanfar <arya.aseman...@mixpanel.com>
wrote:

> Sorry, I meant grpc server. Yes you are right if the TCP load balancer
> restarts there is no problem, so my scenario only applies if the grpc
> server restarts.
>
> On Mon, Dec 5, 2016 at 2:17 PM Qi Zhao <zh...@google.com> wrote:
>
>> On Mon, Dec 5, 2016 at 12:13 PM, Arya Asemanfar <
>> arya.aseman...@mixpanel.com> wrote:
>>
>> Thanks for the feedback. Good idea re metadata for getting the Balancer
>> to treat the connections as different. Will take a look at that.
>>
>> Some clarifications/questions inline:
>>
>> On Mon, Dec 5, 2016 at 11:11 AM, 'Qi Zhao' via grpc.io <
>> grpc-io@googlegroups.com> wrote:
>>
>> Thanks for the info. My comments are inline.
>>
>> On Sun, Dec 4, 2016 at 7:27 PM, Arya Asemanfar <
>> arya.aseman...@mixpanel.com> wrote:
>>
>> Hey all,
>>
>> We're considering implementing some patches to the golang grpc
>> implementation. These are things we think would fit better inside grpc
>> than as workarounds built outside of it. Before we go through the
>> effort, we'd like to gauge whether these features would be welcome
>> (assuming we'll work with the owners to get a quality implementation). Some
>> of these ideas are not fully fleshed out or may not be the best solution to
>> the problem they aim to solve. I've also tried to state each problem, so if
>> you have ideas on better ways to address these problems, please share :)
>>
>> *Add DialOption MaxConnectionLifetime*
>> Currently, once a connection is established, it lives until there is a
>> transport error or the client proactively closes the connection. These
>> long-lived connections are problematic when using a TCP load balancer, such
>> as the ones provided by Google Container Engine and Google Compute Engine.
>> At a clean start, clients will be somewhat evenly distributed among the
>> servers behind the load balancer, but if the servers go through a rolling
>> restart, the fleet will become unbalanced: clients have a higher likelihood
>> of ending up connected to the first server that restarted, with the most
>> recently restarted server having close to zero clients.
>>
>> I do not think long-lived connections are problematic as long as there
>> is live traffic on them. We do plan to add idle shutdown, to actively
>> close TCP connections that have lived a long time with no traffic.
>> Which server gets chosen really depends on the load balancing policy you
>> use -- I do not see how your description could happen if you use a
>> round-robin load balancing policy.
>>
>>
>> We have a single IP address that we give to GRPC (since that address is
>> Google Cloud's TCP load balancer). The client establishes one connection
>> and has no reason to disconnect under normal conditions.
>>
>> Here's an example scenario that results in uneven load:
>> - 100 clients connected evenly to 10 servers
>> - each of the 10 servers has about 10 connections
>> - each of the clients sends about an equal amount of traffic to the
>> server they are connected to
>> - one of the servers restarts
>> - the 10 clients that were connected to that 1 server re-establish
>> connections
>> - the new server, assuming it came up in time, has on average 1
>> connection, with each of the other 9 having 1 additional connection
>> - now we have 10 servers, one with 1 client and 9 with 11 clients, so the
>> load is unevenly distributed
>>
>> What "server" do you mean here? My understanding is that all these 100
>> clients connect to the TCP load balancer.
>>
>>
>> Is there another workaround for this problem other than adding an
>> intermediate load balancer? Even then, the load across the load balancers
>> would be uneven, assuming we'd still need a TCP-level LB given that we're
>> using Kubernetes in GKE.
>>
>>
>>
>> We propose fixing this by adding a MaxConnectionLifetime option, which
>> will force clients to disconnect after some period of time. We'll use the
>> same mechanism as when an address is removed from a balancer (i.e., drain
>> the connection rather than abruptly throwing errors).
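>>
>> For concreteness, this is roughly what approximating it from the outside
>> looks like today (a minimal sketch; all names are our own invention, and
>> the old connection is closed abruptly instead of being drained, which is
>> part of why we'd rather see this inside grpc):
>>
>>     import (
>>         "log"
>>         "sync"
>>         "time"
>>
>>         "google.golang.org/grpc"
>>     )
>>
>>     // rotatingConn approximates MaxConnectionLifetime from user code:
>>     // it re-dials the target on a fixed period so clients re-spread
>>     // across the backends behind the TCP LB.
>>     type rotatingConn struct {
>>         mu   sync.RWMutex
>>         conn *grpc.ClientConn
>>     }
>>
>>     func dialRotating(target string, lifetime time.Duration,
>>         opts ...grpc.DialOption) (*rotatingConn, error) {
>>         cc, err := grpc.Dial(target, opts...)
>>         if err != nil {
>>             return nil, err
>>         }
>>         rc := &rotatingConn{conn: cc}
>>         go func() {
>>             for range time.Tick(lifetime) {
>>                 fresh, err := grpc.Dial(target, opts...)
>>                 if err != nil {
>>                     log.Printf("re-dial failed: %v", err)
>>                     continue // keep the old conn on failure
>>                 }
>>                 rc.mu.Lock()
>>                 old := rc.conn
>>                 rc.conn = fresh
>>                 rc.mu.Unlock()
>>                 old.Close() // abrupt close; in-flight RPCs may fail
>>             }
>>         }()
>>         return rc, nil
>>     }
>>
>>     // Get returns the current connection; callers must fetch it per
>>     // RPC rather than caching it.
>>     func (rc *rotatingConn) Get() *grpc.ClientConn {
>>         rc.mu.RLock()
>>         defer rc.mu.RUnlock()
>>         return rc.conn
>>     }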
>>
>> This should be achieved by the GRPCLB load balancer, which can sense the
>> workload of all the servers and send a refreshed backend list when needed.
>> I am not convinced MaxConnectionLifetime is a must.
>>
>>
>> *Add DialOption NumConnectionsPerServer*
>> This is related to the problem above. When a client is provided with a
>> single address that points to a TCP load balancer, it's sometimes
>> beneficial for the client to have multiple connections, since the
>> underlying connections' performance might vary.
>>
>> I am not clear on what you plan to do here. Do you want to create multiple
>> connections to a single endpoint (e.g., the TCP load balancer)? If so, you
>> can customize your load balancer implementation to do that already
>> (endpoints with the same address but different metadata are treated as
>> distinct ones in grpc internals).
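>>
>> A minimal, untested sketch of that idea (only Notify is shown; a real
>> implementation must also provide the Balancer's Start, Up, Get, and Close
>> methods):
>>
>>     import "google.golang.org/grpc"
>>
>>     // multiConnBalancer publishes the same TCP LB address n times,
>>     // each with distinct Metadata, so grpc treats every entry as a
>>     // separate endpoint and opens one connection per entry.
>>     type multiConnBalancer struct {
>>         target string // the TCP load balancer's address
>>         n      int    // desired number of connections
>>     }
>>
>>     func (b *multiConnBalancer) Notify() <-chan []grpc.Address {
>>         addrs := make([]grpc.Address, b.n)
>>         for i := range addrs {
>>             addrs[i] = grpc.Address{
>>                 Addr:     b.target, // same address every time
>>                 Metadata: i,        // differs only to keep entries distinct
>>             }
>>         }
>>         ch := make(chan []grpc.Address, 1) // buffered so the send cannot block
>>         ch <- addrs
>>         return ch
>>     }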
>>
>>
>> Will try this out. Thanks for the suggestion.
>>
>>
>>
>> *Add ServerOption MaxConcurrentGlobalStreams*
>> Currently there is only a way to limit the number of streams per client,
>> but it'd be useful to limit this globally. This could be achieved via an
>> interceptor that returns StreamRefused, but we thought it might be useful
>> to have in grpc.
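>>
>> For example (a rough sketch with names of our own; at the interceptor
>> level, the closest analogue to REFUSED_STREAM is returning a status code
>> such as ResourceExhausted, and the cap of 1000 is arbitrary):
>>
>>     import (
>>         "google.golang.org/grpc"
>>         "google.golang.org/grpc/codes"
>>     )
>>
>>     // sem caps concurrently running stream handlers across the whole
>>     // server, all clients combined.
>>     var sem = make(chan struct{}, 1000)
>>
>>     func limitStreams(srv interface{}, ss grpc.ServerStream,
>>         info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
>>         select {
>>         case sem <- struct{}{}: // acquire a slot
>>             defer func() { <-sem }()
>>             return handler(srv, ss)
>>         default: // at capacity: refuse rather than queue
>>             return grpc.Errorf(codes.ResourceExhausted,
>>                 "server-wide stream limit reached")
>>         }
>>     }
>>
>>     // Wired up via:
>>     //   s := grpc.NewServer(grpc.StreamInterceptor(limitStreams))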
>>
>> This is similar to something we plan to add for flow control
>> purposes. gRPC servers will have some knobs (e.g., ServerOptions) to
>> throttle the resource usage (e.g., memory) of the entire server.
>>
>>
>> Cool, good to hear.
>>
>>
>> *Add facility for retries*
>> Currently, retries must happen in user-level code, but it'd be beneficial
>> for performance and robustness to have a way to do this within GRPC.
>> Today, if the server refuses a request with StreamRefused, the client
>> doesn't have a way to retry on a different server; it can only re-issue
>> the request and hope it reaches a different server. It also forces the
>> client to re-serialize the request, which is unnecessary; given the cost
>> of serialization with proto, it'd be nice to avoid this.
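>>
>> The user-level version looks roughly like the sketch below (the attempt
>> count and retryable codes are illustrative only; note that each attempt
>> goes back through the stub and re-serializes the request, which is exactly
>> the waste described above):
>>
>>     import (
>>         "context"
>>
>>         "google.golang.org/grpc"
>>         "google.golang.org/grpc/codes"
>>     )
>>
>>     // retryUnary re-invokes call while the error code suggests another
>>     // backend might succeed.
>>     func retryUnary(ctx context.Context, attempts int,
>>         call func(context.Context) error) error {
>>         var err error
>>         for i := 0; i < attempts; i++ {
>>             if err = call(ctx); err == nil {
>>                 return nil
>>             }
>>             switch grpc.Code(err) {
>>             case codes.Unavailable, codes.ResourceExhausted:
>>                 continue // plausibly transient; try again
>>             default:
>>                 return err // not worth retrying
>>             }
>>         }
>>         return err
>>     }
>>
>>     // Usage, with a hypothetical generated stub:
>>     //   err := retryUnary(ctx, 3, func(ctx context.Context) error {
>>     //       _, e := client.DoSomething(ctx, req)
>>     //       return e
>>     //   })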
>>
>>
>>
>> This is also something on our road map.
>>
>>
>> *Change behavior of Dial to not block on the balancer's initial list*
>> Currently, when you construct a *grpc.ClientConn with a balancer, the
>> call to Dial blocks until the initial set of servers is returned from the
>> balancer and errors if the balancer returns an empty list. This is
>> inconsistent with the behavior of the client when the balancer produces an
>> empty list later in the life of the client.
>>
>>
>> We propose changing the behavior such that Dial does not wait for the
>> response from the balancer, and thus also can't return an error when the
>> list is empty. This not only makes the behavior consistent, it has the
>> added benefit that callers don't need to add their own retries around Dial.
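>>
>> Today those retries look something like the following (a minimal sketch;
>> the attempt count and backoff are arbitrary):
>>
>>     import (
>>         "time"
>>
>>         "google.golang.org/grpc"
>>     )
>>
>>     // dialWithRetry retries Dial because it currently fails when the
>>     // balancer's initial backend list is empty.
>>     func dialWithRetry(target string,
>>         opts ...grpc.DialOption) (*grpc.ClientConn, error) {
>>         var lastErr error
>>         for i := uint(0); i < 5; i++ {
>>             cc, err := grpc.Dial(target, opts...)
>>             if err == nil {
>>                 return cc, nil
>>             }
>>             lastErr = err
>>             time.Sleep(time.Second << i) // backoff: 1s, 2s, 4s, ...
>>         }
>>         return nil, lastErr
>>     }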
>>
>>
>>
>> If my memory serves, this discussion has happened before. The name "Dial"
>> implies the dial operation has actually been triggered by the time it
>> returns. We could probably add another public surface like "NewClientConn"
>> to achieve what you want here.
>>
>>
>> Ah I see, that's why it waits. That makes sense. NewClientConn would be
>> great.
>>
>>
>>
>>
>> To reiterate, these are just rough ideas and we're also in search of
>> other solutions to these problems if you have ideas.
>>
>> Thanks!
>>
>>
>> --
>> Thanks,
>> -Qi
>>
>>
>


-- 
Thanks,
-Qi
