Hi,

I'm encountering an intermittent issue with a gRPC-based client-server
application running in AWS ECS. Here's the setup:
* A simple gRPC server runs as an ECS task behind an Elastic Load Balancer
(ELB).
* The gRPC client is also containerized and runs as an ECS task in the same
cluster.

Under normal conditions, RPC calls complete quickly, typically within 2–3
ms and at most around 4 ms. However, I occasionally see
DEADLINE_EXCEEDED errors in the client.

After enabling trace logging in the gRPC library, I noticed these errors
consistently occur when the ELB's IP address changes. It seems that the
gRPC client continues attempting to connect to the outdated IP address,
ultimately resulting in the deadline errors.
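One way to check the DNS correlation independently of gRPC's trace logs is to re-resolve the ELB's hostname periodically and log when its address set changes. The sketch below assumes the client reaches the ELB by a DNS name and uses only Python's standard library. `resolve_addresses`, `diff_addresses` and `watch` are hypothetical helpers, not part of gRPC, and they use the container's system resolver, which may cache differently from the resolver gRPC itself uses.

```python
import socket
import time
from typing import Set, Tuple


def resolve_addresses(host: str, port: int) -> Set[str]:
    """Return the IP addresses the system resolver currently returns for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def diff_addresses(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) between two resolutions of the same name."""
    return new - old, old - new


def watch(host: str, port: int, interval_s: float = 10.0, rounds: int = 60) -> None:
    """Re-resolve host every interval_s seconds and print a timestamped line on any change."""
    previous = resolve_addresses(host, port)
    print(f"{time.strftime('%H:%M:%S')} initial: {sorted(previous)}")
    for _ in range(rounds):
        time.sleep(interval_s)
        current = resolve_addresses(host, port)
        added, removed = diff_addresses(previous, current)
        if added or removed:
            print(f"{time.strftime('%H:%M:%S')} added={sorted(added)} removed={sorted(removed)}")
        previous = current
```

Running something like `watch("my-elb.example.com", 50051)` inside the client container (hostname and port are placeholders) would print a line whenever the ELB's addresses rotate, which can then be lined up against the timestamps of the DEADLINE_EXCEEDED errors.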

Currently, the client uses the default pick_first load balancing policy.
Based on the gRPC load balancing documentation
(https://grpc.github.io/grpc/cpp/md_doc_load-balancing.html), it seems that
switching to round_robin might better handle cases where the server IP
changes after the client has already established a connection.
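For reference, the policy is usually selected through the client's service config. The sketch below assumes a Python client using the `grpcio` package (the post does not say which language the client is written in); `TARGET`, `service_config_json` and `channel_options` are hypothetical names, and the hostname is a placeholder. The `loadBalancingConfig` field and the `grpc.service_config` channel argument come from gRPC's service config support.

```python
import json
from typing import List, Tuple

# Placeholder target; the "dns:///" scheme selects gRPC's DNS resolver.
TARGET = "dns:///my-elb.example.com:50051"

SUPPORTED_POLICIES = ("pick_first", "round_robin")


def service_config_json(policy: str) -> str:
    """Build a service config JSON string that selects the given LB policy."""
    if policy not in SUPPORTED_POLICIES:
        raise ValueError(f"unsupported policy: {policy}")
    # "loadBalancingConfig" is a list of {policy_name: options}; the client
    # uses the first entry it supports. Neither policy here needs options.
    return json.dumps({"loadBalancingConfig": [{policy: {}}]})


def channel_options(policy: str = "round_robin") -> List[Tuple[str, str]]:
    """Channel arguments that set a default service config for the channel."""
    return [("grpc.service_config", service_config_json(policy))]
```

With `grpcio` installed, the channel would then be created with `grpc.insecure_channel(TARGET, options=channel_options())`. In C-core based clients, this default is ignored if the name resolver itself returns a service config.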

I have a few questions about this issue:
1. Would switching to round_robin mitigate this issue by prompting the
client to cycle through updated IPs?
2. Are there performance or stability trade-offs when using round_robin
instead of pick_first?
3. If round_robin is generally more resilient in these dynamic
environments, why isn't it the default policy?

Thanks for your time.

-mandeep

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/grpc-io/CAC%2BQLdQR7TVS6EpCW%2B%2BYNkaB5FxRnRKgTNbx-NjsqdH%3DbU6a%3Dg%40mail.gmail.com.