I am using gRPC-Go v1.40.0 on macOS to communicate with my gRPC server. 
I am seeing a problem: if grpc.Dial is called once while the machine's 
network is unstable (say, something prevents DNS resolution of my target 
server's host name), gRPC is unable to recover and return a usable 
grpc.ClientConn for a very long time, even after the network has 
stabilised (DNS resolution is working again). This happens despite 
several reconnection attempts (grpc.ClientConn.Close followed by a new 
grpc.Dial).

I don't see any DNS lookups or any TCP traffic for my server in the 
Wireshark trace while I re-dial the connection. I do see a lot of noise 
from DNS queries for *grpc_config*, which go unanswered because my server 
doesn't support them. Dial() doesn't return an error, but the connection 
state is always 'idle' rather than 'ready' (as it is when things work).

However, I also noticed that when I pass a custom dialer function (one 
that uses net.Dialer with the system's default resolver) as one of the 
grpc DialOptions in the call to grpc.Dial(...), the retry succeeds and I 
get back a valid connection in the ready state. If I don't supply a 
dialer function and let gRPC manage name resolution and dialing itself, 
it doesn't recover easily from the bad state; it takes a very long time, 
10-20 minutes.

Note: we use the DNS scheme for the Go resolver: resolver.SetDefaultScheme("dns")

Can anyone shed some light on gRPC-Go's connection establishment logic 
that would explain this?
