On Tue, May 19, 2020 at 4:27 PM <therealmarkh...@gmail.com> wrote: > > I get DNS lookup timeouts many times a day when running Go HTTP clients on > Kubernetes, where UDP packets are sometimes dropped due to a race condition > in the Linux kernel. I'd like to propose a small standard library change to > help debug these scenarios and get some feedback on the idea and > implementation here before posting it as a Github issue. > > For any http.Client use with a timeout set on the transport, when the timeout > is exceeded and an error is returned it can be challenging to discover the > cause of the timeout from the error message. Is it DNS related or TCP related > or something else? Consider the following simulation of a DNS lookup that > does not return within the timeout. > > package main > > import ( > "context" > "net" > "net/http" > "time" > ) > > var timeout = 50 * time.Millisecond > var host = "http://example.com/" > > // delayedDialer introduces a delay when performing DNS lookups. > func delayedDialer(ctx context.Context, network, address string) (net.Conn, > error) { > time.Sleep(2 * timeout) > d := net.Dialer{} > return d.DialContext(ctx, network, address) > } > > func main() { > r := &net.Resolver{ > PreferGo: true, > Dial: delayedDialer, > } > tr := &http.Transport{ > DialContext: (&net.Dialer{ > Timeout: timeout, > Resolver: r, > }).DialContext, > } > client := &http.Client{Transport: tr} > _, err := client.Get(host) > if err != nil { > panic(err) > } > } > > In the event of a DNS lookup timeout, the output is: > > panic: Get "http://example.com/": dial tcp: i/o timeout > > In the event of a TCP connection timeout (comment out the time.Sleep() call) > the output is: > > panic: Get "http://example.com/": dial tcp 93.184.216.34:80: i/o timeout > > The missing IP address in the DNS lookup timeout output is the only clue that > the root cause is a DNS problem, rather than a TCP problem. If users do not > expect to see an IP address in that error message, they won't know it is > missing, and their debugging and testing effort may be wasted investigating > possible TCP-related causes, instead of investigating DNS (which normally > runs over UDP). > > The error message could communicate the root cause to users more clearly. > After a DNS lookup timeout, the error returned by client.Get() looks like > this in Go 1.14.3: > > &url.Error{ > Op: "Get", > URL: "http://example.com/", > Err: &net.OpError{ > Op: "dial", > Net: "tcp", > Source: nil, > Addr: nil, > Err: &poll.TimeoutError{}, > }, > } > > Note: The &poll.TimeoutError{} will become &net.timeoutError{} in future > releases after commit d422f54. > > The resolver in net/lookup.go is the source of the *poll.TimeoutError. > Instead it could return a *net.DNSError error when a lookup times out. Doing > so would provide additional error context for all use cases (not only > http.Client) and may save debugging time. For example client.Get() could > return: > > &url.Error{ > Op: "Get", > URL: "http://example.com/", > Err: &net.OpError{ > Op: "dial", > Net: "tcp", > Source: nil, > Addr: nil, > Err: &net.DNSError{ > Err: "i/o timeout", > Name: "example.com", > Server: "", > IsTimeout: true, > IsTemporary: false, > IsNotFound: false, > }, > }, > } > > This would preserve the existing "i/o timeout" string and the ability to call > err.Timeout(), while adding "lookup example.com: " to the Error() output and > the capability to explicitly check for a *net.DNSError. The output from my > simulator code would also be more explicit: > > panic: Get "http://example.com/": dial tcp: lookup example.com: i/o timeout > > The following change to implement this works in my usage and passes the tests > in all.bash. > > $ git diff > diff --git a/src/net/lookup.go b/src/net/lookup.go > index 5f7119872a..b1a22dd4e6 100644 > --- a/src/net/lookup.go > +++ b/src/net/lookup.go > @@ -314,6 +314,9 @@ func (r *Resolver) lookupIPAddr(ctx context.Context, > network, host string) ([]IP > }() > } > err := mapErr(ctx.Err()) > + if t, ok := err.(timeout); ok && t.Timeout() { > + err = &DNSError{Err: err.Error(), Name: host, > IsTimeout: true} > + } > if trace != nil && trace.DNSDone != nil { > trace.DNSDone(nil, false, err) > } > > I can submit a Github issue and pull request if anyone thinks that this is a > good idea and implementation.
I don't know if that is the right implementation, but what you say sounds plausible, and I encourage you to open an issue. Thanks. Ian -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAOyqgcUj3xVfZwutjfMgfQ9FdBr-HHtQZ-RCyTWeKxM4gXnisA%40mail.gmail.com.