On Tue, May 19, 2020 at 4:27 PM <therealmarkh...@gmail.com> wrote:
>
> I get DNS lookup timeouts many times a day when running Go HTTP clients on 
> Kubernetes, where UDP packets are sometimes dropped due to a race condition 
> in the Linux kernel. I'd like to propose a small standard library change to 
> help debug these scenarios and get some feedback on the idea and 
> implementation here before posting it as a Github issue.
>
> For any http.Client use with a timeout set on the transport, when the timeout 
> is exceeded and an error is returned it can be challenging to discover the 
> cause of the timeout from the error message. Is it DNS related or TCP related 
> or something else? Consider the following simulation of a DNS lookup that 
> does not return within the timeout.
>
> package main
>
> import (
>     "context"
>     "net"
>     "net/http"
>     "time"
> )
>
> var timeout = 50 * time.Millisecond
> var host = "http://example.com/";
>
> // delayedDialer introduces a delay when performing DNS lookups.
> func delayedDialer(ctx context.Context, network, address string) (net.Conn, 
> error) {
>     time.Sleep(2 * timeout)
>     d := net.Dialer{}
>     return d.DialContext(ctx, network, address)
> }
>
> func main() {
>     r := &net.Resolver{
>         PreferGo: true,
>         Dial:     delayedDialer,
>     }
>     tr := &http.Transport{
>         DialContext: (&net.Dialer{
>             Timeout:  timeout,
>             Resolver: r,
>         }).DialContext,
>     }
>     client := &http.Client{Transport: tr}
>     _, err := client.Get(host)
>     if err != nil {
>         panic(err)
>     }
> }
>
> In the event of a DNS lookup timeout, the output is:
>
> panic: Get "http://example.com/": dial tcp: i/o timeout
>
> In the event of a TCP connection timeout (comment out the time.Sleep() call) 
> the output is:
>
> panic: Get "http://example.com/": dial tcp 93.184.216.34:80: i/o timeout
>
> The missing IP address in the DNS lookup timeout output is the only clue that 
> the root cause is a DNS problem, rather than a TCP problem. If users do not 
> expect to see an IP address in that error message, they won't know it is 
> missing, and their debugging and testing effort may be wasted investigating 
> possible TCP-related causes, instead of investigating DNS (which normally 
> runs over UDP).
>
> The error message could communicate the root cause to users more clearly. 
> After a DNS lookup timeout, the error returned by client.Get() looks like 
> this in Go 1.14.3:
>
> &url.Error{
>   Op:  "Get",
>   URL: "http://example.com/";,
>   Err: &net.OpError{
>     Op:     "dial",
>     Net:    "tcp",
>     Source: nil,
>     Addr:   nil,
>     Err:    &poll.TimeoutError{},
>   },
> }
>
> Note: The &poll.TimeoutError{} will become &net.timeoutError{} in future 
> releases after commit d422f54.
>
> The resolver in net/lookup.go is the source of the *poll.TimeoutError. 
> Instead it could return a *net.DNSError error when a lookup times out. Doing 
> so would provide additional error context for all use cases (not only 
> http.Client) and may save debugging time. For example client.Get() could 
> return:
>
> &url.Error{
>   Op:  "Get",
>   URL: "http://example.com/";,
>   Err: &net.OpError{
>     Op:     "dial",
>     Net:    "tcp",
>     Source: nil,
>     Addr:   nil,
>     Err:    &net.DNSError{
>       Err:         "i/o timeout",
>       Name:        "example.com",
>       Server:      "",
>       IsTimeout:   true,
>       IsTemporary: false,
>       IsNotFound:  false,
>     },
>   },
> }
>
> This would preserve the existing "i/o timeout" string and the ability to call 
> err.Timeout(), while adding "lookup example.com: " to the Error() output and 
> the capability to explicitly check for a *net.DNSError. The output from my 
> simulator code would also be more explicit:
>
> panic: Get "http://example.com/": dial tcp: lookup example.com: i/o timeout
>
> The following change to implement this works in my usage and passes the tests 
> in all.bash.
>
> $ git diff
> diff --git a/src/net/lookup.go b/src/net/lookup.go
> index 5f7119872a..b1a22dd4e6 100644
> --- a/src/net/lookup.go
> +++ b/src/net/lookup.go
> @@ -314,6 +314,9 @@ func (r *Resolver) lookupIPAddr(ctx context.Context, 
> network, host string) ([]IP
>                         }()
>                 }
>                 err := mapErr(ctx.Err())
> +               if t, ok := err.(timeout); ok && t.Timeout() {
> +                       err = &DNSError{Err: err.Error(), Name: host, 
> IsTimeout: true}
> +               }
>                 if trace != nil && trace.DNSDone != nil {
>                         trace.DNSDone(nil, false, err)
>                 }
>
> I can submit a Github issue and pull request if anyone thinks that this is a 
> good idea and implementation.

I don't know if that is the right implementation, but what you say
sounds plausible, and I encourage you to open an issue.  Thanks.

Ian

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcUj3xVfZwutjfMgfQ9FdBr-HHtQZ-RCyTWeKxM4gXnisA%40mail.gmail.com.

Reply via email to