Hi gRPC gurus, I was recently profiling one of my golang servers that uses bi-directional streaming gRPC both upstream and downstream, and trying to debug some tail latency issues. Looking at mutex profile, I see that a lot of time is spent on lock contention in http2Client.handleData, specifically, here are the snippet of the profile and 2 lines caught my attention: line 950 and line 971 (highlighted below).
>From a quick look at the code it looks like line 971 is not avoidable. However 950 can probably be disabled if I don't use dynamic window size. I'm not sure whether I should be doing that because it might have unforeseen side effect but I wonder if anyone could help me to see if this looks like a reasonable profile and what I can do to reduce the overall contention (right now line 950 accounts for 64% of all delays/contentions for the entire server). Also I'd really like to learn what's the pros and cons for a non-default initial window size. Thanks, Mengyao Type: delay Time: Jun 15, 2020 at 12:37am (EDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) list handleData Total: 1.39hrs ROUTINE ======================== google.golang.org/grpc/internal/transport.(*http2Client).handleData in /Users/****/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go 0 1.12hrs (flat, cum) 80.61% of Total . . 939: if sendBDPPing { . . 940: // Avoid excessive ping detection (e.g. in an L7 proxy) . . 941: // by sending a window update prior to the BDP ping. . . 942: . . 943: if w := t.fc.reset(); w > 0 { . 5.92mins 944: t.controlBuf.put(&outgoingWindowUpdate{ . . 945: streamID: 0, . . 946: increment: w, . . 947: }) . . 948: } . . 949: . 35.34mins 950: t.controlBuf.put(bdpPing) . . 951: } . . 952: // Select the right stream to dispatch. . 10.15ms 953: s := t.getStream(f) . . 954: if s == nil { . . 955: return . . 956: } . . 957: if size > 0 { . 154.30ms 958: if err := s.fc.onData(size); err != nil { . . 959: t.closeStream(s, io.EOF, true, http2.ErrCodeFlowControl, status.New(codes.Internal, err.Error()), nil, false) . . 960: return . . 961: } . . 962: if f.Header().Flags.Has(http2.FlagDataPadded) { . . 963: if w := s.fc.onRead(size - uint32(len(f.Data()))); w > 0 { . . 964: t.controlBuf.put(&outgoingWindowUpdate{s.id, w}) . . 965: } . . 966: } . . 967: // TODO(bradfitz, zhaoq): A copy is required here because there is no . . 968: // guarantee f.Data() is consumed before the arrival of next frame. . . 969: // Can this copy be eliminated? . . 970: if len(f.Data()) > 0 { . 24.64mins 971: buffer := t.bufferPool.get() . . 972: buffer.Reset() . . 973: buffer.Write(f.Data()) . 1.48mins 974: s.write(recvMsg{buffer: buffer}) . . 975: } . . 976: } -- You received this message because you are subscribed to the Google Groups "grpc.io" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/5e347bbe-373a-48c8-9893-d4e7566f9f01o%40googlegroups.com.
