Not sure. It's not a memory leak, alas -- it's a memory spike, which is OOMing my code.
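One cheap way I can sanity-check whether the spike even lives in the Python heap (as opposed to something allocated below Python, e.g. by gRPC's C core) is to compare the kernel's view of the process with what the Python allocator can see. A rough sketch -- the report() helper and where I call it are just illustrative:

import resource
import tracemalloc

tracemalloc.start()

def report(label):
    # Peak RSS as the kernel sees it; ru_maxrss is in KiB on Linux.
    rss_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    # Memory visible to the Python allocator (the same view the PEP 445 hooks get).
    current, peak = tracemalloc.get_traced_memory()
    print(f"{label}: RSS ~{rss_mib:.0f} MiB, "
          f"Python heap {current / 2**20:.0f} MiB (peak {peak / 2**20:.0f} MiB)")

report("before read")
# ... the streaming read goes here ...
report("after read")

A big gap between the two numbers would point at memory that Python-level heap profiling can't see.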
After reading through the C layer some, I decided to try something simple and stupid: I'm doing a fast force-drain of the RPC iterator (roughly the sketch at the bottom of this message) to pull the result into the Python address space, where I can monitor and instrument it more easily, and then hopefully figure out if this is what's causing the memory spike. Thank you!

On Fri, Jul 19, 2019 at 2:09 PM Lidi Zheng <li...@google.com> wrote:

> Internally, we are running ASAN test on Python tests.
>
> If you are using Bazel, it would be as simple as --config=ASAN.
> If not... then it can be challenging indeed.
>
> The test is about 'GRPC_ARG_PER_RPC_RETRY_BUFFER_SIZE' which seems not directly related to your case.
> If the buffer is not consumed in chttp2 parser, I don't think it will do another round of tcp_read for that channel.
>
> Can this memory leak be observed in simpler cases?
>
> On Fri, Jul 19, 2019 at 1:03 PM Yonatan Zunger <zun...@humu.com> wrote:
>
>> I have no idea what would be involved in attaching ASAN to Python, and suspect it may be "exciting," so I'm trying to see first if gRPC has any monitoring capability around its buffers.
>>
>> One thing I did notice while reading through the codebase was unittests like this one
>> <https://github.com/grpc/grpc/blob/master/test/core/end2end/tests/retry_exceeds_buffer_size_in_subsequent_batch.cc>
>> about exceeding buffer sizes -- that does seem to trigger an ABORTED response, but the test was fairly hard to understand (not much commenting there...).
>> Am I right in thinking that if this 4MB buffer is overflowed, that's somehow going to happen?
>>
>> On Fri, Jul 19, 2019 at 12:59 PM Lidi Zheng <li...@google.com> wrote:
>>
>>> Hi Yonatan,
>>>
>>> In gRPC Python side, the consumption of message is sequential, and won't be kept in memory.
>>> If you recall the batch operations, only if a message is sent to application, will gRPC Python start another RECV_MESSAGE operation.
>>> It's unlikely that the problem resided in Python space.
>>>
>>> In C-Core space, AFAIK for each TCP read, the size is 4MiB
>>> <https://github.com/grpc/grpc/blob/master/src/core/lib/iomgr/tcp_posix.cc#L1177>
>>> per channel.
>>> I think we have flow control both in TCP level and HTTP2 level.
>>>
>>> For debugging, did you try to use ASAN? For channel arg, I can only find "GRPC_ARG_TCP_READ_CHUNK_SIZE" and "GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH" that might be related to your case.
>>>
>>> Lidi Zheng
>>>
>>> On Fri, Jul 19, 2019 at 12:48 PM Yonatan Zunger <zun...@humu.com> wrote:
>>>
>>>> Maybe a more concrete way of asking this question: Let's say we have a Python gRPC client making a response-streaming request to some gRPC server. The server starts to stream back responses. If the client fails to consume data as fast as the server generates it, I'm trying to figure out where the data would accumulate, and which memory allocator it would be using. (Because Python heap profiling won't see calls to malloc())
>>>>
>>>> If I'm understanding correctly:
>>>>
>>>> * The responses are written by the server to the network socket at the server's own speed (no pushback controlling it);
>>>> * These get picked up by the kernel network device on the client, and get pulled into userspace ASAP by the event loop, which is in the C layer of the gRPC client. This is stored in a grpc_byte_buffer and builds up there.
>>>> * The Python client library exposes a response iterator, which is ultimately a _Rendezvous object; its iteration is implemented in _Rendezvous._next(), which calls cygrpc.ReceiveMessageOperation, which is what drains data from the grpc_byte_buffer and passes it to the protobuf parser, which creates objects in the Python memory address space and returns them to the caller.
>>>>
>>>> This means that if the client were to drain the iterator more slowly, data would accumulate in the grpc_byte_buffer, which is in the C layer and not visible to (e.g.) Python heap profiling using the PEP445 malloc hooks.
>>>>
>>>> If I am understanding this correctly, is there any way (without doing a massive amount of plumbing) to monitor the state of the byte buffer, e.g. with some gRPC debug parameter? And is there any mechanism in the C layer which limits the size of this buffer, doing something like failing the RPC if the buffer size exceeds some threshold?
>>>>
>>>> Yonatan
>>>>
>>>> On Thu, Jul 18, 2019 at 5:27 PM Yonatan Zunger <zun...@humu.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'm trying to debug a mysterious memory blowout in a Python batch job, and one of the angles I'm exploring is that this may have to do with the way it's reading data. This job is reading from bigtable, which is ultimately fetching the actual data with a unidirectional streaming "read rows" RPC. This takes a single request and returns a sequence of data chunks; the higher-level client reshapes this into an iterator over the individual data cells, and those are consumed by the higher-level program, so that the next response proto is consumed once the program is ready to parse it.
>>>>>
>>>>> Something I can't remember about gRPC internals: What, if anything, is the pushback mechanism in unidirectional streaming? In the zero-pushback case, it would seem that a server could yield results at any speed, which would be accepted by the client and stored in gRPC's internal buffers until it got read by the client code, which could potentially cause a large memory blowout if the server wrote faster than the client read. Is this in fact the case? If so, is there any good way to instrument and detect if it's happening? (Some combination of gRPC debug flags, perhaps.) If not, is there some pushback mechanism I'm not thinking of?
>>>>>
>>>>> (Alas, I can't change the protocol in this situation; the server is run by someone else.)
>>>>>
>>>>> Yonatan
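For concreteness, the "force-drain" I mentioned at the top is nothing fancier than something like this. The stub and message names below are made-up stand-ins (the real call is the Bigtable read-rows RPC), so treat it as a sketch of the shape rather than the actual job code:

import grpc

# Hypothetical generated modules standing in for the real client library.
from my_service_pb2 import ReadRowsRequest
from my_service_pb2_grpc import DataServiceStub

channel = grpc.insecure_channel("localhost:50051")
stub = DataServiceStub(channel)

# Drain the response-streaming call eagerly: pull every message out of the
# C-core buffers and into Python objects as fast as possible, so that any
# accumulation happens where Python heap profiling can see it.
responses = list(stub.ReadRows(ReadRowsRequest()))
print("drained %d responses into Python memory" % len(responses))

# Only afterwards hand the fully materialized list to the normal consumer.
for response in responses:
    ...  # existing per-response processing goes here

Obviously this trades the invisible buildup in C for a very visible one in the Python heap, but that's the point: if the spike follows the data into Python, the usual tools can see it.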