Not sure. It's not a memory leak, alas -- it's a memory spike, which is
OOMing my code.

After reading through the C layer a bit, I decided to try something simple
and stupid: a fast force-drain of the RPC iterator to pull the results into
the Python address space, where I can monitor and instrument them more
easily, and then hopefully figure out whether this is what's causing the
memory spike.
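
In case it helps to picture it, the drain itself is nothing fancy. This is a
minimal sketch, not the real job (the stub and request names are made up); it
just pulls every response into a list while printing peak RSS, so I can see
whether the memory moves into the Python heap:

    import resource  # standard library; ru_maxrss is reported in KiB on Linux

    def force_drain(response_iterator):
        """Pull every streamed response into Python memory as fast as possible."""
        responses = []
        for i, response in enumerate(response_iterator):
            responses.append(response)
            if i % 1000 == 0:
                peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
                print(f"drained {i} responses; peak RSS ~{peak_kib} KiB")
        return responses

    # rows = force_drain(stub.ReadRows(request))  # hypothetical stub and request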

Thank you!

On Fri, Jul 19, 2019 at 2:09 PM Lidi Zheng <li...@google.com> wrote:

> Internally, we run ASAN on our Python tests.
>
> If you are using Bazel, it would be as simple as --config=ASAN.
> If not... then it can be challenging indeed.
>
> That test is about 'GRPC_ARG_PER_RPC_RETRY_BUFFER_SIZE', which doesn't seem
> directly related to your case.
> If the buffer is not consumed by the chttp2 parser, I don't think it will do
> another round of tcp_read for that channel.
>
> Can this memory leak be observed in simpler cases?
>
>
>
> On Fri, Jul 19, 2019 at 1:03 PM Yonatan Zunger <zun...@humu.com> wrote:
>
>> I have no idea what would be involved in attaching ASAN to Python, and
>> suspect it may be "exciting," so I'm trying to see first if gRPC has any
>> monitoring capability around its buffers.
>>
>> One thing I did notice while reading through the codebase was unit tests
>> like this one
>> <https://github.com/grpc/grpc/blob/master/test/core/end2end/tests/retry_exceeds_buffer_size_in_subsequent_batch.cc>
>> about exceeding buffer sizes -- that does seem to trigger an ABORTED
>> response, but the test was fairly hard to follow (not much commenting
>> there...). Am I right in thinking that if this 4MB buffer is overflowed,
>> that's somehow what will happen?
>>
>> On Fri, Jul 19, 2019 at 12:59 PM Lidi Zheng <li...@google.com> wrote:
>>
>>> Hi Yonatan,
>>>
>>> On the gRPC Python side, message consumption is sequential, and messages
>>> won't be kept in memory.
>>> If you recall the batch operations: gRPC Python only starts another
>>> RECV_MESSAGE operation once a message has been delivered to the
>>> application.
>>> It's unlikely that the problem resides in Python space.
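>>>
>>> To put that in concrete terms, the client-side pattern is roughly the
>>> following (a conceptual sketch, not the actual cygrpc internals; the stub
>>> and method names are made up):
>>>
>>>     # Each next() on the response iterator drives one RECV_MESSAGE batch,
>>>     # so at most one undelivered message sits on the Python side at a time.
>>>     for response in stub.ReadRows(request):  # hypothetical server-streaming call
>>>         process(response)  # another RECV_MESSAGE is only issued on the next iteration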
>>>
>>> In C-core space, AFAIK each TCP read is up to 4MiB
>>> <https://github.com/grpc/grpc/blob/master/src/core/lib/iomgr/tcp_posix.cc#L1177>
>>> per channel.
>>> I think we have flow control at both the TCP level and the HTTP/2 level.
>>>
>>> For debugging, did you try ASAN? As for channel args, I can only find
>>> "GRPC_ARG_TCP_READ_CHUNK_SIZE" and "GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH"
>>> that might be related to your case.
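>>>
>>> In Python those are passed as channel options when the channel is
>>> created. A minimal sketch (the target is a placeholder; I'm only showing
>>> "grpc.max_receive_message_length" here, and the string key for
>>> GRPC_ARG_TCP_READ_CHUNK_SIZE should be looked up in grpc_types.h):
>>>
>>>     import grpc
>>>
>>>     channel = grpc.insecure_channel(
>>>         "localhost:50051",  # placeholder target
>>>         options=[
>>>             # Fail any single received message larger than 16 MiB.
>>>             ("grpc.max_receive_message_length", 16 * 1024 * 1024),
>>>         ],
>>>     )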
>>>
>>> Lidi Zheng
>>>
>>> On Fri, Jul 19, 2019 at 12:48 PM Yonatan Zunger <zun...@humu.com> wrote:
>>>
>>>> Maybe a more concrete way of asking this question: let's say we have a
>>>> Python gRPC client making a response-streaming request to some gRPC server.
>>>> The server starts to stream back responses. If the client fails to consume
>>>> data as fast as the server generates it, I'm trying to figure out where the
>>>> data would accumulate, and which memory allocator it would be using
>>>> (because Python heap profiling won't see the calls to malloc()).
>>>>
>>>> If I'm understanding correctly:
>>>>
>>>> * The responses are written by the server to the network socket at the
>>>> server's own speed (no pushback controlling it).
>>>> * These get picked up by the kernel's network stack on the client, and
>>>> get pulled into userspace ASAP by the event loop, which lives in the C
>>>> layer of the gRPC client. The data is stored in a grpc_byte_buffer and
>>>> builds up there.
>>>> * The Python client library exposes a response iterator, which is
>>>> ultimately a _Rendezvous object. Its iteration is implemented in
>>>> _Rendezvous._next(), which calls cygrpc.ReceiveMessageOperation; that is
>>>> what drains data from the grpc_byte_buffer and passes it to the protobuf
>>>> parser, which creates objects in the Python memory address space and
>>>> returns them to the caller.
>>>>
>>>> This means that if the client were to drain the iterator more slowly,
>>>> data would accumulate in the grpc_byte_buffer, which is in the C layer and
>>>> not visible to (e.g.) Python heap profiling using the PEP 445 malloc hooks.
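>>>>
>>>> One way I'm planning to check that (a sketch using the standard library
>>>> plus the third-party psutil package): compare what tracemalloc sees with
>>>> the process RSS. If RSS keeps growing while the traced Python heap stays
>>>> flat, the growth is happening below Python, e.g. in that C-layer buffer.
>>>>
>>>>     import tracemalloc
>>>>     import psutil  # third-party: pip install psutil
>>>>
>>>>     tracemalloc.start()
>>>>     proc = psutil.Process()
>>>>
>>>>     def report(label):
>>>>         current, _peak = tracemalloc.get_traced_memory()
>>>>         rss = proc.memory_info().rss
>>>>         print(f"{label}: python heap={current / 2**20:.1f} MiB, "
>>>>               f"rss={rss / 2**20:.1f} MiB")
>>>>
>>>>     # Call report(...) every N responses while iterating.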
>>>>
>>>> If I am understanding this correctly, is there any way (without doing a
>>>> massive amount of plumbing) to monitor the state of the byte buffer, e.g.
>>>> with some gRPC debug parameter? And is there any mechanism in the C layer
>>>> which limits the size of this buffer, doing something like failing the RPC
>>>> if the buffer size exceeds some threshold?
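>>>>
>>>> The closest thing I've found so far on the debug-parameter front is the
>>>> core tracing environment variables. A sketch, with the caveat that the
>>>> exact tracer names should be checked against grpc's
>>>> doc/environment_variables.md, and that the variables need to be set
>>>> before the gRPC core initializes:
>>>>
>>>>     import os
>>>>
>>>>     # Set these before importing grpc so the C core picks them up.
>>>>     os.environ["GRPC_VERBOSITY"] = "debug"
>>>>     os.environ["GRPC_TRACE"] = "flowctl,http"  # tracer names per environment_variables.md
>>>>
>>>>     import grpc  # noqa: E402
>>>>
>>>> That output is very noisy, but it should at least show HTTP/2
>>>> flow-control window updates.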
>>>>
>>>> Yonatan
>>>>
>>>> On Thu, Jul 18, 2019 at 5:27 PM Yonatan Zunger <zun...@humu.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'm trying to debug a mysterious memory blowout in a Python batch job,
>>>>> and one of the angles I'm exploring is that it may have to do with the
>>>>> way the job reads data. The job reads from Bigtable, which ultimately
>>>>> fetches the actual data with a unidirectional streaming "read rows" RPC.
>>>>> That RPC takes a single request and returns a sequence of data chunks;
>>>>> the higher-level client reshapes this into an iterator over the
>>>>> individual data cells, and those are consumed by the higher-level
>>>>> program, so the next response proto is consumed only once the program is
>>>>> ready to parse it.
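>>>>>
>>>>> The consumption pattern is roughly the following (a sketch with made-up
>>>>> names, not the actual Bigtable client code):
>>>>>
>>>>>     def iter_cells(read_rows_stream):
>>>>>         """Reshape a stream of read-rows responses into individual cells."""
>>>>>         for response in read_rows_stream:  # one response proto at a time
>>>>>             for chunk in response.chunks:  # cell chunks carried in each response
>>>>>                 yield chunk
>>>>>
>>>>>     # The batch job then does something like:
>>>>>     #     for cell in iter_cells(stream):
>>>>>     #         process(cell)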
>>>>>
>>>>> Something I can't remember about gRPC internals: what, if anything, is
>>>>> the pushback mechanism in unidirectional streaming? In the zero-pushback
>>>>> case, it would seem that a server could yield results at any speed, which
>>>>> would be accepted by the client and stored in gRPC's internal buffers
>>>>> until they got read by the client code, which could cause a large memory
>>>>> blowout if the server wrote faster than the client read. Is this in fact
>>>>> the case? If so, is there any good way to instrument and detect whether
>>>>> it's happening (some combination of gRPC debug flags, perhaps)? If not,
>>>>> is there some pushback mechanism I'm not thinking of?
>>>>>
>>>>> (Alas, I can't change the protocol in this situation; the server is
>>>>> run by someone else.)
>>>>>
>>>>> Yonatan
>>>>>
