Hi Yonatan,

On the gRPC Python side, the consumption of messages is sequential, and they
won't be kept in memory. If you recall the batch operations, gRPC Python
starts another RECV_MESSAGE operation only after a message has been handed
to the application. It's unlikely that the problem resides in Python space.
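
As a minimal sketch of that consumption pattern (the service, stub, and
request names below are hypothetical, just for illustration):

    import grpc

    # Hypothetical generated modules for a response-streaming "ReadRows" RPC.
    import my_service_pb2
    import my_service_pb2_grpc

    channel = grpc.insecure_channel("localhost:50051")
    stub = my_service_pb2_grpc.MyServiceStub(channel)

    responses = stub.ReadRows(my_service_pb2.ReadRowsRequest())
    for response in responses:
        # Each iteration drives exactly one RECV_MESSAGE batch operation;
        # gRPC Python won't start the next one until this message has been
        # delivered to the application here.
        print(response)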

In C-Core space, AFAIK the size of each TCP read is 4MiB
<https://github.com/grpc/grpc/blob/master/src/core/lib/iomgr/tcp_posix.cc#L1177>
per channel. I think we have flow control at both the TCP level and the
HTTP/2 level.

For debugging, did you try using ASAN? For channel args, I could only find
"GRPC_ARG_TCP_READ_CHUNK_SIZE" and "GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH",
which might be related to your case.
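
As a sketch, the Python spellings of those channel args should be something
like the following (I believe GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH maps to
"grpc.max_receive_message_length" and GRPC_ARG_TCP_READ_CHUNK_SIZE to
"grpc.experimental.tcp_read_chunk_size", but please double-check against
grpc_types.h):

    import grpc

    options = [
        # Fail the RPC if a single received message exceeds this many bytes.
        ("grpc.max_receive_message_length", 64 * 1024 * 1024),
        # Tune the per-read TCP chunk size (experimental; verify the name).
        ("grpc.experimental.tcp_read_chunk_size", 1024 * 1024),
    ]
    channel = grpc.insecure_channel("localhost:50051", options=options)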

Lidi Zheng

On Fri, Jul 19, 2019 at 12:48 PM Yonatan Zunger <zun...@humu.com> wrote:

> Maybe a more concrete way of asking this question: let's say we have a
> Python gRPC client making a response-streaming request to some gRPC server,
> and the server starts to stream back responses. If the client fails to
> consume data as fast as the server generates it, I'm trying to figure out
> where the data would accumulate, and which memory allocator it would use
> (because Python heap profiling won't see calls to malloc()).
>
> If I'm understanding correctly:
>
> * The responses are written by the server to the network socket at the
> server's own speed (no pushback controlling it);
> * These get picked up by the kernel's network stack on the client and
> pulled into userspace ASAP by the event loop, which lives in the C layer of
> the gRPC client. The data is stored in a grpc_byte_buffer and builds up
> there.
> * The Python client library exposes a response iterator, which is
> ultimately a _Rendezvous object; its iteration is implemented in
> _Rendezvous._next(), which calls cygrpc.ReceiveMessageOperation, which is
> what drains data from the grpc_byte_buffer and passes it to the protobuf
> parser, which creates objects in the Python memory address space and
> returns them to the caller.
>
> This means that if the client were to drain the iterator more slowly, data
> would accumulate in the grpc_byte_buffer, which is in the C layer and not
> visible to (e.g.) Python heap profiling using the PEP 445 malloc hooks.
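>
> As an illustration of that last point, here's a sketch using the stdlib
> tracemalloc module (which hooks allocations via PEP 445); the stub and
> request names are hypothetical. Even if I drain slowly, only the
> Python-side allocations (the parsed protos) would show up:
>
>     import time
>     import tracemalloc
>
>     tracemalloc.start()
>
>     responses = stub.ReadRows(request)  # hypothetical streaming call
>     for response in responses:
>         time.sleep(1.0)  # deliberately drain the iterator slowly
>         current, peak = tracemalloc.get_traced_memory()
>         # Only allocations routed through Python's allocator appear here;
>         # anything buffered by C-core's own malloc() is invisible.
>         print(f"python heap: current={current} peak={peak}")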
>
> If I am understanding this correctly, is there any way (without doing a
> massive amount of plumbing) to monitor the state of the byte buffer, e.g.
> with some gRPC debug parameter? And is there any mechanism in the C layer
> which limits the size of this buffer, doing something like failing the RPC
> if the buffer size exceeds some threshold?
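>
> (One knob I'm aware of is the GRPC_TRACE / GRPC_VERBOSITY environment
> variables, assuming "flowctl" is still a valid tracer name; they have to
> be set before grpc is imported:
>
>     import os
>
>     os.environ["GRPC_VERBOSITY"] = "DEBUG"
>     os.environ["GRPC_TRACE"] = "http,flowctl"  # flowctl: HTTP/2 flow control
>
>     import grpc  # must come after the env vars are set
>
> but that traces events rather than reporting buffer sizes.)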
>
> Yonatan
>
> On Thu, Jul 18, 2019 at 5:27 PM Yonatan Zunger <zun...@humu.com> wrote:
>
>> Hi everyone,
>>
>> I'm trying to debug a mysterious memory blowout in a Python batch job,
>> and one of the angles I'm exploring is that it may have to do with the
>> way the job reads data. The job reads from Bigtable, which ultimately
>> fetches the actual data with a unidirectional streaming "read rows" RPC.
>> This takes a single request and returns a sequence of data chunks; the
>> higher-level client reshapes these into an iterator over the individual
>> data cells, which are consumed by the higher-level program, so that the
>> next response proto is parsed only once the program is ready for it.
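>>
>> For concreteness, the consuming side looks roughly like this (a sketch
>> using the google-cloud-bigtable client; the project, instance, and table
>> names are made up):
>>
>>     from google.cloud import bigtable
>>
>>     client = bigtable.Client(project="my-project")
>>     table = client.instance("my-instance").table("my-table")
>>
>>     # One streaming "read rows" RPC under the hood; the client library
>>     # reshapes the chunk stream into an iterator over individual rows.
>>     for row in table.read_rows():
>>         print(row.row_key)  # placeholder for the batch job's real work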
>>
>> Something I can't remember about gRPC internals: what, if anything, is
>> the pushback mechanism in unidirectional streaming? In the zero-pushback
>> case, it would seem that a server could yield results at any speed, which
>> would be accepted by the client and stored in gRPC's internal buffers
>> until read by the client code; that could cause a large memory blowout if
>> the server writes faster than the client reads. Is this in fact the case?
>> If so, is there any good way to instrument and detect whether it's
>> happening (some combination of gRPC debug flags, perhaps)? If not, is
>> there some pushback mechanism I'm not thinking of?
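>>
>> (To make the failure mode concrete, here's a sketch of how I'd try to
>> detect it, comparing process RSS against the Python heap while consuming
>> slowly; the stub and request names are made up:
>>
>>     import resource
>>     import time
>>
>>     responses = stub.ReadRows(request)  # hypothetical streaming call
>>     for i, response in enumerate(responses):
>>         time.sleep(1.0)  # simulate a consumer slower than the server
>>         if i % 10 == 0:
>>             # On Linux, ru_maxrss is in kilobytes. If RSS grows while
>>             # the Python heap stays flat, the data is buffering in C.
>>             rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>             print(f"msg={i} max_rss_kb={rss_kb}")
>>
>> That would at least tell me whether the accumulation is in the C layer.)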
>>
>> (Alas, I can't change the protocol in this situation; the server is run
>> by someone else.)
>>
>> Yonatan
>>
