Re: [grpc-io] Re: Pushback in unidirectional streaming RPC's

2019-07-19 Thread Yonatan Zunger
Not sure. It's not a memory leak, alas -- it's a memory spike, which is
OOMing my code.

After reading through the C layer some, I decided to try something simple
and stupid: doing a fast force-drain of the RPC iterator to pull the results
into the Python address space, where I can monitor and instrument them more
easily, and then hopefully figure out whether this is what's causing the
memory spike.
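
For the record, the drain is nothing fancier than something along these lines
(a sketch -- read_rows_call is a stand-in for whatever response-streaming call
the client hands back, not the real Bigtable API):

import sys

def force_drain(read_rows_call):
    responses = []
    approx_serialized_bytes = 0
    for response in read_rows_call:
        responses.append(response)
        # Protobuf messages expose ByteSize(); use it as a cheap size proxy.
        approx_serialized_bytes += response.ByteSize()
        if len(responses) % 1000 == 0:
            print(f"drained {len(responses)} responses, "
                  f"~{approx_serialized_bytes / 2**20:.1f} MiB serialized",
                  file=sys.stderr)
    return responses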

Thank you!

On Fri, Jul 19, 2019 at 2:09 PM Lidi Zheng  wrote:

> Internally, we run the Python tests under ASAN.
>
> If you are using Bazel, it would be as simple as --config=ASAN.
> If not... then it can be challenging indeed.
>
> The test is about 'GRPC_ARG_PER_RPC_RETRY_BUFFER_SIZE', which doesn't seem
> directly related to your case.
> If the buffer is not consumed by the chttp2 parser, I don't think it will do
> another round of tcp_read for that channel.
>
> Can this memory leak be observed in simpler cases?
>
>
>
> On Fri, Jul 19, 2019 at 1:03 PM Yonatan Zunger  wrote:
>
>> I have no idea what would be involved in attaching ASAN to Python, and
>> suspect it may be "exciting," so I'm trying to see first if gRPC has any
>> monitoring capability around its buffers.
>>
>> One thing I did notice while reading through the codebase was unit tests
>> like this one about exceeding buffer sizes -- that does seem to trigger an
>> ABORTED response, but the test was fairly hard to understand (not much
>> commenting there...). Am I right in thinking that if this 4MB buffer
>> overflows, that's what is going to happen?
>>
>> On Fri, Jul 19, 2019 at 12:59 PM Lidi Zheng  wrote:
>>
>>> Hi Yonatan,
>>>
>>> On the gRPC Python side, message consumption is sequential, and messages
>>> won't be kept in memory.
>>> If you recall the batch operations: gRPC Python will start another
>>> RECV_MESSAGE operation only after a message has been delivered to the
>>> application.
>>> It's unlikely that the problem resides in Python space.
>>>
>>> In C-Core space, AFAIK each TCP read is 4MiB per channel.
>>> I think we have flow control at both the TCP level and the HTTP/2 level.
>>>
>>> For debugging, did you try ASAN? As for channel args, I can only find
>>> "GRPC_ARG_TCP_READ_CHUNK_SIZE" and "GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH",
>>> which might be related to your case.
>>>
>>> Lidi Zheng
>>>
>>> On Fri, Jul 19, 2019 at 12:48 PM Yonatan Zunger  wrote:
>>>
 Maybe a more concrete way of asking this question: Let's say we have a
 Python gRPC client making a response-streaming request to some gRPC server.
 The server starts to stream back responses. If the client fails to consume
 data as fast as the server generates it, I'm trying to figure out where the
 data would accumulate, and which memory allocator it would be using.
 (Because Python heap profiling won't see calls to malloc())

 If I'm understanding correctly:

 * The responses are written by the server to the network socket at the
 server's own speed (no pushback controlling it);
 * These get picked up by the kernel network device on the client, and
 get pulled into userspace ASAP by the event loop, which is in the C layer
 of the gRPC client. This is stored in a grpc_byte_buffer and builds up
 there.
 * The Python client library exposes a response iterator, which is
 ultimately a _Rendezvous object; its iteration is implemented in
 _Rendezvous._next(), which calls cygrpc.ReceiveMessageOperation, which is
 what drains data from the grpc_byte_buffer and passes it to the protobuf
 parser, which creates objects in the Python memory address space and
 returns them to the caller.

 This means that if the client were to drain the iterator more slowly,
 data would accumulate in the grpc_byte_buffer, which is in the C layer and
 not visible to (e.g.) Python heap profiling using the PEP445 malloc hooks.

 If I am understanding this correctly, is there any way (without doing a
 massive amount of plumbing) to monitor the state of the byte buffer, e.g.
 with some gRPC debug parameter? And is there any mechanism in the C layer
 which limits the size of this buffer, doing something like failing the RPC
 if the buffer size exceeds some threshold?

 Yonatan

 On Thu, Jul 18, 2019 at 5:27 PM Yonatan Zunger  wrote:

> Hi everyone,
>
> I'm trying to debug a mysterious memory blowout in a Python batch job,
> and one of the angles I'm exploring is that this may have to do with the
> way it's reading data. This job is reading from Bigtable, which is
> ultimately fetching the actual data with a unidirectional streaming "read
> rows" RPC. This takes a single request and returns a sequence of data
> chunks; the higher-level client reshapes this into an iterator over the
> individual data cells, and those are consumed by the higher-level program,
> so that the next response proto is only consumed once the program is ready
> to parse it.

Re: [grpc-io] Re: Pushback in unidirectional streaming RPC's

2019-07-19 Thread 'Lidi Zheng' via grpc.io
Internally, we run the Python tests under ASAN.

If you are using Bazel, it would be as simple as --config=ASAN.
If not... then it can be challenging indeed.

The test is about 'GRPC_ARG_PER_RPC_RETRY_BUFFER_SIZE', which doesn't seem
directly related to your case.
If the buffer is not consumed by the chttp2 parser, I don't think it will do
another round of tcp_read for that channel.

Can this memory leak be observed in simpler cases?



On Fri, Jul 19, 2019 at 1:03 PM Yonatan Zunger  wrote:

> I have no idea what would be involved in attaching ASAN to Python, and
> suspect it may be "exciting," so I'm trying to see first if gRPC has any
> monitoring capability around its buffers.
>
> One thing I did notice while reading through the codebase was unit tests
> like this one about exceeding buffer sizes -- that does seem to trigger an
> ABORTED response, but the test was fairly hard to understand (not much
> commenting there...). Am I right in thinking that if this 4MB buffer
> overflows, that's what is going to happen?
>
> On Fri, Jul 19, 2019 at 12:59 PM Lidi Zheng  wrote:
>
>> Hi Yonatan,
>>
>> On the gRPC Python side, message consumption is sequential, and messages
>> won't be kept in memory.
>> If you recall the batch operations: gRPC Python will start another
>> RECV_MESSAGE operation only after a message has been delivered to the
>> application.
>> It's unlikely that the problem resides in Python space.
>>
>> In C-Core space, AFAIK each TCP read is 4MiB per channel.
>> I think we have flow control at both the TCP level and the HTTP/2 level.
>>
>> For debugging, did you try ASAN? As for channel args, I can only find
>> "GRPC_ARG_TCP_READ_CHUNK_SIZE" and "GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH",
>> which might be related to your case.
>>
>> Lidi Zheng
>>
>> On Fri, Jul 19, 2019 at 12:48 PM Yonatan Zunger  wrote:
>>
>>> Maybe a more concrete way of asking this question: Let's say we have a
>>> Python gRPC client making a response-streaming request to some gRPC server.
>>> The server starts to stream back responses. If the client fails to consume
>>> data as fast as the server generates it, I'm trying to figure out where the
>>> data would accumulate, and which memory allocator it would be using.
>>> (Because Python heap profiling won't see calls to malloc())
>>>
>>> If I'm understanding correctly:
>>>
>>> * The responses are written by the server to the network socket at the
>>> server's own speed (no pushback controlling it);
>>> * These get picked up by the kernel network device on the client, and
>>> get pulled into userspace ASAP by the event loop, which is in the C layer
>>> of the gRPC client. This is stored in a grpc_byte_buffer and builds up
>>> there.
>>> * The Python client library exposes a response iterator, which is
>>> ultimately a _Rendezvous object; its iteration is implemented in
>>> _Rendezvous._next(), which calls cygrpc.ReceiveMessageOperation, which is
>>> what drains data from the grpc_byte_buffer and passes it to the protobuf
>>> parser, which creates objects in the Python memory address space and
>>> returns them to the caller.
>>>
>>> This means that if the client were to drain the iterator more slowly,
>>> data would accumulate in the grpc_byte_buffer, which is in the C layer and
>>> not visible to (e.g.) Python heap profiling using the PEP445 malloc hooks.
>>>
>>> If I am understanding this correctly, is there any way (without doing a
>>> massive amount of plumbing) to monitor the state of the byte buffer, e.g.
>>> with some gRPC debug parameter? And is there any mechanism in the C layer
>>> which limits the size of this buffer, doing something like failing the RPC
>>> if the buffer size exceeds some threshold?
>>>
>>> Yonatan
>>>
>>> On Thu, Jul 18, 2019 at 5:27 PM Yonatan Zunger  wrote:
>>>
 Hi everyone,

 I'm trying to debug a mysterious memory blowout in a Python batch job,
 and one of the angles I'm exploring is that this may have to do with the
 way it's reading data. This job is reading from Bigtable, which is
 ultimately fetching the actual data with a unidirectional streaming "read
 rows" RPC. This takes a single request and returns a sequence of data
 chunks; the higher-level client reshapes this into an iterator over the
 individual data cells, and those are consumed by the higher-level program,
 so that the next response proto is only consumed once the program is ready
 to parse it.

 Something I can't remember about gRPC internals: What, if anything, is
 the pushback mechanism in unidirectional streaming? In the zero-pushback
 case, it would seem that a server could yield results at any speed, which
 would be accepted by the client and stored in gRPC's internal buffers until
 it got read by the client code, which could potentially cause a large
 memory blowout if the server wrote faster than the client read.

Re: [grpc-io] Re: Pushback in unidirectional streaming RPC's

2019-07-19 Thread 'Srini Polavarapu' via grpc.io
Enabling flowctl debug tracing might show some useful logs when, say, the
client is not consuming at all while the server keeps generating.
https://github.com/grpc/grpc/blob/master/doc/environment_variables.md
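
For a Python job, the variables just need to be set before the gRPC runtime
initializes; one way to do that (a sketch -- exporting them in the launching
shell works just as well):

import os

# "flowctl" is the flow-control tracer from the doc above; add ",http" for
# more detail. These must be set before grpc is imported so the C core picks
# them up at init time.
os.environ["GRPC_VERBOSITY"] = "DEBUG"
os.environ["GRPC_TRACE"] = "flowctl"

import grpc  # imported after setting the variables, on purpose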



On Fri, Jul 19, 2019 at 1:03 PM Yonatan Zunger  wrote:

> I have no idea what would be involved in attaching ASAN to Python, and
> suspect it may be "exciting," so I'm trying to see first if gRPC has any
> monitoring capability around its buffers.
>
> One thing I did notice while reading through the codebase was unit tests
> like this one about exceeding buffer sizes -- that does seem to trigger an
> ABORTED response, but the test was fairly hard to understand (not much
> commenting there...). Am I right in thinking that if this 4MB buffer
> overflows, that's what is going to happen?
>
> On Fri, Jul 19, 2019 at 12:59 PM Lidi Zheng  wrote:
>
>> Hi Yonatan,
>>
>> On the gRPC Python side, message consumption is sequential, and messages
>> won't be kept in memory.
>> If you recall the batch operations: gRPC Python will start another
>> RECV_MESSAGE operation only after a message has been delivered to the
>> application.
>> It's unlikely that the problem resides in Python space.
>>
>> In C-Core space, AFAIK each TCP read is 4MiB per channel.
>> I think we have flow control at both the TCP level and the HTTP/2 level.
>>
>> For debugging, did you try ASAN? As for channel args, I can only find
>> "GRPC_ARG_TCP_READ_CHUNK_SIZE" and "GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH",
>> which might be related to your case.
>>
>> Lidi Zheng
>>
>> On Fri, Jul 19, 2019 at 12:48 PM Yonatan Zunger  wrote:
>>
>>> Maybe a more concrete way of asking this question: Let's say we have a
>>> Python gRPC client making a response-streaming request to some gRPC server.
>>> The server starts to stream back responses. If the client fails to consume
>>> data as fast as the server generates it, I'm trying to figure out where the
>>> data would accumulate, and which memory allocator it would be using.
>>> (Because Python heap profiling won't see calls to malloc())
>>>
>>> If I'm understanding correctly:
>>>
>>> * The responses are written by the server to the network socket at the
>>> server's own speed (no pushback controlling it);
>>> * These get picked up by the kernel network device on the client, and
>>> get pulled into userspace ASAP by the event loop, which is in the C layer
>>> of the gRPC client. This is stored in a grpc_byte_buffer and builds up
>>> there.
>>> * The Python client library exposes a response iterator, which is
>>> ultimately a _Rendezvous object; its iteration is implemented in
>>> _Rendezvous._next(), which calls cygrpc.ReceiveMessageOperation, which is
>>> what drains data from the grpc_byte_buffer and passes it to the protobuf
>>> parser, which creates objects in the Python memory address space and
>>> returns them to the caller.
>>>
>>> This means that if the client were to drain the iterator more slowly,
>>> data would accumulate in the grpc_byte_buffer, which is in the C layer and
>>> not visible to (e.g.) Python heap profiling using the PEP445 malloc hooks.
>>>
>>> If I am understanding this correctly, is there any way (without doing a
>>> massive amount of plumbing) to monitor the state of the byte buffer, e.g.
>>> with some gRPC debug parameter? And is there any mechanism in the C layer
>>> which limits the size of this buffer, doing something like failing the RPC
>>> if the buffer size exceeds some threshold?
>>>
>>> Yonatan
>>>
>>> On Thu, Jul 18, 2019 at 5:27 PM Yonatan Zunger  wrote:
>>>
 Hi everyone,

 I'm trying to debug a mysterious memory blowout in a Python batch job,
 and one of the angles I'm exploring is that this may have to do with the
 way it's reading data. This job is reading from Bigtable, which is
 ultimately fetching the actual data with a unidirectional streaming "read
 rows" RPC. This takes a single request and returns a sequence of data
 chunks; the higher-level client reshapes this into an iterator over the
 individual data cells, and those are consumed by the higher-level program,
 so that the next response proto is only consumed once the program is ready
 to parse it.

 Something I can't remember about gRPC internals: What, if anything, is
 the pushback mechanism in unidirectional streaming? In the zero-pushback
 case, it would seem that a server could yield results at any speed, which
 would be accepted by the client and stored in gRPC's internal buffers until
 it got read by the client code, which could potentially cause a large
 memory blowout if the server wrote faster than the client read. Is this in
 fact the case? If so, is there any good way to instrument and detect if
 it's happening? (Some combination of gRPC debug flags, perhaps.)

Re: [grpc-io] Re: Pushback in unidirectional streaming RPC's

2019-07-19 Thread Yonatan Zunger
I have no idea what would be involved in attaching ASAN to Python, and
suspect it may be "exciting," so I'm trying to see first if gRPC has any
monitoring capability around its buffers.

One thing I did notice while reading through the codebase was unit tests
like this one about exceeding buffer sizes -- that does seem to trigger an
ABORTED response, but the test was fairly hard to understand (not much
commenting there...). Am I right in thinking that if this 4MB buffer
overflows, that's what is going to happen?

On Fri, Jul 19, 2019 at 12:59 PM Lidi Zheng  wrote:

> Hi Yonatan,
>
> On the gRPC Python side, message consumption is sequential, and messages
> won't be kept in memory.
> If you recall the batch operations: gRPC Python will start another
> RECV_MESSAGE operation only after a message has been delivered to the
> application.
> It's unlikely that the problem resides in Python space.
>
> In C-Core space, AFAIK each TCP read is 4MiB per channel.
> I think we have flow control at both the TCP level and the HTTP/2 level.
>
> For debugging, did you try ASAN? As for channel args, I can only find
> "GRPC_ARG_TCP_READ_CHUNK_SIZE" and "GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH",
> which might be related to your case.
>
> Lidi Zheng
>
> On Fri, Jul 19, 2019 at 12:48 PM Yonatan Zunger  wrote:
>
>> Maybe a more concrete way of asking this question: Let's say we have a
>> Python gRPC client making a response-streaming request to some gRPC server.
>> The server starts to stream back responses. If the client fails to consume
>> data as fast as the server generates it, I'm trying to figure out where the
>> data would accumulate, and which memory allocator it would be using.
>> (Because Python heap profiling won't see calls to malloc())
>>
>> If I'm understanding correctly:
>>
>> * The responses are written by the server to the network socket at the
>> server's own speed (no pushback controlling it);
>> * These get picked up by the kernel network device on the client, and get
>> pulled into userspace ASAP by the event loop, which is in the C layer of
>> the gRPC client. This is stored in a grpc_byte_buffer and builds up there.
>> * The Python client library exposes a response iterator, which is
>> ultimately a _Rendezvous object; its iteration is implemented in
>> _Rendezvous._next(), which calls cygrpc.ReceiveMessageOperation, which is
>> what drains data from the grpc_byte_buffer and passes it to the protobuf
>> parser, which creates objects in the Python memory address space and
>> returns them to the caller.
>>
>> This means that if the client were to drain the iterator more slowly,
>> data would accumulate in the grpc_byte_buffer, which is in the C layer and
>> not visible to (e.g.) Python heap profiling using the PEP445 malloc hooks.
>>
>> If I am understanding this correctly, is there any way (without doing a
>> massive amount of plumbing) to monitor the state of the byte buffer, e.g.
>> with some gRPC debug parameter? And is there any mechanism in the C layer
>> which limits the size of this buffer, doing something like failing the RPC
>> if the buffer size exceeds some threshold?
>>
>> Yonatan
>>
>> On Thu, Jul 18, 2019 at 5:27 PM Yonatan Zunger  wrote:
>>
>>> Hi everyone,
>>>
>>> I'm trying to debug a mysterious memory blowout in a Python batch job,
>>> and one of the angles I'm exploring is that this may have to do with the
>>> way it's reading data. This job is reading from Bigtable, which is
>>> ultimately fetching the actual data with a unidirectional streaming "read
>>> rows" RPC. This takes a single request and returns a sequence of data
>>> chunks; the higher-level client reshapes this into an iterator over the
>>> individual data cells, and those are consumed by the higher-level program,
>>> so that the next response proto is only consumed once the program is ready
>>> to parse it.
>>>
>>> Something I can't remember about gRPC internals: What, if anything, is
>>> the pushback mechanism in unidirectional streaming? In the zero-pushback
>>> case, it would seem that a server could yield results at any speed, which
>>> would be accepted by the client and stored in gRPC's internal buffers until
>>> it got read by the client code, which could potentially cause a large
>>> memory blowout if the server wrote faster than the client read. Is this in
>>> fact the case? If so, is there any good way to instrument and detect if
>>> it's happening? (Some combination of gRPC debug flags, perhaps) If not, is
>>> there some pushback mechanism I'm not thinking of?
>>>
>>> (Alas, I can't change the protocol in this situation; the server is run
>>> by someone else)
>>>
>>> Yonatan
>>>

Re: [grpc-io] Re: Pushback in unidirectional streaming RPC's

2019-07-19 Thread 'Lidi Zheng' via grpc.io
Hi Yonatan,

On the gRPC Python side, message consumption is sequential, and messages
won't be kept in memory.
If you recall the batch operations: gRPC Python will start another
RECV_MESSAGE operation only after a message has been delivered to the
application.
It's unlikely that the problem resides in Python space.
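
To put the RECV_MESSAGE point concretely, the client-side pacing comes from
the iteration itself -- roughly like this (a sketch with placeholder names;
stub.ReadRows and handle are not the real Bigtable API):

def consume(stub, request, handle):
    call = stub.ReadRows(request)  # response-streaming call, acts as an iterator
    for response in call:          # next() blocks until one message is delivered
        handle(response)           # no new RECV_MESSAGE is started while this runs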

In C-Core space, AFAIK each TCP read is 4MiB per channel.
I think we have flow control at both the TCP level and the HTTP/2 level.

For debugging, did you try ASAN? As for channel args, I can only find
"GRPC_ARG_TCP_READ_CHUNK_SIZE" and "GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH",
which might be related to your case.
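
From Python, those channel args can be passed as channel options. A sketch:
"grpc.max_receive_message_length" is the documented key for
GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH; "grpc.experimental.tcp_read_chunk_size"
is my reading of the GRPC_ARG_TCP_READ_CHUNK_SIZE define, so double-check it
against grpc_types.h before relying on it:

import grpc

# Target and sizes are placeholders; only the option key strings matter here.
options = [
    ("grpc.max_receive_message_length", 16 * 1024 * 1024),       # per-message cap
    ("grpc.experimental.tcp_read_chunk_size", 1 * 1024 * 1024),  # smaller TCP reads
]
channel = grpc.insecure_channel("your.server.example:50051", options=options)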

Lidi Zheng

On Fri, Jul 19, 2019 at 12:48 PM Yonatan Zunger  wrote:

> Maybe a more concrete way of asking this question: Let's say we have a
> Python gRPC client making a response-streaming request to some gRPC server.
> The server starts to stream back responses. If the client fails to consume
> data as fast as the server generates it, I'm trying to figure out where the
> data would accumulate, and which memory allocator it would be using.
> (Because Python heap profiling won't see calls to malloc())
>
> If I'm understanding correctly:
>
> * The responses are written by the server to the network socket at the
> server's own speed (no pushback controlling it);
> * These get picked up by the kernel network device on the client, and get
> pulled into userspace ASAP by the event loop, which is in the C layer of
> the gRPC client. This is stored in a grpc_byte_buffer and builds up there.
> * The Python client library exposes a response iterator, which is
> ultimately a _Rendezvous object; its iteration is implemented in
> _Rendezvous._next(), which calls cygrpc.ReceiveMessageOperation, which is
> what drains data from the grpc_byte_buffer and passes it to the protobuf
> parser, which creates objects in the Python memory address space and
> returns them to the caller.
>
> This means that if the client were to drain the iterator more slowly, data
> would accumulate in the grpc_byte_buffer, which is in the C layer and not
> visible to (e.g.) Python heap profiling using the PEP445 malloc hooks.
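
(A crude cross-check for exactly that blind spot: compare what tracemalloc
sees with the whole-process RSS. This is only a sketch -- tracemalloc tracks
just the Python allocator, ru_maxrss covers everything including C-core
buffers, and its units are KiB on Linux but bytes on macOS.)

import resource
import tracemalloc

tracemalloc.start()

def report_memory(tag):
    # Bytes currently tracked through the Python allocator (PEP 445 hooks).
    python_bytes, _ = tracemalloc.get_traced_memory()
    # Peak resident set size of the whole process.
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"[{tag}] python-tracked={python_bytes / 2**20:.1f} MiB, "
          f"peak ru_maxrss={peak_rss}")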
>
> If I am understanding this correctly, is there any way (without doing a
> massive amount of plumbing) to monitor the state of the byte buffer, e.g.
> with some gRPC debug parameter? And is there any mechanism in the C layer
> which limits the size of this buffer, doing something like failing the RPC
> if the buffer size exceeds some threshold?
>
> Yonatan
>
> On Thu, Jul 18, 2019 at 5:27 PM Yonatan Zunger  wrote:
>
>> Hi everyone,
>>
>> I'm trying to debug a mysterious memory blowout in a Python batch job,
>> and one of the angles I'm exploring is that this may have to do with the
>> way it's reading data. This job is reading from Bigtable, which is
>> ultimately fetching the actual data with a unidirectional streaming "read
>> rows" RPC. This takes a single request and returns a sequence of data
>> chunks; the higher-level client reshapes this into an iterator over the
>> individual data cells, and those are consumed by the higher-level program,
>> so that the next response proto is only consumed once the program is ready
>> to parse it.
>>
>> Something I can't remember about gRPC internals: What, if anything, is
>> the pushback mechanism in unidirectional streaming? In the zero-pushback
>> case, it would seem that a server could yield results at any speed, which
>> would be accepted by the client and stored in gRPC's internal buffers until
>> it got read by the client code, which could potentially cause a large
>> memory blowout if the server wrote faster than the client read. Is this in
>> fact the case? If so, is there any good way to instrument and detect if
>> it's happening? (Some combination of gRPC debug flags, perhaps) If not, is
>> there some pushback mechanism I'm not thinking of?
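
(One application-level way to keep that backlog bounded and visible, whatever
the transport does, is a bounded prefetch queue -- a rough sketch, with call
standing in for the response iterator:)

import queue
import threading

_SENTINEL = object()

def prefetch(call, maxsize=100):
    # At most `maxsize` responses sit in Python space; once the consumer falls
    # behind, the reader thread blocks on put(), stops iterating the call, and
    # gRPC stops issuing new RECV_MESSAGE operations.
    buffered = queue.Queue(maxsize=maxsize)

    def reader():
        try:
            for response in call:
                buffered.put(response)  # blocks once the queue is full
        finally:
            buffered.put(_SENTINEL)

    threading.Thread(target=reader, daemon=True).start()
    while True:
        item = buffered.get()
        if item is _SENTINEL:
            return
        yield item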
>>
>> (Alas, I can't change the protocol in this situation; the server is run
>> by someone else)
>>
>> Yonatan
>>
