[grpc-io] gRPC streams unexpected disconnections

2017-09-18 Thread Amit Waisel


I encountered a weird behavior in gRPC.


*The symptom* - an active RPC stream is signaled as cancelled on the server 
side (it happens from time to time, and I couldn't find any correlation with 
other events in the environment), although the client is active and the 
stream shouldn't be closed.

It happens for streams initialized as response streams in RPC calls from 
both C++ and NodeJS clients. *It happened on gRPC 1.3.6 and still happens 
on gRPC v1.6.0*.

The problem does not reproduce easily - the system has to run under heavy 
load for many hours before it happens.


In my code, I have 2 main types of streams:


   1. Control stream (C++→C#) - the client initiates an RPC call to the 
   server, which keeps the RPC's response stream open.
   These streams are used as *control channels* with the *C++ clients* and 
   are kept open to allow server-to-client requests. When one is closed, 
   both client and server clean up all data related to the connection, so the 
   control stream is critical to the session.
   The server registers for call-cancellation notifications:

      ServerCallContext context; // Received from the RPC call as a parameter
      // ...
      // On cancellation, queue the cleanup on the thread pool.
      context.CancellationToken.Register(() => System.Threading.ThreadPool.
         QueueUserWorkItem(async obj => { handle_disconnection(...); }));

   The total number of open control streams (i.e. the number of connected C++ 
   clients) is ~1200. 
   2. Command stream (NodeJS→C#) - there are many other streams for 
   server-to-client command response communication, which the server keeps 
   open in parallel with *NodeJS clients*. The total number of open 
   command streams is 20K-30K. 
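
To make the cancellation hand-off in item 1 concrete, here is a minimal sketch of the same pattern in C++, using only the standard library. It is not gRPC code: `CancellationSource` and `RegisterDeferredCleanup` are hypothetical names that stand in for the call's cancellation token and for the thread-pool hand-off, and the cleanup passed in plays the role of `handle_disconnection`.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-call cancellation token (not a gRPC type).
class CancellationSource {
public:
    // Registers a callback; runs it immediately if already cancelled.
    void Register(std::function<void()> callback) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!cancelled_) {
                callbacks_.push_back(std::move(callback));
                return;
            }
        }
        callback();
    }

    // Signals cancellation once; callbacks run on the signaling thread.
    void Cancel() {
        std::vector<std::function<void()>> to_run;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (cancelled_) return;
            cancelled_ = true;
            to_run.swap(callbacks_);
        }
        for (auto& callback : to_run) callback();
    }

private:
    std::mutex mutex_;
    bool cancelled_ = false;
    std::vector<std::function<void()>> callbacks_;
};

// Registers a cleanup that is handed off to another thread, so the thread
// that signals cancellation is not blocked by a slow cleanup.
// The returned future becomes ready once the cleanup has finished.
std::future<void> RegisterDeferredCleanup(CancellationSource& source,
                                          std::function<void()> cleanup) {
    assert(static_cast<bool>(cleanup));  // an empty cleanup is a caller error
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();
    source.Register([done, cleanup]() {
        // A detached thread models "queue on the thread pool" in this sketch.
        std::thread([done, cleanup]() {
            cleanup();
            done->set_value();
        }).detach();
    });
    return finished;
}
```

In this sketch the cleanup runs exactly once, on a thread other than the one that calls `Cancel()`. A second `Cancel()` does nothing, so a single unexpected signal is enough to tear the session down.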

The problem is noticeable when the control streams get disconnected.

*Further investigation of the client (C++) and server (C#) logs of a control 
stream disconnection revealed the following*:


   1. For some reason, the server's cancellation token (the one registered 
   above) is signaled - and the server does its cleanup 
   (`handle_disconnection`, which also intentionally closes many command 
   streams). *According to the client, the connection should have 
   remained open.*
   2. After some time, the client realizes the connection was closed 
   unexpectedly and does its cleanup - throwing the error I posted here 
   (NodeJS in that case). *The client disconnects itself only after the 
   server disconnects the connection and control stream.*

Another note - I set the servers' RequestCallTokensPerCompletionQueue value 
for both C++/NodeJS client interfaces, to 32768 (32K) per completion queue.

I have 2 server interfaces (for Node clients and C++ clients, which have 
different APIs), and 4 completion queues (for an 8-core machine). I don't 
really know if the 4 completion queues are global or per-server.

*Do you think this configuration might cause those streams to be closed under heavy load*?

 

In any case, my suspicion falls on the C# server behavior - the 
CancellationToken is signaled for no apparent reason.

I *haven't* ruled out network instability yet - although both clients and 
server are located on the same ESX server with 10-gig virtual adapters 
between them, so this is quite a long shot.

 

Do you have any idea how to solve this?

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to grpc-io+unsubscr...@googlegroups.com.
To post to this group, send email to grpc-io@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/aae6feda-a932-4de5-8519-22c928a36a31%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[grpc-io] Re: gRPC C++ connection failure (getaddrinfo) and deadlock

2017-08-09 Thread Amit Waisel
Hi Nicolas,
Thank you for your help.
I agree that this behavior is disturbing. I also believed that getaddrinfo 
should return at some point, but in that case it never did.
I must say that it happened on a single VM and never reproduced on any 
other machine. I haven't encountered this bug since.

As far as I understand the design of gRPC's thread pool: I debugged the 
process and found out that the RPC function called the "ClientReader" 
constructor, which issued an async name-resolution operation and waited for 
its completion. An arbitrary thread from the pool picked up the resolve 
task and (eventually) called getaddrinfo. Because this call *never 
returned*, the thread in the thread pool never completed the task, so the 
ClientReader constructor never finished waiting for the async name-resolution 
operation.
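
To illustrate the wait described above, here is a minimal standard-library sketch. It is not gRPC code, only a model of the same shape: `blocking_resolve` is a hypothetical stand-in for the stuck getaddrinfo call, and `wait_for_resolve` is a hypothetical helper showing what a bounded wait would look like on the caller's side.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>

// Hypothetical stand-in for a resolver call that is stuck (as getaddrinfo
// was). It blocks until `release` is signaled so the sketch can terminate.
std::string blocking_resolve(std::shared_future<void> release) {
    release.wait();
    return "192.0.2.1";  // placeholder address from the documentation range
}

// Waits for the worker's result, but for at most `timeout`.
// Returns true and fills `out` only if the result arrived in time.
bool wait_for_resolve(std::future<std::string>& pending,
                      std::chrono::milliseconds timeout,
                      std::string& out) {
    assert(pending.valid());
    if (pending.wait_for(timeout) != std::future_status::ready) {
        return false;  // the worker is still blocked; the caller is not
    }
    out = pending.get();
    return true;
}
```

If the worker is started with `std::async(std::launch::async, blocking_resolve, ...)`, an unbounded `pending.get()` reproduces the hang, while `wait_for_resolve` with a finite timeout returns `false` and gives control back to the caller. The worker thread stays blocked either way, which is why a timeout is only a bypass.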

I hope this clears things up a bit.

On Wednesday, August 9, 2017 at 12:57:56 AM UTC+3, Nicolas Noble wrote:
>
> Having getaddrinfo() not returning is disturbing. While it's true that all 
> of the OS' DNS resolution functions are synchronous, and will block until 
> the OS comes back with a response, it's usually expected that the OS 
> returns *eventually.* Either with an error (such as a timeout), or with 
> some results. Not returning at all isn't a sane nor expected behavior.
>
> Now your phrasing is a bit confusing. Are you saying that the DNS 
> resolution thread is stuck on resolving address ? Or that you think it 
> somehow did, and got the rest of the library confused and stuck ?

[grpc-io] gRPC C++ connection failure (getaddrinfo) and deadlock

2017-04-24 Thread Amit Waisel
I have a C++ client that connects to a C# server. The connection is made 
by an RPC function (called *InitializeStream()*) that sends a single request 
and receives a stream of responses from the server. This RPC function is 
executed with the 'max' timeout (if the server is unavailable, a later call 
to stream->Read() will return an error, which is good enough for me).
I encountered a weird bug that happens on one VM. (I couldn't reproduce it 
on any other machine, but it reproduces easily on that VM.) On that single 
VM, the *InitializeStream()* RPC function never returns.

Further debugging of this issue revealed the following:

   1. The main thread (thread #1) is blocked inside InitializeStream(), in 
   *grpc_iocp_work()*. The exact line is *iocp_windows.c@83* - in Windows's 
   *GetQueuedCompletionStatus()* function.
   As far as I understand, here we wait for a task completion with an 
   unlimited timeout (I used the 'max' timeout).
     [External Code]
   > Test.exe!grpc_iocp_work(grpc_exec_ctx * exec_ctx, gpr_timespec deadline) Line 83 C
     Test.exe!grpc_pollset_work(grpc_exec_ctx * exec_ctx, grpc_pollset * pollset, grpc_pollset_worker * * worker_hdl, gpr_timespec now, gpr_timespec deadline) Line 140 C
     Test.exe!grpc_completion_queue_pluck(grpc_completion_queue * cc, void * tag, gpr_timespec deadline, void * reserved) Line 614 C
     Test.exe!grpc::CoreCodegen::grpc_completion_queue_pluck(grpc_completion_queue * cq, void * tag, gpr_timespec deadline, void * reserved) Line 70 C++
     Test.exe!grpc::CompletionQueue::Pluck(grpc::CompletionQueueTag * tag) Line 230 C++
     Test.exe!grpc::ClientReader::ClientReader(grpc::ChannelInterface * channel, const grpc::RpcMethod & method, grpc::ClientContext * context, const test::InitMessage & request) Line 151 C++
     Test.exe!test::testInterface::Stub::InitializeStreamRaw(grpc::ClientContext * context, const test::InitMessage & request) Line 46 C++
     Test.exe!test::testInterface::Stub::InitializeStream(grpc::ClientContext * context, const test::InitMessage & request) Line 86 C++
     Test.exe!WinMain(HINSTANCE__ * __formal, HINSTANCE__ * __formal, char * __formal, int __formal) Line 17 C++
     [External Code]
   
   2. One of gRPC's threads [from the thread pool] (thread #2) called the 
   function *do_request_thread()* in *resolve_address_windows.c@153*, which 
   called *grpc_blocking_resolve_address()* (a blocking function, judging by 
   its name), which called *getaddrinfo()*, which never returns!

My guess is that thread #1 waits (in *GetQueuedCompletionStatus()*) for thread 
#2's task to complete. *getaddrinfo()* never returns, so 
*GetQueuedCompletionStatus()* keeps blocking and the main thread is stuck.

Have you encountered this error before? Do you have any idea what I can do 
(besides adding a timeout to the function, which I consider a workaround and 
not a solution)?
I use gRPC v1.2.0 for both C++ and C#.

Thanks

To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/89b0a161-9830-4635-80a1-ca3214d35861%40googlegroups.com.


[grpc-io] gRPC channel usage

2017-03-15 Thread Amit Waisel
Hi All,
Is it possible to query network usage information (total bytes transferred, 
current transfer rate [bytes per second], etc.) from a gRPC channel, for both 
secure and insecure channels?
(For example, I would like to query how many bytes were sent and received 
on the native SOCKET, including gRPC/protobuf overhead).

Thanks!
Amit

To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/e166b9e8-115f-4c64-9668-b12b9ce3e66f%40googlegroups.com.


[grpc-io] gRPC compilation for XP (v140_xp)

2016-10-19 Thread Amit Waisel
I am trying to compile the C++ code for the XP platform. I know it is not 
officially supported. After changing _WIN32_WINNT to 0x501 in 
global.props, I get many errors:

   1. Use of the RTL_RUN_ONCE API, which is supported on Vista+ only
   2. Support for IPv6 (inet_ntop at 
   grpc/src/core/lib/iomgr/sockaddr_utils.c, etc.) - again, only Vista+
   3. CONDITION_VARIABLE - same problem as RTL_RUN_ONCE (INIT_ONCE) above.
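
To spell out the version gate behind the three errors above, here is a small sketch. The constant names and `has_vista_apis` are hypothetical, introduced only for illustration; the values 0x0501 (XP) and 0x0600 (Vista) are the usual _WIN32_WINNT values for those releases.

```cpp
// _WIN32_WINNT values as defined by the Windows SDK.
constexpr unsigned kWinXp = 0x0501;
constexpr unsigned kWinVista = 0x0600;

// True when a build targeting `win32_winnt` can use the Vista-era APIs
// listed above (RTL_RUN_ONCE/INIT_ONCE, inet_ntop, CONDITION_VARIABLE).
constexpr bool has_vista_apis(unsigned win32_winnt) {
    return win32_winnt >= kWinVista;
}

// Compile-time check of the situation described in this post.
static_assert(!has_vista_apis(kWinXp), "an XP target lacks the Vista-era APIs");
static_assert(has_vista_apis(kWinVista), "a Vista target has them");
```

In practice each use of those APIs would need a guard such as `#if _WIN32_WINNT >= 0x0600` with an XP-compatible fallback in the `#else` branch. What that fallback should be is exactly what I am asking about.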

Do you plan to publish a minimal edition which can run on XP? (for example, 
without IPv6 support)
If not, can you please point me to the possible obstacles, so I can deal 
with those and compile the code successfully?

Thanks!

To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/88515c96-0179-4576-a87b-a7cb9b241939%40googlegroups.com.