Hello, I have a client (A) / server (B) program built on gRPC 1.37.

We mainly use BlockingUnaryCall in a multithreaded environment, and no 
timeout (deadline) is set on the client side yet.
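
For reference, this is roughly how a per-call deadline could be added on the 
client side. It is only a minimal sketch: DeadlineFromNow and ApplyDeadline 
are hypothetical helper names (not in our code and not part of gRPC), and the 
timeout value is left to the caller. The only gRPC API used is 
grpc::ClientContext::set_deadline, and that part is compiled only when the 
gRPC headers are available. A call that exceeds its deadline normally 
completes with DEADLINE_EXCEEDED, but this would at best bound the wait, not 
explain the hang, and it may not help a thread that is blocked on a lock.

```cpp
#include <chrono>

#if defined(__has_include)
#if __has_include(<grpcpp/client_context.h>)
#include <grpcpp/client_context.h>
#define SKETCH_HAVE_GRPCPP 1
#endif
#endif

// Hypothetical helper: absolute deadline that lies `timeout` in the future.
inline std::chrono::system_clock::time_point DeadlineFromNow(
    std::chrono::milliseconds timeout) {
  return std::chrono::system_clock::now() + timeout;
}

#ifdef SKETCH_HAVE_GRPCPP
// Hypothetical helper: must be called before the context is used for an RPC.
// A ClientContext is single-use, so each call needs its own context.
inline void ApplyDeadline(grpc::ClientContext& context,
                          std::chrono::milliseconds timeout) {
  context.set_deadline(DeadlineFromNow(timeout));
}
#endif
```

Usage would be something like ApplyDeadline(context, std::chrono::seconds(5)) 
right before the stub call, where 5 seconds is an arbitrary example value.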

We find that after several calls (to different RPC methods), some pending 
RPCs occasionally never return. The call stack of a stuck thread looks like 
this:

Thread 31 (Thread 0x7ff695ffb700 (LWP 3073) "grpcpp_sync_ser"):
#0  0x00007ff707ee87f9 in syscall () from /lib64/libc.so.6
#1  0x00007ff708509987 in absl::lts_20230802::synchronization_internal::FutexWaiter::WaitUntil(std::atomic<int>*, int, absl::lts_20230802::synchronization_internal::KernelTimeout) () from /usr/lib64/libabsl_synchronization.so.2308.0.0
#2  0x00007ff708509a6a in absl::lts_20230802::synchronization_internal::FutexWaiter::Wait(absl::lts_20230802::synchronization_internal::KernelTimeout) () from /usr/lib64/libabsl_synchronization.so.2308.0.0
#3  0x00007ff708509c71 in AbslInternalPerThreadSemWait_lts_20230802 () from /usr/lib64/libabsl_synchronization.so.2308.0.0
#4  0x00007ff70850bbd3 in absl::lts_20230802::Mutex::Block(absl::lts_20230802::base_internal::PerThreadSynch*) () from /usr/lib64/libabsl_synchronization.so.2308.0.0
#5  0x00007ff70850c776 in absl::lts_20230802::Mutex::LockSlowLoop(absl::lts_20230802::SynchWaitParams*, int) () from /usr/lib64/libabsl_synchronization.so.2308.0.0
#6  0x00007ff70850cdac in absl::lts_20230802::Mutex::LockSlowWithDeadline(absl::lts_20230802::MuHowS const*, absl::lts_20230802::Condition const*, absl::lts_20230802::synchronization_internal::KernelTimeout, int) () from /usr/lib64/libabsl_synchronization.so.2308.0.0
#7  0x00007ff70850934a in absl::lts_20230802::Mutex::LockSlow(absl::lts_20230802::MuHowS const*, absl::lts_20230802::Condition const*, int) () from /usr/lib64/libabsl_synchronization.so.2308.0.0
#8  0x00007ff708a3c216 in ?? () from /usr/lib64/libgrpc.so.37
#9  0x00007ff708a3fef5 in ?? () from /usr/lib64/libgrpc.so.37
#10 0x00007ff708a49265 in grpc_pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) () from /usr/lib64/libgrpc.so.37
#11 0x00007ff708b5215e in ?? () from /usr/lib64/libgrpc.so.37
#12 0x00007ff709067a5d in grpc::CompletionQueue::Pluck (this=0x7ff695ff9b60, tag=0x7ff695ff9ba0) at /usr/include/grpcpp/completion_queue.h:322
#13 0x00007ff709071bce in grpc::internal::BlockingUnaryCallImpl<google::protobuf::MessageLite, google::protobuf::MessageLite>::BlockingUnaryCallImpl (this=0x7ff695ff9f00, channel=0xb14cc0, method=..., context=0x7ff695ffa030, request=..., result=0x7ff695ffa210) at /usr/include/grpcpp/impl/client_unary_call.h:80
#14 0x00007ff70906ed76 in grpc::internal::BlockingUnaryCall<bam_grpc::bam_get_prealloc_chunks_args, bam_grpc::bam_get_prealloc_chunks_res, google::protobuf::MessageLite, google::protobuf::MessageLite> (channel=0xb14cc0, method=..., context=0x7ff695ffa030, request=..., result=0x7ff695ffa210) at /usr/include/grpcpp/impl/client_unary_call.h:51
......

However, the underlying socket still has data waiting to be read:

e5b2384394a9:/ # ss -antp | grep 49495
LISTEN     0      4096   [::ffff:127.0.0.46]:49495                   *:*     users:(("B",pid=3003,fd=11))
CLOSE-WAIT 397    0       [::ffff:127.0.0.1]:58096 [::ffff:127.0.0.46]:49495 users:(("A",pid=3042,fd=8))

There are 397 bytes in the Recv-Q that are never read. (The socket is in 
CLOSE-WAIT because B has closed the connection.)
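
To illustrate what that Recv-Q number means: on Linux, the count of bytes 
that have arrived on a socket but have not been read by the application can 
also be queried from inside the process with ioctl(FIONREAD). The sketch 
below is a generic POSIX/Linux example, not gRPC code; UnreadBytes is a 
hypothetical helper name, and in our case the fd is owned by gRPC, so this 
is only meant to show the mechanism.

```cpp
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns the number of bytes queued for reading on
// `fd` (what ss reports as Recv-Q for a connected socket), or -1 on error.
inline int UnreadBytes(int fd) {
  int pending = 0;
  if (ioctl(fd, FIONREAD, &pending) < 0) {
    return -1;
  }
  return pending;
}
```

Data stays counted here until the application calls read()/recv() on the 
socket, even after the peer has closed its end of the connection.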

Can anyone suggest how to debug this further? Thanks.
