[https://issues.apache.org/jira/browse/ZOOKEEPER-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152858#comment-13152858]
helei commented on ZOOKEEPER-981:
---------------------------------
Sorry for not responding sooner. I saw another problem with this patch applied:
zookeeper_close() hangs again. Here is the stack:
(gdb) bt
#0 0x000000302b80adfb in __lll_mutex_lock_wait () from /lib64/tls/libpthread.so.0
#1 0x000000302b1307a8 in main_arena () from /lib64/tls/libc.so.6
#2 0x000000302b910230 in stack_used () from /lib64/tls/libpthread.so.0
#3 0x000000302b808dde in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#4 0x00000000006b4ce8 in adaptor_finish (zh=0x6902060) at src/mt_adaptor.c:217
#5 0x00000000006b0fd0 in zookeeper_close (zh=0x6902060) at src/zookeeper.c:2297
(gdb) p zh->ref_counter
$5 = 1
(gdb) p zh->close_requested
$6 = 1
(gdb) p *zh
$7 = {fd = 110112576, hostname = 0x6903620 "", addrs = 0x0, addrs_count = 1,
  watcher = 0x62e5dc <doris::meta_register_mgr_t::register_mgr_watcher(_zhandle*, int, int, char const*, void*)>,
  last_recv = {tv_sec = 1321510694, tv_usec = 552835},
  last_send = {tv_sec = 1321510694, tv_usec = 552886},
  last_ping = {tv_sec = 1321510685, tv_usec = 774869},
  next_deadline = {tv_sec = 1321510704, tv_usec = 547831},
  recv_timeout = 30000, input_buffer = 0x0,
  to_process = {head = 0x0, last = 0x0,
    lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}},
  to_send = {head = 0x0, last = 0x0,
    lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 1, __m_lock = {__status = 0, __spinlock = 0}}},
  sent_requests = {head = 0x0, last = 0x0,
    cond = {__c_lock = {__status = 1, __spinlock = -1}, __c_waiting = 0x0, __padding = '\0' <repeats 15 times>, __align = 0},
    lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}},
  completions_to_process = {head = 0x2aefbff800, last = 0x2af0e05f40,
    cond = {__c_lock = {__status = 592705486850, __spinlock = -1}, __c_waiting = 0x45,
      __padding = "E\000\000\000\000\000\000\000\220\006\000\000\000", __align = 296352743424},
    lock = {__m_reserved = 1, __m_count = 0, __m_owner = 0x1000026ca, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}},
  connect_index = 0,
  client_id = {client_id = 86551148676999146, passwd = "G懵擀\233\213\f闬202筴\002錪\034"},
  last_zxid = 82057372, outstanding_sync = 0,
  primer_buffer = {buffer = 0x6902290 "", len = 40, curr_offset = 44, next = 0x0},
  primer_storage = {len = 36, protocolVersion = 0, timeOut = 30000, sessionId = 86551148676999146, passwd_len = 16,
    passwd = "G懵擀\233\213\f闬202筴\002錪\034"},
  primer_storage_buffer = "\000\000\000$\000\000\000\000\000\000u0\0013}惜薵闬000\000\000\020G懵擀\233\213\f闬202筴\002錪\034",
  state = 0, context = 0x0,
  auth_h = {auth = 0x0,
    lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}},
  ref_counter = 1, close_requested = 1, adaptor_priv = 0x0,
  socket_readable = {tv_sec = 0, tv_usec = 0},
  active_node_watchers = 0x6901520, active_exist_watchers = 0x69015d0, active_child_watchers = 0x6902ef0,
  chroot = 0x0}
I think ref_counter is supposed to be 2 or 3 here; a value of 1 does not seem correct.
Thanks again.
> Hang in zookeeper_close() in the multi-threaded C client
> --------------------------------------------------------
>
> Key: ZOOKEEPER-981
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-981
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Affects Versions: 3.3.2
> Environment: Debian Squeeze, Linux 2.6.32-5, x86_64
> Reporter: Jeremy Stribling
> Assignee: Jeremy Stribling
> Priority: Critical
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-981-v1.patch, ZOOKEEPER-981.tar.gz,
> zookeeper-981.patch
>
>
> I saw a hang once when my C++ application called the zookeeper_close() method
> of the multi-threaded Zookeeper client library. The stack trace of the hung
> thread was the following:
> {quote}
> Thread 8 (Thread 5644):
> #0 0x00007f5d7bb5bbe4 in __lll_lock_wait () from /lib/libpthread.so.0
> #1 0x00007f5d7bb59ad0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2 0x00007f5d793628f6 in unlock_completion_list (l=0x32b4d68) at .../zookeeper/src/c/src/mt_adaptor.c:66
> #3 0x00007f5d79354d4b in free_completions (zh=0x32b4c80, callCompletion=1, reason=-116) at .../zookeeper/src/c/src/zookeeper.c:1069
> #4 0x00007f5d79355008 in cleanup_bufs (zh=0x32b4c80, callCompletion=1, rc=-116) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:1125
> #5 0x00007f5d79353200 in destroy (zh=0x32b4c80) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:366
> #6 0x00007f5d79358e0e in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2326
> #7 0x00007f5d79356d18 in api_epilog (zh=0x32b4c80, rc=0) at .../zookeeper/src/c/src/zookeeper.c:1661
> #8 0x00007f5d79362f2f in adaptor_finish (zh=0x32b4c80) at .../zookeeper/src/c/src/mt_adaptor.c:205
> #9 0x00007f5d79358c8c in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2297
> ...
> {quote}
> The omitted part of the stack trace is entirely within my application, and
> contains no other calls to/from the Zookeeper client. In particular, I am
> not calling zookeeper_close() from within a completion handler or any of the
> library's threads.
> I haven't been able to reproduce this, and when I encountered this I wasn't
> capturing logging from the client library, so unfortunately I don't have any
> more information at this time. But I will update this JIRA if I see it again.
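Both reports revolve around how the multi-threaded client counts references to a handle: helei expected ref_counter to be 2 or 3 at the point where zookeeper_close() hangs, and the original trace shows zookeeper_close() being re-entered from api_epilog() inside adaptor_finish(). The C sketch below illustrates one common way to structure a deferred close so that tear-down runs exactly once and never calls back into close. It is a hypothetical model, not the ZooKeeper client's code and not the fix in the attached patches: model_handle and every model_* function are invented for illustration, and only the ref_counter and close_requested field names mirror the gdb dump above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a ref-counted client handle. This is NOT zhandle_t;
 * it only borrows the ref_counter and close_requested field names seen in
 * the gdb dump so the deferred-close rule can be shown in isolation. */
typedef struct {
    pthread_mutex_t lock;
    int ref_counter;     /* owner's reference + API calls in flight */
    int close_requested; /* set by the first model_close() */
    int destroyed;       /* set exactly once when tear-down runs */
} model_handle;

/* The handle starts with one reference, held by the owner until close. */
static void model_init(model_handle *h) {
    pthread_mutex_init(&h->lock, NULL);
    h->ref_counter = 1;
    h->close_requested = 0;
    h->destroyed = 0;
}

/* Tear-down. A real client would free buffers and completion lists here;
 * this model only records that it ran. It is called without the lock held
 * and never calls model_close(), so the close path cannot re-enter itself. */
static void model_destroy(model_handle *h) {
    assert(!h->destroyed); /* must run exactly once */
    h->destroyed = 1;
}

/* Drop one reference. The "am I last?" decision is made under the lock,
 * so only the caller that takes the count to zero performs tear-down. */
static void model_release(model_handle *h) {
    int last;
    pthread_mutex_lock(&h->lock);
    h->ref_counter--;
    assert(h->ref_counter >= 0);
    last = (h->ref_counter == 0);
    pthread_mutex_unlock(&h->lock);
    if (last)
        model_destroy(h);
}

/* Entry of an API call: pin the handle while the call runs.
 * Returns 0 and takes no reference once close has been requested. */
static int model_api_enter(model_handle *h) {
    int ok;
    pthread_mutex_lock(&h->lock);
    ok = !h->close_requested;
    if (ok)
        h->ref_counter++;
    pthread_mutex_unlock(&h->lock);
    return ok;
}

/* Exit of an API call: unpin; the last one out performs a pending close. */
static void model_api_exit(model_handle *h) { model_release(h); }

/* Close marks the handle and drops the owner's reference exactly once.
 * If calls are still in flight, the last model_api_exit() tears down. */
static void model_close(model_handle *h) {
    int first;
    pthread_mutex_lock(&h->lock);
    first = !h->close_requested;
    h->close_requested = 1;
    pthread_mutex_unlock(&h->lock);
    if (first)
        model_release(h);
}
```

In this model, a call that is still in flight when close is requested keeps the handle alive (ref_counter stays at 1 after close), and its model_api_exit() brings the count to zero and performs the tear-down. Because the destroy decision is taken under the lock and model_destroy() never calls model_close(), close cannot recurse the way frames #9 to #6 of the original trace show zookeeper_close() doing.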