[ https://issues.apache.org/jira/browse/ZOOKEEPER-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800502#comment-17800502 ]
yangzhenxing edited comment on ZOOKEEPER-2802 at 12/26/23 2:32 PM: ------------------------------------------------------------------- To me the check and break code at the end of the main while loop in do_io() in mt_adapter.c is kind of too crude, it can't gurantee there is no new requests get into the working queues (i.e to_send and sent_request) at the same time. {code:java} if(is_unrecoverable(zh)) break; {code} was (Author: JIRAUSER303522): To me the check and break code at the end of the main while loop in do_io() in mt_adapter.c is kind of too crude, it can't gurantee there is no new requests get into the working queues (i.e to_send and sent_request) at the same time. ``` if(is_unrecoverable(zh)) break; ``` > Zookeeper C client hang @wait_sync_completion > --------------------------------------------- > > Key: ZOOKEEPER-2802 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2802 > Project: ZooKeeper > Issue Type: Bug > Components: c client > Affects Versions: 3.4.6 > Environment: DISTRIB_DESCRIPTION="Ubuntu 14.04.2 LTS" > Reporter: yihao yang > Priority: Critical > Attachments: zookeeper.out.2017.05.31-10.06.23 > > > I was using zookeeper 3.4.6 c client to access one zookeeper server in a VM. > The VM environment is not stable and I get a lot of EXPIRED_SESSION_STATE > events. I will create another session to ZK when I get an expired event. I > also have a read/write lock to protect session read (get/list/... on zk) and > write(connect, close, reconnect zhandle). > The problem is the session got an EXPIRED_SESSION_STATE event and when it > tried to hold the write lock and reconnect the session, it found there is a > thread was holding the read lock (which was operating sync list on zk). See > the stack below: > GDBStack: > Thread 7 (Thread 0x7f838a43a700 (LWP 62845)): > #0 pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > #1 0x0000000000636033 in wait_sync_completion (sc=sc@entry=0x7f8344000af0) > at src/mt_adaptor.c:85 > #2 0x0000000000633248 in zoo_wget_children2_ (zh=<optimized out>, > path=0x7f83440677a8 "/dict/objects/__services/RLS-GSE/_static_nodes", > watcher=0x0, watcherCtx=0x13e6310, strings=0x7f838a4397b0, > stat=0x7f838a4398d0) at src/zookeeper.c:3630 > #3 0x000000000045e6ff in ZooKeeperContext::getChildren (this=0x13e6310, > path=..., children=children@entry=0x7f838a439890, > stat=stat@entry=0x7f838a4398d0) at zookeeper_context.cpp:xxx > This sync list didn't return a ZINVALIDSTAT but hung. Anyone know the problem? -- This message was sent by Atlassian Jira (v8.20.10#820010)