Speaking of Windows, Michi, can you take a look at why the Windows job has started failing of late? Perhaps an environment change? (You might look at other Windows jobs on that box to get an idea.)
https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-WinVS2008/

Thanks!

Patrick

On Fri, Jun 8, 2012 at 10:16 AM, Michi Mutsuzaki <mi...@cs.stanford.edu> wrote:
> I think there is a bug in the Windows port (are you on Windows?) that
> doesn't set the recursive attribute for the to_send mutex. Please open a
> jira:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> Thanks!
> --Michi
>
> On Fri, Jun 8, 2012 at 1:00 AM, 乱麻的魅力 <805784...@qq.com> wrote:
>> Hi dev,
>>
>> I am trying to use the ZooKeeper CLI (C version) to connect to a
>> ZooKeeper server. The client connects, but it cannot send any command
>> to ZooKeeper, such as "ls /". As soon as I send a command, the CLI
>> deadlocks at the call lock_buffer_list(list) (line 00945, in the
>> dequeue_buffer() function of zookeeper.c). I then tried to track down
>> the cause.
>>
>> I downloaded the ZooKeeper CLI (version 3.4.3) from
>> http://labs.renren.com/apache-mirror/zookeeper/ , rebuilt the project,
>> and found the bug at the same line 00945 in
>> zookeeper-3.4.3.tar.gz\zookeeper-3.4.3\src\c\src\zookeeper.c. I
>> describe the case below:
>>
>> 1. To send a command to the ZooKeeper server, the client calls a
>> function such as zoo_awget, send_ping, or zoo_aget. All of these
>> functions call adaptor_send_queue(zh, 0), which proceeds as follows.
>>
>> 2. adaptor_send_queue(zh, 0) calls flush_send_queue(zh, timeout):
>>
>> int flush_send_queue(zhandle_t*zh, int timeout)
>> {
>>     int rc= ZOK;
>>     struct timeval started;
>> #ifdef WIN32
>>     fd_set pollSet;
>>     struct timeval wait;
>> #endif
>>     gettimeofday(&started,0);
>>     // we can't use dequeue_buffer() here because if (non-blocking) send_buffer()
>>     // returns EWOULDBLOCK we'd have to put the buffer back on the queue.
>>     // we use a recursive lock instead and only dequeue the buffer if a send was
>>     // successful
>>     lock_buffer_list(&zh->to_send); /* first lock of the buffer list, wfs 20120608 */
>>     while (zh->to_send.head != 0&& zh->state == ZOO_CONNECTED_STATE) {
>>         if(timeout!=0){
>>             int elapsed;
>>             struct timeval now;
>>             gettimeofday(&now,0);
>>             elapsed=calculate_interval(&started,&now);
>>             if (elapsed>timeout) {
>>                 rc = ZOPERATIONTIMEOUT;
>>                 break;
>>             }
>> #ifdef WIN32
>>             wait = get_timeval(timeout-elapsed);
>>             FD_ZERO(&pollSet);
>>             FD_SET(zh->fd, &pollSet);
>>             // Poll the socket
>>             rc = select((int)(zh->fd)+1, NULL, &pollSet, NULL, &wait);
>> #else
>>             struct pollfd fds;
>>             fds.fd = zh->fd;
>>             fds.events = POLLOUT;
>>             fds.revents = 0;
>>             rc = poll(&fds, 1, timeout-elapsed);
>> #endif
>>             if (rc<=0) {
>>                 /* timed out or an error or POLLERR */
>>                 rc = rc==0 ? ZOPERATIONTIMEOUT : ZSYSTEMERROR;
>>                 break;
>>             }
>>         }
>>         rc = send_buffer(zh->fd, zh->to_send.head);
>>         if(rc==0 && timeout==0){
>>             /* send_buffer would block while sending this buffer */
>>             rc = ZOK;
>>             break;
>>         }
>>         if (rc < 0) {
>>             rc = ZCONNECTIONLOSS;
>>             break;
>>         }
>>         // if the buffer has been sent successfully, remove it from the queue
>>         if (rc > 0)
>>             remove_buffer(&zh->to_send); /* this call locks the buffer list a
>>                 second time while it is already locked, wfs 20120608 */
>>
>>         gettimeofday(&zh->last_send, 0);
>>         rc = ZOK;
>>     }
>>     unlock_buffer_list(&zh->to_send);
>>     return rc;
>> }
>>
>> static int remove_buffer(buffer_head_t *list)
>> {
>>     buffer_list_t *b = dequeue_buffer(list);
>>     if (!b) {
>>         return 0;
>>     }
>>     free_buffer(b);
>>     return 1;
>> }
>>
>> static buffer_list_t *dequeue_buffer(buffer_head_t *list)
>> {
>>     buffer_list_t *b;
>>     lock_buffer_list(list); /* second lock of the buffer list while it is
>>         already locked (20120608); this is the line where the deadlock
>>         occurs.
>>
>>         If I write new versions of dequeue_buffer(buffer_head_t *list)
>>         and remove_buffer() without the lock and unlock calls, and have
>>         flush_send_queue call those instead, then the CLI can send
>>         commands to the ZooKeeper server and no longer deadlocks. */
>>
>>     b = list->head;
>>     if (b) {
>>         list->head = b->next;
>>         if (!list->head) {
>>             assert(b == list->last);
>>             list->last = 0;
>>         }
>>     }
>>     unlock_buffer_list(list);
>>     return b;
>> }
>>
>> I am not sure whether I have described this case in enough detail. The
>> older version 3.3.3 has this bug too. Either this C code path was never
>> tested or I am using it the wrong way. Can you help me clear this up?
>>
>> Thanks!
>> wfs, from China, 20120608
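
For readers following the thread, here is a minimal sketch of the locking behavior under discussion. It assumes POSIX threads and is not taken from the ZooKeeper sources. The helper relock_result() is hypothetical: it locks one mutex twice from the same thread, the way flush_send_queue() and dequeue_buffer() both lock to_send. So that the failure shows up as an error code rather than a hang, the sketch uses PTHREAD_MUTEX_ERRORCHECK as a stand-in for a mutex created without the recursive attribute.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* some libcs need this to expose the mutex types */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper, not part of ZooKeeper: create a mutex of the given
 * type, lock it twice from the calling thread, and return the result of the
 * second lock attempt (0 on success, an errno value on failure, -1 if the
 * mutex could not be set up). */
int relock_result(int mutex_type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int second;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, mutex_type) != 0 ||
        pthread_mutex_init(&lock, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&lock);          /* outer lock, as in flush_send_queue() */
    second = pthread_mutex_lock(&lock); /* inner lock, as in dequeue_buffer() */
    if (second == 0)
        pthread_mutex_unlock(&lock);    /* release the inner lock */
    pthread_mutex_unlock(&lock);        /* release the outer lock */
    pthread_mutex_destroy(&lock);
    return second;
}
```

Calling relock_result(PTHREAD_MUTEX_RECURSIVE) should return 0, while relock_result(PTHREAD_MUTEX_ERRORCHECK) should return EDEADLK. With a plain default mutex the second lock may simply never return, which matches the hang reported above.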