[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-12-08 Thread Fangmin Lv (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1628#comment-1628
 ] 

Fangmin Lv commented on ZOOKEEPER-2845:
---

[~davelatham] I meant the broken "retainDB" commit in ZOOKEEPER-2678; we 
should revert it until we have a sound solution.

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. If the snapshot is ahead of the txn file, this is not an 
> issue: since the SyncRequestProcessor queue is drained during shutdown, the 
> snapshot and txn file stay consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency. Here is a simplified scenario that shows the issue:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> C is the leader, and all snapshots and txns are up to date through T0:
> 1. A new request reaches leader C to create node N, and it is converted to 
> txn T1.
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 never existed on A and B.
> 3. A and B form a new quorum after the restart; let's say B is the leader.
> 4. C changes to LOOKING state because it no longer has enough followers, and 
> it syncs with leader B using last zxid T0, which results in an empty diff 
> sync.
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1. Now C has node N, but A and B don't have it.
> I also included a test case to reproduce this issue consistently.
> We have a completely different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.
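To make step 5 above concrete, here is a minimal sketch of the kind of check 
the reworked RetainDB approach implies; the class and method names below are 
invented for illustration and are not real ZooKeeper internals:

{code:java}
/**
 * Hypothetical sketch only: illustrates the snapshot/txn-log reconciliation
 * idea from the description above, not actual ZooKeeper code.
 */
final class RetainDbGuard {

    /**
     * The retained in-memory database is only safe to reuse for leader
     * election when the txn log is not ahead of the snapshot. If the log
     * is ahead (server C at step 5), txns replayed from disk after a
     * restart may never have been committed by a quorum, so the database
     * must be reconciled with the quorum's history first.
     */
    static boolean safeToRetain(long lastSnapshotZxid, long lastLoggedZxid) {
        // Snapshot >= txn log: the SyncRequestProcessor queue was drained
        // on shutdown, so both views are consistent (the easy case above).
        return lastLoggedZxid <= lastSnapshotZxid;
    }

    public static void main(String[] args) {
        long t0 = 0x100L; // snapshot covers everything up to T0
        long t1 = 0x101L; // txn log already holds the uncommitted T1
        if (!safeToRetain(t0, t1)) {
            System.out.println(
                "txn log ahead of snapshot: reconcile before election");
        }
    }
}
{code}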



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-12-08 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284144#comment-16284144
 ] 

Dave Latham commented on ZOOKEEPER-2845:


Thanks, [~lvfangmin].  The broken "retainDB" commit is ZOOKEEPER-2845, right?  
You're suggesting that it be reverted?

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. If the snapshot is ahead of the txn file, this is not an 
> issue: since the SyncRequestProcessor queue is drained during shutdown, the 
> snapshot and txn file stay consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency. Here is a simplified scenario that shows the issue:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> C is the leader, and all snapshots and txns are up to date through T0:
> 1. A new request reaches leader C to create node N, and it is converted to 
> txn T1.
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 never existed on A and B.
> 3. A and B form a new quorum after the restart; let's say B is the leader.
> 4. C changes to LOOKING state because it no longer has enough followers, and 
> it syncs with leader B using last zxid T0, which results in an empty diff 
> sync.
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1. Now C has node N, but A and B don't have it.
> I also included a test case to reproduce this issue consistently.
> We have a completely different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-12-08 Thread Fangmin Lv (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284068#comment-16284068
 ] 

Fangmin Lv commented on ZOOKEEPER-2845:
---

Can someone help add my teammate jtuple as a contributor, so I can assign 
this task to him?

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. If the snapshot is ahead of the txn file, this is not an 
> issue: since the SyncRequestProcessor queue is drained during shutdown, the 
> snapshot and txn file stay consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency. Here is a simplified scenario that shows the issue:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> C is the leader, and all snapshots and txns are up to date through T0:
> 1. A new request reaches leader C to create node N, and it is converted to 
> txn T1.
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 never existed on A and B.
> 3. A and B form a new quorum after the restart; let's say B is the leader.
> 4. C changes to LOOKING state because it no longer has enough followers, and 
> it syncs with leader B using last zxid T0, which results in an empty diff 
> sync.
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1. Now C has node N, but A and B don't have it.
> I also included a test case to reproduce this issue consistently.
> We have a completely different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-12-08 Thread Fangmin Lv (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fangmin Lv reassigned ZOOKEEPER-2845:
-

Assignee: (was: Fangmin Lv)

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. If the snapshot is ahead of the txn file, this is not an 
> issue: since the SyncRequestProcessor queue is drained during shutdown, the 
> snapshot and txn file stay consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency. Here is a simplified scenario that shows the issue:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> C is the leader, and all snapshots and txns are up to date through T0:
> 1. A new request reaches leader C to create node N, and it is converted to 
> txn T1.
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 never existed on A and B.
> 3. A and B form a new quorum after the restart; let's say B is the leader.
> 4. C changes to LOOKING state because it no longer has enough followers, and 
> it syncs with leader B using last zxid T0, which results in an empty diff 
> sync.
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1. Now C has node N, but A and B don't have it.
> I also included a test case to reproduce this issue consistently.
> We have a completely different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-12-08 Thread Fangmin Lv (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284066#comment-16284066
 ] 

Fangmin Lv commented on ZOOKEEPER-2845:
---

[~davelatham] our internal patch is based on the 3.6 branch, and we found it 
amplified the issue reported in ZOOKEEPER-2926; in production we had to 
disable the local session feature to mitigate it. Also, we haven't ported or 
tested the diff on 3.4 yet, so we're not confident enough to get it out yet. 
Instead, I would suggest reverting the existing broken retainDB commit to 
unblock the next release.
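For reference, the local session feature mentioned above is controlled by the 
localSessionsEnabled property in zoo.cfg (available since 3.5); a minimal 
illustrative excerpt, with placeholder values around the one relevant line:

{noformat}
# zoo.cfg -- illustrative excerpt; only the last line is the mitigation
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
# disable local sessions to avoid amplifying the ZOOKEEPER-2926 issue
localSessionsEnabled=false
{noformat}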

I have made a patch for ZOOKEEPER-2926 and will update it there. I'll assign 
this Jira to my teammate Joseph to follow up; he is the owner of our internal 
retainDB feature.



> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. If the snapshot is ahead of the txn file, this is not an 
> issue: since the SyncRequestProcessor queue is drained during shutdown, the 
> snapshot and txn file stay consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency. Here is a simplified scenario that shows the issue:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> C is the leader, and all snapshots and txns are up to date through T0:
> 1. A new request reaches leader C to create node N, and it is converted to 
> txn T1.
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 never existed on A and B.
> 3. A and B form a new quorum after the restart; let's say B is the leader.
> 4. C changes to LOOKING state because it no longer has enough followers, and 
> it syncs with leader B using last zxid T0, which results in an empty diff 
> sync.
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1. Now C has node N, but A and B don't have it.
> I also included a test case to reproduce this issue consistently.
> We have a completely different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-12-08 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283999#comment-16283999
 ] 

Dave Latham commented on ZOOKEEPER-2845:


Any updates here?  We were considering upgrading our ZooKeeper, but we don't 
want to move to a release with a known data inconsistency problem.

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. If the snapshot is ahead of the txn file, this is not an 
> issue: since the SyncRequestProcessor queue is drained during shutdown, the 
> snapshot and txn file stay consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency. Here is a simplified scenario that shows the issue:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> C is the leader, and all snapshots and txns are up to date through T0:
> 1. A new request reaches leader C to create node N, and it is converted to 
> txn T1.
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 never existed on A and B.
> 3. A and B form a new quorum after the restart; let's say B is the leader.
> 4. C changes to LOOKING state because it no longer has enough followers, and 
> it syncs with leader B using last zxid T0, which results in an empty diff 
> sync.
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1. Now C has node N, but A and B don't have it.
> I also included a test case to reproduce this issue consistently.
> We have a completely different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2924) Flaky Test: org.apache.zookeeper.test.LoadFromLogTest.testRestoreWithTransactionErrors

2017-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283518#comment-16283518
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2924:
---

Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/409
  
@phunt Patch has been updated with the trunk review changes. Please take a 
look.


> Flaky Test: 
> org.apache.zookeeper.test.LoadFromLogTest.testRestoreWithTransactionErrors
> --
>
> Key: ZOOKEEPER-2924
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2924
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server, tests
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>  Labels: flaky, flaky-test
> Fix For: 3.4.12
>
>
> From https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1682/
> Same issue happens in jdk8 and jdk9 builds as well.
> Issue has already been fixed by 
> https://issues.apache.org/jira/browse/ZOOKEEPER-2484 , but I believe that the 
> root cause here is that test startup / cleanup code is inlined in the tests 
> instead of using a try-finally block or Before-After methods.
> As a consequence, when an exception happens during test execution, the ZK test 
> server doesn't get shut down properly and is still listening on the port bound 
> to the test class.
> As mentioned above, there are two approaches to address this (a sketch of #2 
> follows the log excerpt below):
> #1 Wrap the cleanup code block with a finally
> #2 Use JUnit's Before-After methods for initialization and cleanup
> Test where original issue happens:
> {noformat}
> ...   
>  [junit] 2017-10-12 15:05:20,135 [myid:] - INFO  [ProcessThread(sid:0 
> cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when 
> processing sessionid:0x104cd7b190c type:create cxid:0x8c zxid:0x8d 
> txntype:-1 req$
>  [junit] 2017-10-12 15:05:20,137 [myid:] - INFO  [ProcessThread(sid:0 
> cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when 
> processing sessionid:0x104cd7b190c type:create cxid:0x8d zxid:0x8e 
> txntype:-1 req$
>  [junit] 2017-10-12 15:05:20,139 [myid:] - INFO  [ProcessThread(sid:0 
> cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when 
> processing sessionid:0x104cd7b190c type:create cxid:0x8e zxid:0x8f 
> txntype:-1 req$
>  [junit] 2017-10-12 15:05:20,142 [myid:] - INFO  [ProcessThread(sid:0 
> cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when 
> processing sessionid:0x104cd7b190c type:create cxid:0x8f zxid:0x90 
> txntype:-1 req$
>  [junit] 2017-10-12 15:05:20,144 [myid:] - INFO  [ProcessThread(sid:0 
> cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when 
> processing sessionid:0x104cd7b190c type:create cxid:0x90 zxid:0x91 
> txntype:-1 req$
>  [junit] 2017-10-12 15:05:30,479 [myid:] - INFO  
> [SessionTracker:ZooKeeperServer@354] - Expiring session 0x104cd7b190c, 
> timeout of 6000ms exceeded
>  [junit] 2017-10-12 15:05:32,996 [myid:] - INFO  [ProcessThread(sid:0 
> cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when 
> processing sessionid:0x104cd7b190c type:ping cxid:0xfffe 
> zxid:0xf$
>  [junit] 2017-10-12 15:05:24,147 [myid:] - WARN  
> [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@] - Client 
> session timed out, have not heard from server in 4002ms for sessionid 
> 0x104cd7b190c
>  [junit] 2017-10-12 15:05:32,996 [myid:] - INFO  
> [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1159] - Client 
> session timed out, have not heard from server in 4002ms for sessionid 
> 0x104cd7b190c, closing socket connectio$
>  [junit] 2017-10-12 15:05:21,479 [myid:] - INFO  
> [SessionTracker:SessionTrackerImpl@163] - SessionTrackerImpl exited loop!
>  [junit] 2017-10-12 15:05:32,998 [myid:] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@376] - Unable to 
> read additional data from client sessionid 0x104cd7b190c, likely client 
> has closed socket
>  [junit] 2017-10-12 15:05:33,067 [myid:] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1040] - Closed 
> socket connection for client /127.0.0.1:45735 which had sessionid 
> 0x104cd7b190c
>  [junit] 2017-10-12 15:05:32,996 [myid:] - INFO  [ProcessThread(sid:0 
> cport:11221)::PrepRequestProcessor@487] - Processed session termination for 
> sessionid: 0x104cd7b190c
>  [junit] 2017-10-12 15:05:33,889 [myid:] - INFO  [main:ZooKeeper@687] - 
> Session: 0x104cd7b190c closed
>  [junit] 2017-10-12 15:05:33,890 [myid:] - INFO  
> [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down for 
> session: 0x104cd7b190c
>  [junit] 2017-10-12 15:05:33,891 
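A minimal JUnit 4 sketch of approach #2 from the list above; TestServer below 
is a stand-in harness invented for illustration, not the real test base class. 
The point is that tearDown runs even when the test body throws, so the test 
port is always released:

{code:java}
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class LoadFromLogStyleTest {
    private TestServer server; // hypothetical harness, for illustration only

    @Before
    public void setUp() {
        server = new TestServer();
        server.start(); // bind the test port
    }

    @After
    public void tearDown() {
        // Runs even if the @Test method throws, so the server never leaks
        // its port into the next test method.
        if (server != null) {
            server.shutdown();
        }
    }

    @Test
    public void testRestore() {
        assertTrue(server.isRunning());
        // ... exercise restore-from-log logic; an exception here no longer
        // skips the cleanup, unlike inlined shutdown code.
    }

    // Minimal stand-in so the sketch compiles on its own.
    static final class TestServer {
        private boolean running;
        void start() { running = true; }
        void shutdown() { running = false; }
        boolean isRunning() { return running; }
    }
}
{code}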

[GitHub] zookeeper issue #409: ZOOKEEPER-2924: Refactor tests of LoadFromLogTest.java

2017-12-08 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/409
  
@phunt Patch has been updated with the trunk review changes. Please take a 
look.


---


[jira] [Assigned] (ZOOKEEPER-1422) Support _HOST substitution in JAAS configuration

2017-12-08 Thread Tamas Penzes (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Penzes reassigned ZOOKEEPER-1422:
---

Assignee: Mark Fenes

> Support _HOST substitution in JAAS configuration 
> -
>
> Key: ZOOKEEPER-1422
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1422
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Thomas Weise
>Assignee: Mark Fenes
>
> At the moment a JAAS configuration file needs to be created with the Kerberos 
> principal specified as user/host. It would be much easier for deployment 
> automation if the host portion could be resolved at startup time, as 
> supported in Hadoop (something like user/_HOST instead of user/hostname). A 
> configuration alternative to global JAAS conf would be even better (via 
> direct properties in zoo.cfg?).
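A minimal sketch of the requested substitution, following the Hadoop _HOST 
convention; the helper below is hypothetical, not an existing ZooKeeper API:

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

final class PrincipalHostExpander {
    /**
     * Replace the Hadoop-style _HOST placeholder in a Kerberos principal
     * ("user/_HOST@REALM") with this machine's canonical hostname,
     * resolved once at startup.
     */
    static String expand(String principal) throws UnknownHostException {
        if (principal == null || !principal.contains("/_HOST")) {
            return principal; // nothing to substitute
        }
        String host = InetAddress.getLocalHost()
                                 .getCanonicalHostName()
                                 .toLowerCase();
        return principal.replace("/_HOST", "/" + host);
    }

    public static void main(String[] args) throws UnknownHostException {
        // zookeeper/_HOST@EXAMPLE.COM -> zookeeper/<local-fqdn>@EXAMPLE.COM
        System.out.println(expand("zookeeper/_HOST@EXAMPLE.COM"));
    }
}
{code}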



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2928) pthread_join hang at zookeeper_close

2017-12-08 Thread xiaomingzhongguo (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283440#comment-16283440
 ] 

xiaomingzhongguo commented on ZOOKEEPER-2928:
-

Is this a kernel bug or a bug in the program code?

> pthread_join  hang at zookeeper_close
> -
>
> Key: ZOOKEEPER-2928
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2928
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.6
>Reporter: xiaomingzhongguo
>Priority: Critical
>
> When zookeeper_close is called, the thread hangs at pthread_join: the 
> do_io thread no longer exists, and the do_completion thread does not exit.
> #0  0x2b8e38b6b725 in pthread_join () from /lib64/libpthread.so.0
> #1  0x00cc6b86 in adaptor_finish (zh=0x2ae05240) at 
> src/mt_adaptor.c:285
> #2  0x00cc21f3 in zookeeper_close (zh=0x2ae05240) at 
> src/zookeeper.c:2493
> #3  0x008eeb04 in ZkAPI::ZkClose ()
> #4  0x009270b1 in AgentInfo::zkCloseConnection ()
> #5  0x00929e02 in AgentInfo::timeSyncHandler ()
> #6  0x010f0585 in event_base_loop (base=0x1679d00, flags=0) at 
> event.c:1350
> #7  0x00924f31 in AgentInfo::run ()
> #8  0x008998bf in gseThread::run_helper ()
> #9  0x00922956 in tos::util_thread_start ()
> #10 0x2b8e38b6a193 in start_thread () from /lib64/libpthread.so.0
> #11 0x2b8e3929ff0d in clone () from /lib64/libc.so.6
> #0  0x2b8e38b6e326 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x00cc70be in do_completion (v=0x2ae05240) at 
> src/mt_adaptor.c:463
> #2  0x2b8e38b6a193 in start_thread () from /lib64/libpthread.so.0
> #3  0x2b8e3929ff0d in clone () from /lib64/libc.so.6
> #4  0x in ?? ()



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2928) pthread_join hang at zookeeper_close

2017-12-08 Thread xiaomingzhongguo (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283437#comment-16283437
 ] 

xiaomingzhongguo commented on ZOOKEEPER-2928:
-

The SUSE 10 SP1 environment can easily trigger this bug.

> pthread_join  hang at zookeeper_close
> -
>
> Key: ZOOKEEPER-2928
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2928
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.6
>Reporter: xiaomingzhongguo
>Priority: Critical
>
> When zookeeper_close is called, the thread hangs at pthread_join: the 
> do_io thread no longer exists, and the do_completion thread does not exit.
> #0  0x2b8e38b6b725 in pthread_join () from /lib64/libpthread.so.0
> #1  0x00cc6b86 in adaptor_finish (zh=0x2ae05240) at 
> src/mt_adaptor.c:285
> #2  0x00cc21f3 in zookeeper_close (zh=0x2ae05240) at 
> src/zookeeper.c:2493
> #3  0x008eeb04 in ZkAPI::ZkClose ()
> #4  0x009270b1 in AgentInfo::zkCloseConnection ()
> #5  0x00929e02 in AgentInfo::timeSyncHandler ()
> #6  0x010f0585 in event_base_loop (base=0x1679d00, flags=0) at 
> event.c:1350
> #7  0x00924f31 in AgentInfo::run ()
> #8  0x008998bf in gseThread::run_helper ()
> #9  0x00922956 in tos::util_thread_start ()
> #10 0x2b8e38b6a193 in start_thread () from /lib64/libpthread.so.0
> #11 0x2b8e3929ff0d in clone () from /lib64/libc.so.6
> #0  0x2b8e38b6e326 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x00cc70be in do_completion (v=0x2ae05240) at 
> src/mt_adaptor.c:463
> #2  0x2b8e38b6a193 in start_thread () from /lib64/libpthread.so.0
> #3  0x2b8e3929ff0d in clone () from /lib64/libc.so.6
> #4  0x in ?? ()



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)