[jira] [Updated] (KAFKA-2572) zk connection instability, perhaps precipitated by zk client timeout during rebalance

2015-09-22 Thread John Firth (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Firth updated KAFKA-2572:
--
Attachment: 091115-full.log.zip

> zk connection instability, perhaps precipitated by zk client timeout during 
> rebalance
> -
>
> Key: KAFKA-2572
> URL: https://issues.apache.org/jira/browse/KAFKA-2572
> Project: Kafka
> Issue Type: Bug
> Components: zkclient
> Affects Versions: 0.8.2.1
> Environment: zk version 3.4.6,
> CentOS 6, 2.6.32-504.1.3.el6.x86_64
> Reporter: John Firth
> Attachments: 090815-digest.log, 090815-full.log, 091115-digest.log, 
> 091115-full.log.zip
>
>
> On two occasions, have seen zk session expiry, followed by a timeout during a 
> consumer rebalance following this expiry, followed by multiple successive zk 
> session expiries. Restarting the process using the zk client resolved the 
> problems. 
> Comparing these with a case in which a new stable zk session was created 
> following a session expiry, the timeout during rebalance is not seen in the 
> successful case.
> This behavior was seen on 09/08 and 09/11 -- the attached 'full' logs show
> all log entries minus entries particular to our application. For 09/08, the
> time span is 2015-09-08T12:52:06.069-04:00 to 2015-09-08T13:14:48.250-04:00;
> for 09/11, the time span is 2015-09-11T01:38:17.000-04:00 to
> 2015-09-11T07:44:47.124-04:00. The digest logs are the result of retaining
> only error and warning entries, and entries containing any of: "begin
> rebalancing", "end rebalancing", "timed", and "zookeeper state". For the
> 09/11 digest logs, entries from the kafka.network.Processor logger are also
> excised for clarity. Unfortunately, debug logging was not enabled during
> these events.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2572) zk connection instability, perhaps precipitated by zk client timeout during rebalance

2015-09-22 Thread John Firth (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Firth updated KAFKA-2572:
--
Description: 
On two occasions, have seen zk session expiry, followed by a timeout during a 
consumer rebalance following this expiry, followed by multiple successive zk 
session expiries. Restarting the process using the zk client resolved the 
problems. 
Comparing these with a case in which a new stable zk session was created 
following a session expiry, the timeout during rebalance is not seen in the 
successful case.

This behavior was seen on 09/08 and 09/11 -- the attached 'full' logs show all
log entries minus entries particular to our application. For 09/08, the time
span is 2015-09-08T12:52:06.069-04:00 to 2015-09-08T13:14:48.250-04:00; for
09/11, the time span is 2015-09-11T01:38:17.000-04:00 to
2015-09-11T07:44:47.124-04:00. The digest logs are the result of retaining only
error and warning entries, and entries containing any of: "begin rebalancing",
"end rebalancing", "timed", and "zookeeper state". For the 09/11 digest logs,
entries from the kafka.network.Processor logger are also excised for clarity.
Unfortunately, debug logging was not enabled during these events.
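
The digest rule described above can be sketched roughly as follows. This is a
minimal illustration, not the script actually used to produce the attachments:
the level tokens (ERROR, WARN), the case-insensitive keyword match, and the
sample lines are all assumptions, since the report does not show the log
layout. The names keep_line and make_digest are hypothetical.

```python
import re

# Substrings quoted in the report; case-insensitive matching is an assumption.
KEYWORDS = ("begin rebalancing", "end rebalancing", "timed", "zookeeper state")

# Assumed log4j-style level tokens; the report does not show the log layout.
LEVEL_RE = re.compile(r"\b(ERROR|WARN(ING)?)\b")


def keep_line(line, excluded_loggers=()):
    """Return True if a log line belongs in the digest."""
    # Excised loggers are dropped first, even for error and warning entries.
    if any(name in line for name in excluded_loggers):
        return False
    if LEVEL_RE.search(line):
        return True
    lowered = line.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def make_digest(lines, excluded_loggers=()):
    """Filter an iterable of log lines down to the digest entries."""
    return [line for line in lines if keep_line(line, excluded_loggers)]


# Hypothetical sample lines, invented for illustration only.
SAMPLE = [
    "INFO kafka.consumer.ZookeeperConsumerConnector - begin rebalancing consumer",
    "INFO org.apache.zookeeper.ClientCnxn - Client session timed out",
    "INFO kafka.utils.VerifiableProperties - Property group.id is overridden",
    "ERROR kafka.network.Processor - Closing socket",
    "WARN kafka.consumer.ConsumerFetcherManager - fetch failed",
]

if __name__ == "__main__":
    # 09/11-style digest: also excise the kafka.network.Processor logger.
    for entry in make_digest(SAMPLE, excluded_loggers=("kafka.network.Processor",)):
        print(entry)
```

Passing an empty excluded_loggers tuple corresponds to the 09/08 digest, and
passing kafka.network.Processor corresponds to the 09/11 digest.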




  was:
On two occasions, have seen zk session expiry, followed by a timeout during a 
consumer rebalance following this expiry, followed by multiple successive zk 
session expiries. Restarting the process using the zk client resolved the 
problems. 
Comparing these with a case in which a new stable zk session was created 
following a session expiry, the timeout during rebalance is not seen in the 
successful case.

This behavior was seen on 09/08 and 09/11 -- the attached 'full' logs show all
log entries minus entries particular to our application. For 09/08, the time
span is 2015-09-08T12:52:06.069-04:00 to 2015-09-08T13:14:48.250-04:00; for
09/11, the time span is 2015-09-11T01:38:17.000-04:00 to
2015-09-11T07:44:47.124-04:00. The digest logs are the result of retaining only
error and warning entries, and entries containing any of: "begin rebalancing",
"end rebalancing", "timed", and "zookeeper state". For the 09/11 digest logs,
entries from the kafka.network.Processor logger are also excised for clarity.










[jira] [Updated] (KAFKA-2572) zk connection instability, perhaps precipitated by zk client timeout during rebalance

2015-09-22 Thread John Firth (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Firth updated KAFKA-2572:
--
Attachment: 091115-full.log.zip
091115-digest.log
090815-full.log
090815-digest.log

> zk connection instability, perhaps precipitated by zk client timeout during 
> rebalance
> -
>
> Key: KAFKA-2572
> URL: https://issues.apache.org/jira/browse/KAFKA-2572
> Project: Kafka
> Issue Type: Bug
> Components: zkclient
> Affects Versions: 0.8.2.1
> Environment: zk version 3.4.6,
> CentOS 6, 2.6.32-504.1.3.el6.x86_64
> Reporter: John Firth
> Attachments: 090815-digest.log, 090815-full.log, 091115-digest.log, 
> 091115-full.log.zip
>
>
> On two occasions, have seen zk session expiry, followed by a timeout during a 
> consumer rebalance following this expiry, followed by multiple successive zk 
> session expiries. Comparing these with a case in which a new stable zk 
> session was created following a session expiry, the timeout during rebalance 
> is not seen in the successful case.





[jira] [Updated] (KAFKA-2572) zk connection instability, perhaps precipitated by zk client timeout during rebalance

2015-09-22 Thread John Firth (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Firth updated KAFKA-2572:
--
Attachment: (was: 091115-full.log.zip)

> zk connection instability, perhaps precipitated by zk client timeout during 
> rebalance
> -
>
> Key: KAFKA-2572
> URL: https://issues.apache.org/jira/browse/KAFKA-2572
> Project: Kafka
> Issue Type: Bug
> Components: zkclient
> Affects Versions: 0.8.2.1
> Environment: zk version 3.4.6,
> CentOS 6, 2.6.32-504.1.3.el6.x86_64
> Reporter: John Firth
> Attachments: 090815-digest.log, 090815-full.log, 091115-digest.log
>
>
> On two occasions, have seen zk session expiry, followed by a timeout during a 
> consumer rebalance following this expiry, followed by multiple successive zk 
> session expiries. Restarting the process using the zk client resolved the 
> problems. 
> Comparing these with a case in which a new stable zk session was created 
> following a session expiry, the timeout during rebalance is not seen in the 
> successful case.
> This behavior was seen on 09/08 and 09/11 -- the attached 'full' logs show
> all log entries minus entries particular to our application. For 09/08, the
> time span is 2015-09-08T12:52:06.069-04:00 to 2015-09-08T13:14:48.250-04:00;
> for 09/11, the time span is 2015-09-11T01:38:17.000-04:00 to
> 2015-09-11T07:44:47.124-04:00. The digest logs are the result of retaining
> only error and warning entries, and entries containing any of: "begin
> rebalancing", "end rebalancing", "timed", and "zookeeper state". For the
> 09/11 digest logs, entries from the kafka.network.Processor logger are also
> excised for clarity.





[jira] [Updated] (KAFKA-2572) zk connection instability, perhaps precipitated by zk client timeout during rebalance

2015-09-22 Thread John Firth (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Firth updated KAFKA-2572:
--
Description: 
On two occasions, have seen zk session expiry, followed by a timeout during a 
consumer rebalance following this expiry, followed by multiple successive zk 
session expiries. Restarting the process using the zk client resolved the 
problems. 
Comparing these with a case in which a new stable zk session was created 
following a session expiry, the timeout during rebalance is not seen in the 
successful case.

This behavior was seen on 09/08 and 09/11 -- the attached 'full' logs show all
log entries minus entries particular to our application. For 09/08, the time
span is 2015-09-08T12:52:06.069-04:00 to 2015-09-08T13:14:48.250-04:00; for
09/11, the time span is 2015-09-11T01:38:17.000-04:00 to
2015-09-11T07:44:47.124-04:00. The digest logs are the result of retaining only
error and warning entries, and entries containing any of: "begin rebalancing",
"end rebalancing", "timed", and "zookeeper state". For the 09/11 digest logs,
entries from the kafka.network.Processor logger are also excised for clarity.




  was:
On two occasions, have seen zk session expiry, followed by a timeout during a 
consumer rebalance following this expiry, followed by multiple successive zk 
session expiries. Comparing these with a case in which a new stable zk session 
was created following a session expiry, the timeout during rebalance is not 
seen in the successful case.








[jira] [Updated] (KAFKA-2572) zk connection instability, perhaps precipitated by zk client timeout during rebalance

2015-09-22 Thread John Firth (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Firth updated KAFKA-2572:
--
Description: 
On two occasions, have seen zk session expiry, followed by a timeout during a 
consumer rebalance following this expiry, followed by multiple successive zk 
session expiries. Restarting the process using the zk client resolved the 
problems. 
Comparing these with a case in which a new stable zk session was created 
following a session expiry, the timeout during rebalance is not seen in the 
successful case.

This behavior was seen on 09/08 and 09/11 -- the attached 'full' logs show all
log entries minus entries particular to our application. For 09/08, the time
span is 2015-09-08T12:52:06.069-04:00 to 2015-09-08T13:14:48.250-04:00; for
09/11, the time span is 2015-09-11T01:38:17.000-04:00 to
2015-09-11T07:44:47.124-04:00. The digest logs are the result of retaining only
error and warning entries, and entries containing any of: "begin rebalancing",
"end rebalancing", "timed", and "zookeeper state". For the 09/11 digest logs,
entries from the kafka.network.Processor logger are also excised for clarity.
Unfortunately, debug logging was not enabled during these events.

The 09/08 case is a little more straightforward than the 09/11 case. In the
09/08 case, a session times out at 2015-09-08T12:52:06.069-04:00; two timeouts
for the same session are then seen during the rebalance that follows the
establishment of that session, at 2015-09-08T12:52:19.107-04:00 and
2015-09-08T12:52:31.639-04:00. The rebalance begins at
2015-09-08T12:52:06.667-04:00. The connection to ZK then expires and is
re-established multiple times before the process is killed after
2015-09-08T13:13:41.655-04:00, which marks the last entry in the digest.

The 09/11 case shows repeated cycles of session expiry, followed by rebalancing 
activity, followed by a pause during which nothing is heard from the zk server, 
followed by a session timeout.
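
To make the 09/08 timeline easier to read, the intervals between the
timestamps quoted above can be computed as in the sketch below. The timestamps
are copied from the description; the event labels and the seconds_between
helper are shorthand introduced here for illustration and do not appear in the
logs.

```python
from datetime import datetime

# Timestamps copied from the 09/08 description; the labels are shorthand.
EVENTS = {
    "session_timeout": "2015-09-08T12:52:06.069-04:00",
    "rebalance_begin": "2015-09-08T12:52:06.667-04:00",
    "first_timeout_in_rebalance": "2015-09-08T12:52:19.107-04:00",
    "second_timeout_in_rebalance": "2015-09-08T12:52:31.639-04:00",
    "last_digest_entry": "2015-09-08T13:13:41.655-04:00",
}


def seconds_between(start, end):
    """Elapsed seconds from the event labeled start to the event labeled end."""
    begin = datetime.fromisoformat(EVENTS[start])
    finish = datetime.fromisoformat(EVENTS[end])
    return (finish - begin).total_seconds()


if __name__ == "__main__":
    pairs = [
        ("session_timeout", "rebalance_begin"),
        ("rebalance_begin", "first_timeout_in_rebalance"),
        ("first_timeout_in_rebalance", "second_timeout_in_rebalance"),
        ("session_timeout", "last_digest_entry"),
    ]
    for start, end in pairs:
        print(f"{start} -> {end}: {seconds_between(start, end):.3f} s")
```

On these figures, the rebalance starts about 0.6 s after the initial session
timeout, the two timeouts during the rebalance fall roughly 12.4 s and 12.5 s
apart, and the whole episode spans a little under 22 minutes.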

  was:
On two occasions, have seen zk session expiry, followed by a timeout during a 
consumer rebalance following this expiry, followed by multiple successive zk 
session expiries. Restarting the process using the zk client resolved the 
problems. 
Comparing these with a case in which a new stable zk session was created 
following a session expiry, the timeout during rebalance is not seen in the 
successful case.

This behavior was seen on 09/08 and 09/11 -- the attached 'full' logs show all
log entries minus entries particular to our application. For 09/08, the time
span is 2015-09-08T12:52:06.069-04:00 to 2015-09-08T13:14:48.250-04:00; for
09/11, the time span is 2015-09-11T01:38:17.000-04:00 to
2015-09-11T07:44:47.124-04:00. The digest logs are the result of retaining only
error and warning entries, and entries containing any of: "begin rebalancing",
"end rebalancing", "timed", and "zookeeper state". For the 09/11 digest logs,
entries from the kafka.network.Processor logger are also excised for clarity.
Unfortunately, debug logging was not enabled during these events.





> zk connection instability, perhaps precipitated by zk client timeout during 
> rebalance
> -
>
> Key: KAFKA-2572
> URL: https://issues.apache.org/jira/browse/KAFKA-2572
> Project: Kafka
> Issue Type: Bug
> Components: zkclient
> Affects Versions: 0.8.2.1
> Environment: zk version 3.4.6,
> CentOS 6, 2.6.32-504.1.3.el6.x86_64
> Reporter: John Firth
> Attachments: 090815-digest.log, 090815-full.log, 091115-digest.log, 
> 091115-full.log.zip
>
>
> On two occasions, have seen zk session expiry, followed by a timeout during a 
> consumer rebalance following this expiry, followed by multiple successive zk 
> session expiries. Restarting the process using the zk client resolved the 
> problems. 
> Comparing these with a case in which a new stable zk session was created 
> following a session expiry, the timeout during rebalance is not seen in the 
> successful case.
> This behavior was seen on 09/08 and 09/11 -- the attached 'full' logs show
> all log entries minus entries particular to our application. For 09/08, the
> time span is 2015-09-08T12:52:06.069-04:00 to 2015-09-08T13:14:48.250-04:00;
> for 09/11, the time span is 2015-09-11T01:38:17.000-04:00 to
> 2015-09-11T07:44:47.124-04:00. The digest logs are the result of retaining
> only error and warning entries, and entries containing any of: "begin
> rebalancing", "end rebalancing", "timed", and "zookeeper state". For the
> 09/11 digest logs, entries from the kafka.network.Processor logger are also
> excised for clarity. Unfortunately, debug logging was not enabled during
> these events.
> The 09/08 case is a little more straightforward than the