[jira] [Commented] (KAFKA-2561) Optionally support OpenSSL for SSL/TLS
[ https://issues.apache.org/jira/browse/KAFKA-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325333#comment-16325333 ]

Prasanna Gautam commented on KAFKA-2561:
----------------------------------------

[~ijuma] Is this still being planned anytime soon? I'd like to check whether Java 9 or using Open/Boring/LibreSSL has any meaningful performance improvements for SSL.

> Optionally support OpenSSL for SSL/TLS
> --------------------------------------
>
>          Key: KAFKA-2561
>          URL: https://issues.apache.org/jira/browse/KAFKA-2561
>      Project: Kafka
>   Issue Type: New Feature
>   Components: security
> Affects Versions: 0.9.0.0
>     Reporter: Ismael Juma
>
> JDK's `SSLEngine` is unfortunately a bit slow (KAFKA-2431 covers this in more detail). We should consider supporting OpenSSL for SSL/TLS. Initial experiments on my laptop show that it performs a lot better:
> {code}
> start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, config
> 2015-09-21 14:41:58:245, 2015-09-21 14:47:02:583, 28610.2295, 94.0081, 3000, 98574.6111, Java 8u60/server auth JDK SSLEngine/TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
> 2015-09-21 14:38:24:526, 2015-09-21 14:40:19:941, 28610.2295, 247.8900, 3000, 259931.5514, Java 8u60/server auth OpenSslEngine/TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> 2015-09-21 14:49:03:062, 2015-09-21 14:50:27:764, 28610.2295, 337.7751, 3000, 354182.9000, Java 8u60/plaintext
> {code}
> Extracting the throughput figures:
> * JDK SSLEngine: 94 MB/s
> * OpenSSL SSLEngine: 247 MB/s
> * Plaintext: 337 MB/s (code from trunk, so no zero-copy due to KAFKA-2517)
> In order to get these figures, I used Netty's `OpenSslEngine` by hacking `SSLFactory` to use Netty's `SslContextBuilder` and made a few changes to `SSLTransportLayer` in order to work around differences in behaviour between `OpenSslEngine` and JDK's SSLEngine (filed https://github.com/netty/netty/issues/4235 and https://github.com/netty/netty/issues/4238 upstream).

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6082) consider fencing zookeeper updates with controller epoch zkVersion
[ https://issues.apache.org/jira/browse/KAFKA-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294391#comment-16294391 ]

Prasanna Gautam commented on KAFKA-6082:
----------------------------------------

[~onurkaraman] Does this require fencing all ZK updates from the controller and brokers, or only some subset of changes?

> consider fencing zookeeper updates with controller epoch zkVersion
> ------------------------------------------------------------------
>
>          Key: KAFKA-6082
>          URL: https://issues.apache.org/jira/browse/KAFKA-6082
>      Project: Kafka
>   Issue Type: Sub-task
>     Reporter: Onur Karaman
>
> If we want, we can use multi-op to fence zookeeper updates with the controller epoch's zkVersion.
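The fencing idea above amounts to a conditional batch: every controller write is bundled with a version check on the controller-epoch znode, so writes from a zombie controller (holding a stale zkVersion) fail atomically. With a real ZooKeeper client this would be `ZooKeeper.multi` with an `Op.check` on the epoch path plus the data ops; the sketch below only illustrates the semantics, using a hypothetical in-memory store (class and method names are illustrative, not Kafka's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for ZooKeeper multi-op fencing: a write succeeds
// only if the caller's view of the controller-epoch version is current.
public class EpochFencedStore {
    private final Map<String, String> data = new HashMap<>();
    private int epochZkVersion = 0;

    // A new controller election bumps the epoch node's version.
    public synchronized void bumpEpoch() { epochZkVersion++; }

    public synchronized int epochVersion() { return epochZkVersion; }

    // Analogue of multi(Op.check(epochPath, expected), Op.setData(path, value)):
    // if the version check fails, the whole batch aborts and nothing is written.
    public synchronized boolean fencedWrite(int expectedEpochVersion, String path, String value) {
        if (expectedEpochVersion != epochZkVersion) {
            return false; // stale controller: check op fails, batch rejected
        }
        data.put(path, value);
        return true;
    }

    public synchronized String read(String path) { return data.get(path); }
}
```

A controller would capture `epochVersion()` at election time; once a newer controller bumps the epoch, every fenced write from the old one is rejected atomically rather than partially applied.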
[jira] [Commented] (KAFKA-6065) Add zookeeper metrics to ZookeeperClient as in KIP-188
[ https://issues.apache.org/jira/browse/KAFKA-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265758#comment-16265758 ]

Prasanna Gautam commented on KAFKA-6065:
----------------------------------------

Not at all. Thanks.

> Add zookeeper metrics to ZookeeperClient as in KIP-188
> ------------------------------------------------------
>
>          Key: KAFKA-6065
>          URL: https://issues.apache.org/jira/browse/KAFKA-6065
>      Project: Kafka
>   Issue Type: Sub-task
>     Reporter: Onur Karaman
>     Assignee: Prasanna Gautam
>      Fix For: 1.1.0
>
> Among other things, KIP-188 added latency metrics to ZkUtils. We should add the same metrics to ZookeeperClient.
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252192#comment-16252192 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

[~junrao] [~ijuma] I have updated the PR without the config. I don't quite follow what you mean by adding it in kafkaController.newSession(), because I can't find that method in KafkaController anymore. I'm currently running the startup function when the state callback stops returning SessionEstablishmentError and it's not already in a startingUp state. Did you mean to run it a different way? Also, I have been getting ducktape errors on TravisCI for the tests; I assume this change requires some ducktape tests too?

> handle ZK session expiration properly when a new session can't be established
> -----------------------------------------------------------------------------
>
>          Key: KAFKA-5473
>          URL: https://issues.apache.org/jira/browse/KAFKA-5473
>      Project: Kafka
>   Issue Type: Sub-task
> Affects Versions: 0.9.0.0
>     Reporter: Jun Rao
>     Assignee: Prasanna Gautam
>      Fix For: 1.1.0
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we change the logic in handling ZK session expiration a bit. If a new ZK session can't be established after session expiration, we just log an error and continue. However, this can leave the broker in a bad state since it's up, but not registered from the controller's perspective. Replicas on this broker may never be in sync.
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247927#comment-16247927 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

[~junrao] so looks like I can get it done this weekend. I think the KafkaHealth just needs to log errors and the reconnects can all be handled in the New ZKClient.
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240979#comment-16240979 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

Ok, yeah. I ran into the same (or a related) issue this weekend on a 12-node cluster, so I can continue on this. Are you still planning to expose a metric showing that the zookeeper node is in a reconnecting state?
[jira] [Assigned] (KAFKA-6065) Add zookeeper metrics to ZookeeperClient as in KIP-188
[ https://issues.apache.org/jira/browse/KAFKA-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanna Gautam reassigned KAFKA-6065:
--------------------------------------
Assignee: Prasanna Gautam
[jira] [Comment Edited] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223232#comment-16223232 ]

Prasanna Gautam edited comment on KAFKA-5473 at 10/28/17 4:43 AM:
------------------------------------------------------------------

Thanks [~junrao], looks like a good start. -Is there plan for a way to get notified via a metric or state change when kafka gets in this state? I think it would be useful to know how often the cluster is getting in that state and trigger alerts. - I missed the ZKSessionState being set to RECONNECTING on first read.

was (Author: prasincs):
Thanks [~junrao], looks like a good start.- Is there plan for a way to get notified via a metric or state change when kafka gets in this state? I think it would be useful to know how often the cluster is getting in that state and trigger alerts. - I missed the ZKSessionState being set to RECONNECTING on first read.
[jira] [Comment Edited] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223232#comment-16223232 ]

Prasanna Gautam edited comment on KAFKA-5473 at 10/28/17 4:43 AM:
------------------------------------------------------------------

Thanks [~junrao], looks like a good start.

was (Author: prasincs):
Thanks [~junrao], looks like a good start. -Is there plan for a way to get notified via a metric or state change when kafka gets in this state? I think it would be useful to know how often the cluster is getting in that state and trigger alerts. - I missed the ZKSessionState being set to RECONNECTING on first read.
[jira] [Comment Edited] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223232#comment-16223232 ]

Prasanna Gautam edited comment on KAFKA-5473 at 10/28/17 4:42 AM:
------------------------------------------------------------------

Thanks [~junrao], looks like a good start.- Is there plan for a way to get notified via a metric or state change when kafka gets in this state? I think it would be useful to know how often the cluster is getting in that state and trigger alerts. - I missed the ZKSessionState being set to RECONNECTING on first read.

was (Author: prasincs):
Thanks [~junrao], looks like a good start. Is there plan for a way to get notified via a metric or state change when kafka gets in this state? I think it would be useful to know how often the cluster is getting in that state and trigger alerts.
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223232#comment-16223232 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

Thanks [~junrao], looks like a good start. Is there plan for a way to get notified via a metric or state change when kafka gets in this state? I think it would be useful to know how often the cluster is getting in that state and trigger alerts.
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185434#comment-16185434 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

[~ijuma] I added a new configuration that's consistent with what [~junrao] was mentioning previously: zookeeper.connection.retry.timeout.ms sets an upper bound on how long to wait before killing the connection and triggering the shutdown. This is looking like a bigger structural change than I'd originally anticipated, so I want to make sure I'm on the right track. Since ZkUtils is initialized, and needs to be closed/reconnected, in the KafkaServer object, does it make sense to pass the connection state to KafkaServer so that the timeout can be guaranteed and the services shut down cleanly? This is different from other places in the codebase where ZK is used to share state, but since this involves ZK itself being unavailable, we need a different mechanism: inform KafkaServer that it needs to start reconnecting, use the ZkUtils instance thereafter, and, if the reconnect retry timeout is reached, start the shutdown process. The IZkStateListener is used in multiple places in the code, so I think it's easier to add another class, e.g. ZKSessionTimeoutRecovery, that only handles reconnects and exits cleanly if the reconnect fails.
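The proposed zookeeper.connection.retry.timeout.ms behavior described above amounts to a reconnect loop with a hard deadline: keep retrying the session, and begin a clean shutdown once the bound is exceeded. A minimal sketch of that loop, assuming hypothetical names; the reconnect callback stands in for whatever the real ZK client exposes and is not Kafka's actual API:

```java
import java.util.function.BooleanSupplier;

public class RetryWithDeadline {
    /**
     * Retries `reconnect` until it succeeds or `timeoutMs` elapses.
     * Returns true if a session was re-established; false means the
     * deadline passed and the caller should begin a clean shutdown.
     */
    public static boolean retryUntilDeadline(BooleanSupplier reconnect,
                                             long timeoutMs,
                                             long retryIntervalMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (reconnect.getAsBoolean()) {
                return true;  // session re-established, resume normal operation
            }
            Thread.sleep(retryIntervalMs);  // wait before the next attempt
        }
        return false;  // retry timeout reached: trigger controlled shutdown
    }
}
```

The point of returning a value instead of calling System.exit() directly is that the server object can run its normal shutdown path, flushing writes to disk before the process dies.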
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185144#comment-16185144 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

Yeah, I'm OK if someone can pick it up too. I'm aiming for sometime around tomorrow or over the weekend for a PR. Will be happy to review, test and help in any way I can.
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184653#comment-16184653 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

Think I can make it to the code freeze. I'm at a conference this week and it's a bit of a hassle to get the environment set up well on the machine I have here. Is there an easy way to bootstrap the environment for testing? I'd like to reuse anything that's already been done for that.
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184314#comment-16184314 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

[~ijuma] Yes I intend to send a PR for this. I need to resume this and test.
[jira] [Updated] (KAFKA-5628) Kafka Startup fails on corrupted index files
[ https://issues.apache.org/jira/browse/KAFKA-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanna Gautam updated KAFKA-5628:
-----------------------------------
Priority: Minor (was: Major)

Description:
One of our kafka brokers shut down after a load test, and while there are some corrupted index files, the broker is failing to start with an unsafe memory access error:

{code:java}
[2017-07-23 15:52:32,019] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:53)
	at org.apache.kafka.common.utils.Utils.readFully(Utils.java:854)
	at org.apache.kafka.common.utils.Utils.readFullyOrFail(Utils.java:827)
	at org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.loadRecord(FileLogInputStream.java:136)
	at org.apache.kafka.common.record.FileLogInputStream$FileChannelLogEntry.record(FileLogInputStream.java:149)
	at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:225)
	at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:224)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at kafka.log.LogSegment.recover(LogSegment.scala:224)
	at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:231)
	at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
	at kafka.log.Log.loadSegments(Log.scala:188)
	at kafka.log.Log.(Log.scala:116)
	at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:157)
	at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

This doesn't seem to be the same as https://issues.apache.org/jira/browse/KAFKA-1554 because these topics are actively in use and the other empty indices are recovered fine. It seems the machine had died because the disk was full, and the problem resolved after the disk issue was fixed. Should kafka just check the disk at startup and refuse to continue starting up?
[jira] [Assigned] (KAFKA-5628) Kafka Startup fails on corrupted index files
[ https://issues.apache.org/jira/browse/KAFKA-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanna Gautam reassigned KAFKA-5628:
--------------------------------------
Assignee: Jun Rao
Affects Version/s: 0.10.2.0
Environment: Ubuntu 14.04, Java 8 (1.8.0_65)
[jira] [Created] (KAFKA-5628) Kafka Startup fails on corrupted index files
Prasanna Gautam created KAFKA-5628:
-----------------------------------
Summary: Kafka Startup fails on corrupted index files
Key: KAFKA-5628
URL: https://issues.apache.org/jira/browse/KAFKA-5628
Project: Kafka
Issue Type: Bug
Reporter: Prasanna Gautam
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061610#comment-16061610 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

[~junrao] Why not do an exponential backoff (with jitter) with an upper bound? If you're temporarily disconnected, it should recover within a few seconds; otherwise, an upper bound before the broker dies feels like the more sensible solution. That way, ZK nodes being network-partitioned from kafka wouldn't immediately bring down all brokers if it's a recoverable issue. Also, if there's a cleaner way to exit that allows all writes to be synced to disk, that seems preferable to System.exit() too.
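The backoff scheme suggested in that comment can be sketched as exponential delays with a cap and full jitter, so brokers that lose the ZK session at the same time don't all reconnect in lockstep. A minimal sketch under assumed parameter names; this is illustrative, not Kafka's actual reconnect code:

```java
import java.util.concurrent.ThreadLocalRandom;

public class ReconnectBackoff {
    /**
     * Delay before the given retry attempt: exponential growth from baseMs,
     * capped at maxMs, with full jitter (a uniform draw from [0, capped delay])
     * to spread reconnect attempts across brokers.
     */
    public static long delayMs(int attempt, long baseMs, long maxMs) {
        // Cap the shift so the exponential term cannot overflow.
        long exp = Math.min(maxMs, baseMs * (1L << Math.min(attempt, 30)));
        return ThreadLocalRandom.current().nextLong(exp + 1);
    }
}
```

A caller would sleep for delayMs(attempt, ...) between attempts and, per the comment above, give up and shut down cleanly once a total upper bound on retry time is exceeded.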
[jira] [Assigned] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanna Gautam reassigned KAFKA-5473:
--------------------------------------
Assignee: Prasanna Gautam
[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established
[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060243#comment-16060243 ]

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

I don't think failing the broker immediately is the right solution, even though that's what we're effectively doing. I think this should be handled automatically, and only if the state cannot be recovered should the broker fail. [~junrao] If there are no plans to assign this to someone in the near future, I can take a stab at it.