[jira] [Created] (KAFKA-6417) plugin.path pointing at a plugin directory causes ClassNotFoundException
Dustin Cote created KAFKA-6417:
--
Summary: plugin.path pointing at a plugin directory causes ClassNotFoundException
Key: KAFKA-6417
URL: https://issues.apache.org/jira/browse/KAFKA-6417
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Affects Versions: 1.0.0
Reporter: Dustin Cote

When using the {{plugin.path}} configuration for Connect workers, the user is expected to specify a list containing the following, per the docs:
{quote}
The list should consist of top level directories that include any combination of:
a) directories immediately containing jars with plugins and their dependencies
b) uber-jars with plugins and their dependencies
c) directories immediately containing the package directory structure of classes of plugins and their dependencies
{quote}
This means we would expect {{plugin.path=/usr/share/plugins}} for a structure like {{/usr/share/plugins/myplugin1}}, {{/usr/share/plugins/myplugin2}}, etc. However, if you specify {{plugin.path=/usr/share/plugins/myplugin1}}, the resulting behavior is that dependencies for {{myplugin1}} are not properly loaded. This causes a {{ClassNotFoundException}} that is not intuitive to debug.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
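A quick way to see the distinction is to mimic the scan: each immediate child of a {{plugin.path}} entry is treated as one isolated plugin location. The sketch below is illustrative only (directory names are hypothetical, and this is not Connect's actual scanning code):

```python
import os
import tempfile

def plugin_locations(plugin_path_entry):
    """Roughly mimic how Connect scans one plugin.path entry: every
    immediate child of the entry becomes its own plugin location
    (with its own isolated classloader)."""
    return sorted(
        os.path.join(plugin_path_entry, name)
        for name in os.listdir(plugin_path_entry)
    )

# Build the layout from the description above.
root = tempfile.mkdtemp()
plugins = os.path.join(root, "plugins")
os.makedirs(os.path.join(plugins, "myplugin1"))
os.makedirs(os.path.join(plugins, "myplugin2"))

# Correct: plugin.path=<...>/plugins -> each plugin dir is one location.
print(plugin_locations(plugins))

# Incorrect: plugin.path=<...>/plugins/myplugin1 -> each jar *inside*
# myplugin1 would be treated as its own location, splitting the plugin
# from its dependencies and producing the ClassNotFoundException.
```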
[jira] [Resolved] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dustin Cote resolved KAFKA-2394.
Resolution: Won't Do

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Dustin Cote
> Priority: Minor
> Labels: newbie
> Fix For: 1.1.0
>
> Attachments: log4j.properties.patch
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and DailyRollingFileAppender, which offer no protection to users from spammy logging. In extreme cases (such as when issues like KAFKA-1461 are encountered), the logs can exhaust the local disk space. This could be a problem for Kafka adoption, since new users are less likely to adjust the logging properties themselves and are more likely to have configuration problems which result in log spam.
> To fix this, we can use RollingFileAppender, which offers two settings for controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern. The backup strategy and filenames used by RollingFileAppender are different from those used by DailyRollingFileAppender, so any tools which depend on the old format will break. If we think this is a serious problem, one solution would be to provide two versions of log4j.properties and add a flag to enable the new one. Another solution would be to include the RollingFileAppender configuration in the default log4j.properties, but commented out.
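For reference, the commented-out alternative mentioned above might look something like the following in log4j 1.x properties syntax. The appender name, file path, and limits here are illustrative, not values proposed in the issue:

```properties
# RollingFileAppender alternative (commented out by default):
# log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
# log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
# log4j.appender.kafkaAppender.MaxFileSize=100MB
# log4j.appender.kafkaAppender.MaxBackupIndex=10
# log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
# log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```

With these two settings, disk usage for this appender is bounded at roughly MaxFileSize * (MaxBackupIndex + 1).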
[jira] [Created] (KAFKA-6107) SCRAM user add fails if Kafka has never been started
Dustin Cote created KAFKA-6107:
--
Summary: SCRAM user add fails if Kafka has never been started
Key: KAFKA-6107
URL: https://issues.apache.org/jira/browse/KAFKA-6107
Project: Kafka
Issue Type: Bug
Components: tools, zkclient
Affects Versions: 0.11.0.0
Reporter: Dustin Cote
Priority: Minor

When trying to add a SCRAM user in ZooKeeper without ever having started Kafka, the kafka-configs tool does not handle it well. This is a common use case, because starting a new cluster that uses SCRAM for inter-broker communication would generally run into this problem. Today, the workaround is to start Kafka, add the user, then restart Kafka. Here's how to reproduce:
1) Start ZooKeeper
2) Run
{code}
bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'SCRAM-SHA-256=[iterations=8192,password=broker_pwd],SCRAM-SHA-512=[password=broker_pwd]' --entity-type users --entity-name broker
{code}
This will result in:
{code}
bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'SCRAM-SHA-256=[iterations=8192,password=broker_pwd],SCRAM-SHA-512=[password=broker_pwd]' --entity-type users --entity-name broker
Error while executing config command org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes/config_change_
org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes/config_change_
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
at org.I0Itec.zkclient.ZkClient.create(ZkClient.java:528)
at org.I0Itec.zkclient.ZkClient.createPersistentSequential(ZkClient.java:444)
at kafka.utils.ZkPath.createPersistentSequential(ZkUtils.scala:1045)
at kafka.utils.ZkUtils.createSequentialPersistentPath(ZkUtils.scala:527)
at kafka.admin.AdminUtils$.kafka$admin$AdminUtils$$changeEntityConfig(AdminUtils.scala:600)
at kafka.admin.AdminUtils$.changeUserOrUserClientIdConfig(AdminUtils.scala:551)
at kafka.admin.AdminUtilities$class.changeConfigs(AdminUtils.scala:63)
at kafka.admin.AdminUtils$.changeConfigs(AdminUtils.scala:72)
at kafka.admin.ConfigCommand$.alterConfig(ConfigCommand.scala:101)
at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:68)
at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes/config_change_
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.I0Itec.zkclient.ZkConnection.create(ZkConnection.java:100)
at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:531)
at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:528)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:991)
... 11 more
{code}
The command doesn't appear to fail, but it does throw an exception and return an error.
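The underlying failure is that the {{/config/changes}} parent path does not exist until a broker has started and created it. One possible fix in the tool is to create missing parent znodes before writing the sequential change node. The sketch below illustrates that retry pattern against a toy in-memory store; it is not a real ZooKeeper client, and all names here are hypothetical:

```python
class FakeZk:
    """Minimal in-memory stand-in for a ZooKeeper client (illustration only)."""
    def __init__(self):
        self.nodes = {"/"}

    def create(self, path):
        # Like ZooKeeper, creating a node fails if its parent is missing.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("NoNode for " + path)  # mimics NoNodeException
        self.nodes.add(path)

    def create_recursive(self, path):
        # Create every missing ancestor, top-down.
        cur = ""
        for part in path.strip("/").split("/"):
            cur += "/" + part
            self.nodes.add(cur)

def ensure_and_create(zk, path):
    """The fix pattern: on NoNode, create the missing parents, then retry."""
    try:
        zk.create(path)
    except KeyError:
        zk.create_recursive(path.rsplit("/", 1)[0])
        zk.create(path)
```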
[jira] [Created] (KAFKA-5994) Improve transparency of broker user ACL misconfigurations
Dustin Cote created KAFKA-5994:
--
Summary: Improve transparency of broker user ACL misconfigurations
Key: KAFKA-5994
URL: https://issues.apache.org/jira/browse/KAFKA-5994
Project: Kafka
Issue Type: Improvement
Components: security
Affects Versions: 0.10.2.1
Reporter: Dustin Cote

When the user for inter-broker communication is not a super user and ACLs are configured with allow.everyone.if.no.acl.found=false, the cluster will not serve data. This is extremely confusing to debug, because there is no security negotiation problem or indication of an error other than that no data can make it in or out of the broker. If one knew to look in the authorizer log it would be clearer, but that didn't make it into my workflow, at least. Here's an example of a problematic debugging scenario:
- SASL_SSL, SSL, and SASL_PLAINTEXT ports on the brokers
- SASL user specified in `super.users`
- SSL specified as the inter-broker protocol
The only way I could figure out that ACLs were an issue, without gleaning it through configuration inspection, was that controlled shutdown indicated that a cluster action had failed. It would be good if we could be more transparent about the failure here.
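The scenario above can be sketched as a broker config fragment. The listener ports, authorizer class, and principal name below are illustrative examples for a 0.10.x broker, not the actual configuration from the affected cluster:

```properties
listeners=SASL_SSL://:9093,SSL://:9094,SASL_PLAINTEXT://:9095
security.inter.broker.protocol=SSL
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
allow.everyone.if.no.acl.found=false
# super.users covers only the SASL principal; the SSL identity used for
# inter-broker traffic is NOT a super user, so broker-to-broker requests
# are silently denied by the authorizer.
super.users=User:admin
```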
[jira] [Created] (KAFKA-5885) NPE in ZKClient
Dustin Cote created KAFKA-5885:
--
Summary: NPE in ZKClient
Key: KAFKA-5885
URL: https://issues.apache.org/jira/browse/KAFKA-5885
Project: Kafka
Issue Type: Bug
Components: zkclient
Affects Versions: 0.10.2.1
Reporter: Dustin Cote

A null znode for a topic (how this happens isn't totally clear, but that's not the focus of this issue) can currently cause controller leader election to fail. In the broker logging, you can see a NullPointerException emanating from the ZKClient:
{code}
[2017-09-11 00:00:21,441] ERROR Error while electing or becoming leader on broker 1010674 (kafka.server.ZookeeperLeaderElector)
kafka.common.KafkaException: Can't parse json string: null
at kafka.utils.Json$.liftedTree1$1(Json.scala:40)
at kafka.utils.Json$.parseFull(Json.scala:36)
at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:704)
at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:700)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.utils.ZkUtils.getReplicaAssignmentForTopics(ZkUtils.scala:700)
at kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:742)
at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:333)
at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:160)
at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:85)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:154)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:154)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:154)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:153)
at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:825)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:72)
Caused by: java.lang.NullPointerException
{code}
Regardless of how a null topic znode ended up in ZooKeeper, we can probably handle this better, at least by printing the path to the problematic znode in the log. The way this particular problem was resolved was by running the ``kafka-topics`` command, seeing it persistently fail trying to read a particular topic with this same message, and then deleting the null znode, which allowed the leader election to complete.
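At minimum, the parse step could report which znode held the bad data instead of surfacing a bare NullPointerException. A hypothetical sketch of that defensive handling (illustrative Python, not the actual Scala code in ZkUtils; the function name is made up):

```python
import json
import logging

def parse_replica_assignment(znode_path, data):
    """Return the parsed assignment, or None with a log line naming the
    offending znode when its data is null or unparseable."""
    if data is None:
        logging.warning("Skipping znode %s: data is null", znode_path)
        return None
    try:
        return json.loads(data)
    except ValueError:
        logging.warning("Skipping znode %s: cannot parse %r", znode_path, data)
        return None
```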
[jira] [Created] (KAFKA-5718) Better document what LogAppendTime means
Dustin Cote created KAFKA-5718:
--
Summary: Better document what LogAppendTime means
Key: KAFKA-5718
URL: https://issues.apache.org/jira/browse/KAFKA-5718
Project: Kafka
Issue Type: Improvement
Components: documentation
Affects Versions: 0.11.0.0
Reporter: Dustin Cote
Priority: Trivial

There isn't a good description of LogAppendTime in the documentation. It would be nice to add one, saying something like: LogAppendTime is some time between when the partition leader receives the request and when it writes it to its local log. There are two important distinctions that trip people up:
1) This timestamp is not when the consumer could have first consumed the message; that additionally requires min.insync.replicas to have been satisfied.
2) This is not precisely when the leader wrote to its log; there can be delays along the path between receiving the request and writing to the log.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
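For context, the timestamp behavior being described is selected by configuration, roughly along these lines (shown as config fragments; the topic-level override is the per-topic analogue of the broker-wide default):

```properties
# Broker-wide default for topics that don't override it
log.message.timestamp.type=LogAppendTime

# Per-topic override (topic-level config)
message.timestamp.type=LogAppendTime
```

The alternative value is CreateTime, where the timestamp is set by the producer instead of the leader.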
[jira] [Created] (KAFKA-5688) Add a modifier to the REST endpoint to only show errors
Dustin Cote created KAFKA-5688:
--
Summary: Add a modifier to the REST endpoint to only show errors
Key: KAFKA-5688
URL: https://issues.apache.org/jira/browse/KAFKA-5688
Project: Kafka
Issue Type: Improvement
Components: KafkaConnect
Affects Versions: 0.11.0.0
Reporter: Dustin Cote
Priority: Minor

Today, the output of the REST endpoint that workers use to validate configuration is pretty hard to read. It would be nice to have a modifier for the /validate endpoint that only shows configuration errors.
[jira] [Created] (KAFKA-5675) Possible worker_id duplication in Connect
Dustin Cote created KAFKA-5675:
--
Summary: Possible worker_id duplication in Connect
Key: KAFKA-5675
URL: https://issues.apache.org/jira/browse/KAFKA-5675
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Affects Versions: 0.10.2.1
Reporter: Dustin Cote
Priority: Minor

It's possible to set non-unique host/port combinations for workers via *rest.advertised.host.name* and *rest.advertised.host.port* (e.g. localhost:8083). While this isn't typically advisable, it can result in weird behavior for containerized deployments, where localhost might end up being mapped to something that is externally facing. The worker_id today appears to be set to this host/port combination, so you end up with duplicate worker_ids, causing long rebalances, presumably because task assignment gets confused. It would be good to either change how the worker_id is generated or find a way to not let a worker start if a worker with an identical worker_id already exists. In the short term, we should document the requirement of unique advertised host/port combinations for workers, to avoid debugging a somewhat tricky scenario.
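For the short-term documentation fix, the requirement is that each worker's advertised endpoint be unique across the cluster. A sketch of per-worker settings (hostnames here are examples; note that the port property in Connect worker configs is typically written as rest.advertised.port):

```properties
# worker 1 (connect-distributed.properties)
rest.advertised.host.name=worker1.example.com
rest.advertised.port=8083

# worker 2 (connect-distributed.properties)
rest.advertised.host.name=worker2.example.com
rest.advertised.port=8083
```

In containerized deployments, these should be set to the externally reachable address of each container rather than left defaulted to localhost.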
[jira] [Resolved] (KAFKA-3056) MirrorMaker with new consumer doesn't handle CommitFailedException
[ https://issues.apache.org/jira/browse/KAFKA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dustin Cote resolved KAFKA-3056.
Resolution: Duplicate

> MirrorMaker with new consumer doesn't handle CommitFailedException
> --
>
> Key: KAFKA-3056
> URL: https://issues.apache.org/jira/browse/KAFKA-3056
> Project: Kafka
> Issue Type: Bug
> Reporter: Gwen Shapira
>
> MirrorMaker currently doesn't handle CommitFailedException when trying to commit (it only handles WakeupException).
> I didn't test, but it appears that if MirrorMaker tries to commit while rebalancing is in progress, it will result in thread failure. Probably not what we want. I think the right thing is to log a message and call poll() again to trigger a re-join.
[jira] [Created] (KAFKA-5327) Console Consumer should only poll for up to max messages
Dustin Cote created KAFKA-5327:
--
Summary: Console Consumer should only poll for up to max messages
Key: KAFKA-5327
URL: https://issues.apache.org/jira/browse/KAFKA-5327
Project: Kafka
Issue Type: Improvement
Components: tools
Reporter: Dustin Cote
Priority: Minor

The ConsoleConsumer has a --max-messages flag that can be used to limit the number of messages consumed. However, the number of records actually consumed is governed by max.poll.records. This means you may see one message on the console while your offset has moved forward by the default of 500, which is counterintuitive. It would be good to only commit offsets for messages we have printed to the console.
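The desired behavior can be sketched as: stop once --max-messages records have been printed and commit only up to the last printed record, instead of past the whole polled batch. This is illustrative Python, not the actual ConsoleConsumer code; the batch shape and function name are made up:

```python
def consume_limited(batches, max_messages):
    """Print at most max_messages records and return (printed, commit_offset),
    where commit_offset covers only what was actually printed.

    batches: iterable of poll() results; each batch is a list of
    (offset, value) pairs for a single partition (simplified)."""
    printed = []
    for batch in batches:
        for offset, value in batch:
            if len(printed) == max_messages:
                # Stop here: commit one past the last *printed* offset,
                # not one past the last *polled* offset.
                return printed, printed[-1][0] + 1
            printed.append((offset, value))
    commit = printed[-1][0] + 1 if printed else 0
    return printed, commit
```

With the reported behavior, a poll returning 500 records and --max-messages 1 would commit offset 500; the sketch commits offset 1.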
[jira] [Created] (KAFKA-5225) StreamsResetter doesn't allow custom Consumer properties
Dustin Cote created KAFKA-5225:
--
Summary: StreamsResetter doesn't allow custom Consumer properties
Key: KAFKA-5225
URL: https://issues.apache.org/jira/browse/KAFKA-5225
Project: Kafka
Issue Type: Bug
Components: streams, tools
Affects Versions: 0.10.2.1
Reporter: Dustin Cote

The StreamsResetter doesn't let the user pass any configurations to the embedded consumer. This is a problem in secured environments, because you can't configure the embedded consumer to talk to the cluster. The tool should take an approach similar to `kafka.admin.ConsumerGroupCommand`, which allows a config file to be passed on the command line for such operations.
cc [~mjsax]
[jira] [Created] (KAFKA-5148) Add configurable compression block size to the broker
Dustin Cote created KAFKA-5148:
--
Summary: Add configurable compression block size to the broker
Key: KAFKA-5148
URL: https://issues.apache.org/jira/browse/KAFKA-5148
Project: Kafka
Issue Type: Improvement
Components: compression
Reporter: Dustin Cote
Fix For: 0.10.2.0

Similar to the discussion in KAFKA-3704, we should consider a configurable compression block size on the broker side. This is especially worth considering given the change in block size from 32KB to 1KB in the 0.10 release.
[jira] [Created] (KAFKA-5138) MirrorMaker doesn't exit on send failure occasionally
Dustin Cote created KAFKA-5138:
--
Summary: MirrorMaker doesn't exit on send failure occasionally
Key: KAFKA-5138
URL: https://issues.apache.org/jira/browse/KAFKA-5138
Project: Kafka
Issue Type: Bug
Affects Versions: 0.10.2.0
Reporter: Dustin Cote

MirrorMaker with abort.on.send.failure=true does not always exit if the producer closes. Here is the sequence that happens. First we encounter a problem producing and force the producer to close:
{code}
[2017-04-10 07:17:25,137] ERROR Error when sending message to topic mytopic with key: 20 bytes, value: 314 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 47 record(s) for mytopic-2: 30879 ms has passed since last append
[2017-04-10 07:17:25,170] INFO Closing producer due to send failure. (kafka.tools.MirrorMaker$)
[2017-04-10 07:17:25,170] INFO Closing the Kafka producer with timeoutMillis = 0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] INFO Proceeding to force close the producer since pending requests could not be completed within timeout 0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] DEBUG The Kafka producer has closed. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] ERROR Error when sending message to topic mytopic with key: 20 bytes, value: 313 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 47 record(s) for mytopic-2: 30879 ms has passed since last append
[2017-04-10 07:17:25,170] INFO Closing producer due to send failure. (kafka.tools.MirrorMaker$)
[2017-04-10 07:17:25,170] INFO Closing the Kafka producer with timeoutMillis = 0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] INFO Proceeding to force close the producer since pending requests could not be completed within timeout 0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] DEBUG The Kafka producer has closed. (org.apache.kafka.clients.producer.KafkaProducer)
{code}
All good there. Then we can't seem to close the producer nicely after about 15 seconds, and so it is forcefully killed:
{code}
[2017-04-10 07:17:39,778] ERROR Error when sending message to topic mytopic.subscriptions with key: 70 bytes, value: null with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
java.lang.IllegalStateException: Producer is closed forcefully.
at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:522)
at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:502)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:147)
at java.lang.Thread.run(Unknown Source)
[2017-04-10 07:17:39,778] INFO Closing producer due to send failure. (kafka.tools.MirrorMaker$)
[2017-04-10 07:17:39,778] INFO Closing the Kafka producer with timeoutMillis = 0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:39,778] INFO Proceeding to force close the producer since pending requests could not be completed within timeout 0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:39,778] DEBUG The Kafka producer has closed. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:39,779] DEBUG Removed sensor with name connections-closed: (org.apache.kafka.common.metrics.Metrics)
{code}
After removing some metric sensors, a while later this happens:
{code}
[2017-04-10 07:17:39,780] DEBUG Removed sensor with name node-3.latency (org.apache.kafka.common.metrics.Metrics)
[2017-04-10 07:17:39,780] DEBUG Shutdown of Kafka producer I/O thread has completed. (org.apache.kafka.clients.producer.internals.Sender)
[2017-04-10 07:17:41,852] DEBUG Sending Heartbeat request for group mirror-maker-1491619052-teab1-1 to coordinator myhost1:9092 (id: 2147483643 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-04-10 07:17:41,953] DEBUG Received successful Heartbeat response for group mirror-maker-1491619052-teab1-1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-04-10 07:17:44,875] DEBUG Sending Heartbeat request for group mirror-maker-1491619052-teab1-1 to coordinator myhost1:9092 (id: 2147483643 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
{code}
This heartbeating goes on for some time until:
{code}
[2017-04-10 07:19:57,392] DEBUG Received successful Heartbeat response for group mirror-maker-1491619052-teab1-1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-04-10 07:19:57,994] DEBUG Connection with myhost1/123.123.321.321 disconnected
[jira] [Created] (KAFKA-5137) Controlled shutdown timeout message improvement
Dustin Cote created KAFKA-5137:
--
Summary: Controlled shutdown timeout message improvement
Key: KAFKA-5137
URL: https://issues.apache.org/jira/browse/KAFKA-5137
Project: Kafka
Issue Type: Improvement
Affects Versions: 0.10.2.0
Reporter: Dustin Cote
Priority: Minor

Currently, if controlled shutdown fails, you can get a message saying that socket.timeout.ms has expired. This config doesn't actually exist on the broker. Instead, we should explicitly say whether we've hit controller.socket.timeout.ms or request.timeout.ms, as it's confusing to take action given the current message. I believe the relevant code is here:
https://github.com/apache/kafka/blob/0.10.2/core/src/main/scala/kafka/server/KafkaServer.scala#L428-L454
I'm also not sure whether there's another timeout that could be hit here, or another reason why IOException might be thrown. At the least we should call out those two configs instead of the non-existent one, but if we can direct the user to the proper one, that would be even better.
[jira] [Commented] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs
[ https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982151#comment-15982151 ]
Dustin Cote commented on KAFKA-5118:
[~huxi] ^

> Improve message for Kafka failed startup with non-Kafka data in data.dirs
> -
>
> Key: KAFKA-5118
> URL: https://issues.apache.org/jira/browse/KAFKA-5118
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 0.10.2.0
> Reporter: Dustin Cote
> Priority: Minor
>
> Today, if you try to start up a broker with some non-Kafka data in the data.dirs, you end up with a cryptic message:
> {code}
> [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads during logs loading: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 (kafka.log.LogManager)
> [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> {code}
> It'd be better if we could tell the user to look for non-Kafka data in the data.dirs and print out the offending directory that caused the problem in the first place.
[jira] [Commented] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs
[ https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982148#comment-15982148 ]
Dustin Cote commented on KAFKA-5118:
@huxi yes, this is where this one came from: directories that ended in -delete, presumably from topics that were deleted. I hadn't checked whether it can happen in other scenarios; I just thought the error was pretty cryptic.

> Improve message for Kafka failed startup with non-Kafka data in data.dirs
> -
>
> Key: KAFKA-5118
> URL: https://issues.apache.org/jira/browse/KAFKA-5118
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 0.10.2.0
> Reporter: Dustin Cote
> Priority: Minor
>
> Today, if you try to start up a broker with some non-Kafka data in the data.dirs, you end up with a cryptic message:
> {code}
> [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads during logs loading: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 (kafka.log.LogManager)
> [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> {code}
> It'd be better if we could tell the user to look for non-Kafka data in the data.dirs and print out the offending directory that caused the problem in the first place.
[jira] [Created] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs
Dustin Cote created KAFKA-5118:
--
Summary: Improve message for Kafka failed startup with non-Kafka data in data.dirs
Key: KAFKA-5118
URL: https://issues.apache.org/jira/browse/KAFKA-5118
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 0.10.2.0
Reporter: Dustin Cote
Priority: Minor

Today, if you try to start up a broker with some non-Kafka data in the data.dirs, you end up with a cryptic message:
{code}
[2017-04-21 13:35:08,122] ERROR There was an error in one of the threads during logs loading: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 (kafka.log.LogManager)
[2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
{code}
It'd be better if we could tell the user to look for non-Kafka data in the data.dirs and print out the offending directory that caused the problem in the first place.
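The "String index out of range: -1" strongly suggests a directory name that is not in the expected <topic>-<partition> form, so the partition-suffix lookup fails. A hypothetical sketch of the friendlier check being asked for (illustrative Python, not the actual LogManager code):

```python
def parse_log_dir_name(dirname):
    """Kafka log directories are named <topic>-<partition>. A name with no
    valid '-<number>' suffix should be reported by name, not crash startup.
    Note: a leftover marker dir like 'mytopic-1-delete' would also be
    flagged by this naive check."""
    idx = dirname.rfind("-")
    if idx < 0 or not dirname[idx + 1:].isdigit():
        raise ValueError(
            "Found directory %r that is not in <topic>-<partition> form; "
            "check for non-Kafka data in the configured data.dirs" % dirname
        )
    return dirname[:idx], int(dirname[idx + 1:])
```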
[jira] [Commented] (KAFKA-5110) ConsumerGroupCommand error handling improvement
[ https://issues.apache.org/jira/browse/KAFKA-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981086#comment-15981086 ]
Dustin Cote commented on KAFKA-5110:
[~vahid] I'd love to provide reproduction steps, but I can't seem to reproduce locally. This was seen on a cluster not under my control. If we could print some better info here, as [~hachikuji]'s PR is doing, I think it would be instructive in seeing what the root cause may be.

> ConsumerGroupCommand error handling improvement
> ---
>
> Key: KAFKA-5110
> URL: https://issues.apache.org/jira/browse/KAFKA-5110
> Project: Kafka
> Issue Type: Bug
> Components: tools
> Affects Versions: 0.10.1.1
> Reporter: Dustin Cote
> Assignee: Jason Gustafson
>
> The ConsumerGroupCommand isn't handling partition errors properly. It throws the following:
> {code}
> kafka-consumer-groups.sh --zookeeper 10.10.10.10:2181 --group mygroup --describe
> GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
> Error while executing consumer group command empty.head
> java.lang.UnsupportedOperationException: empty.head
> at scala.collection.immutable.Vector.head(Vector.scala:193)
> at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$getLogEndOffset$1.apply(ConsumerGroupCommand.scala:197)
> at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$getLogEndOffset$1.apply(ConsumerGroupCommand.scala:194)
> at scala.Option.map(Option.scala:146)
> at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.getLogEndOffset(ConsumerGroupCommand.scala:194)
> at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.kafka$admin$ConsumerGroupCommand$ConsumerGroupService$$describePartition(ConsumerGroupCommand.scala:125)
> at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$$anonfun$describeTopicPartition$2.apply(ConsumerGroupCommand.scala:107)
> at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$$anonfun$describeTopicPartition$2.apply(ConsumerGroupCommand.scala:106)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describeTopicPartition(ConsumerGroupCommand.scala:106)
> at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describeTopicPartition(ConsumerGroupCommand.scala:134)
> at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.kafka$admin$ConsumerGroupCommand$ZkConsumerGroupService$$describeTopic(ConsumerGroupCommand.scala:181)
> at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$describeGroup$1.apply(ConsumerGroupCommand.scala:166)
> at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$describeGroup$1.apply(ConsumerGroupCommand.scala:166)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:166)
> at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describe(ConsumerGroupCommand.scala:89)
> at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describe(ConsumerGroupCommand.scala:134)
> at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:68)
> at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
> {code}
[jira] [Created] (KAFKA-5110) ConsumerGroupCommand error handling improvement
Dustin Cote created KAFKA-5110: -- Summary: ConsumerGroupCommand error handling improvement Key: KAFKA-5110 URL: https://issues.apache.org/jira/browse/KAFKA-5110 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 0.10.1.1 Reporter: Dustin Cote Assignee: Jason Gustafson The ConsumerGroupCommand isn't handling partition errors properly. It throws the following: {code} kafka-consumer-groups.sh --zookeeper 10.10.10.10:2181 --group mygroup --describe GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER Error while executing consumer group command empty.head java.lang.UnsupportedOperationException: empty.head at scala.collection.immutable.Vector.head(Vector.scala:193) at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$getLogEndOffset$1.apply(ConsumerGroupCommand.scala:197) at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$getLogEndOffset$1.apply(ConsumerGroupCommand.scala:194) at scala.Option.map(Option.scala:146) at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.getLogEndOffset(ConsumerGroupCommand.scala:194) at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.kafka$admin$ConsumerGroupCommand$ConsumerGroupService$$describePartition(ConsumerGroupCommand.scala:125) at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$$anonfun$describeTopicPartition$2.apply(ConsumerGroupCommand.scala:107) at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$$anonfun$describeTopicPartition$2.apply(ConsumerGroupCommand.scala:106) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describeTopicPartition(ConsumerGroupCommand.scala:106) at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describeTopicPartition(ConsumerGroupCommand.scala:134) at 
kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.kafka$admin$ConsumerGroupCommand$ZkConsumerGroupService$$describeTopic(ConsumerGroupCommand.scala:181) at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$describeGroup$1.apply(ConsumerGroupCommand.scala:166) at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$describeGroup$1.apply(ConsumerGroupCommand.scala:166) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:166) at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describe(ConsumerGroupCommand.scala:89) at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describe(ConsumerGroupCommand.scala:134) at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:68) at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala) {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
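The failure above is an unguarded `head` call on an empty collection when a partition has no available replica, so the tool crashes instead of reporting the partition's error. A minimal sketch of the defensive pattern (this is illustrative, not Kafka's actual fix; `PartitionDescriber` and `leaderOf` are hypothetical names):

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class PartitionDescriber {
    // Take the first element only if the list is non-empty, instead of calling
    // head/get(0) directly and throwing on partitions with no live replicas.
    static Optional<Integer> leaderOf(List<Integer> replicasInIsr) {
        return replicasInIsr.isEmpty()
                ? Optional.empty()
                : Optional.of(replicasInIsr.get(0));
    }

    public static void main(String[] args) {
        System.out.println(leaderOf(List.of(3, 1, 2)));        // Optional[3]
        System.out.println(leaderOf(Collections.emptyList())); // Optional.empty
    }
}
```

The command could then print a per-partition warning for the empty case and keep describing the remaining partitions, rather than aborting the whole run.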
[jira] [Commented] (KAFKA-5074) Transition to OnlinePartition without preferred leader in ISR fails
[ https://issues.apache.org/jira/browse/KAFKA-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969426#comment-15969426 ] Dustin Cote commented on KAFKA-5074: Ah ok I got what you meant. The OfflinePartitionLeaderSelector is invoked for other partitions, but not the ones that stay offline. For example, a different partition: {code} [2017-04-12 16:27:34,270] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":9,"leader_epoch":2,"isr":[9,11]} for offline partition [topic2,28] (kafka.controller.OfflinePartitionLeaderSelector) {code} Also not seeing any entries like "Error while moving some partitions to the online state" in the controller logs. > Transition to OnlinePartition without preferred leader in ISR fails > --- > > Key: KAFKA-5074 > URL: https://issues.apache.org/jira/browse/KAFKA-5074 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.9.0.0 >Reporter: Dustin Cote > > Running 0.9.0.0, the controller can get into a state where it no longer is > able to elect a leader for an Offline partition. It's unclear how this state > is first achieved but in the steady state, this happens: > -There are partitions with a leader of -1 > -The Controller repeatedly attempts a preferred leader election for these > partitions > -The preferred leader election fails because the only replica in the ISR is > not the preferred leader > The log cycle looks like this: > {code} > [2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica > leader election for partitions topic,1 > [2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]: > Invoking state change to OnlinePartition for partitions topic,1 > [2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]: > Current leader -1 for partition [topic,1] is not the preferred replica. 
> Trigerring preferred replica leader election > (kafka.controller.PreferredReplicaPartitionLeaderSelector) > [2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to > complete preferred replica leader election. Leader is -1 > (kafka.controller.KafkaController) > {code} > It's not clear if this would happen on versions later than 0.9.0.0. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-5074) Transition to OnlinePartition without preferred leader in ISR fails
[ https://issues.apache.org/jira/browse/KAFKA-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969406#comment-15969406 ] Dustin Cote commented on KAFKA-5074: [~ijuma] I think the unexpected behavior here is that the leader is not selected from the ISR. Instead, the cycle of preferred leader election --> attempt transition to online --> failed preferred leader election --> preferred leader election repeats every 5 minutes until the controller is restarted. > Transition to OnlinePartition without preferred leader in ISR fails > --- > > Key: KAFKA-5074 > URL: https://issues.apache.org/jira/browse/KAFKA-5074 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.9.0.0 >Reporter: Dustin Cote > > Running 0.9.0.0, the controller can get into a state where it no longer is > able to elect a leader for an Offline partition. It's unclear how this state > is first achieved but in the steady state, this happens: > -There are partitions with a leader of -1 > -The Controller repeatedly attempts a preferred leader election for these > partitions > -The preferred leader election fails because the only replica in the ISR is > not the preferred leader > The log cycle looks like this: > {code} > [2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica > leader election for partitions topic,1 > [2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]: > Invoking state change to OnlinePartition for partitions topic,1 > [2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]: > Current leader -1 for partition [topic,1] is not the preferred replica. > Trigerring preferred replica leader election > (kafka.controller.PreferredReplicaPartitionLeaderSelector) > [2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to > complete preferred replica leader election. 
Leader is -1 > (kafka.controller.KafkaController) > {code} > It's not clear if this would happen on versions later than 0.9.0.0. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-5074) Transition to OnlinePartition without preferred leader in ISR fails
Dustin Cote created KAFKA-5074: -- Summary: Transition to OnlinePartition without preferred leader in ISR fails Key: KAFKA-5074 URL: https://issues.apache.org/jira/browse/KAFKA-5074 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 0.9.0.0 Reporter: Dustin Cote Running 0.9.0.0, the controller can get into a state where it no longer is able to elect a leader for an Offline partition. It's unclear how this state is first achieved but in the steady state, this happens: -There are partitions with a leader of -1 -The Controller repeatedly attempts a preferred leader election for these partitions -The preferred leader election fails because the only replica in the ISR is not the preferred leader The log cycle looks like this: {code} [2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica leader election for partitions topic,1 [2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]: Invoking state change to OnlinePartition for partitions topic,1 [2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]: Current leader -1 for partition [topic,1] is not the preferred replica. Trigerring preferred replica leader election (kafka.controller.PreferredReplicaPartitionLeaderSelector) [2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to complete preferred replica leader election. Leader is -1 (kafka.controller.KafkaController) {code} It's not clear if this would happen on versions later than 0.9.0.0. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-3955) Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to failed broker boot
[ https://issues.apache.org/jira/browse/KAFKA-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960854#comment-15960854 ] Dustin Cote commented on KAFKA-3955: Adding 0.10.1.1 as this kind of behavior has been observed on that release as well. > Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to > failed broker boot > > > Key: KAFKA-3955 > URL: https://issues.apache.org/jira/browse/KAFKA-3955 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.8.0, 0.8.1, 0.8.1.1, 0.8.2.0, 0.8.2.1, 0.8.2.2, > 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.1.1 >Reporter: Tom Crayford > > Hi, > I've found a bug impacting kafka brokers on startup after an unclean > shutdown. If a log segment is corrupt and has non-monotonic offsets (see the > appendix of this bug for a sample output from {{DumpLogSegments}}), then > {{LogSegment.recover}} throws an {{InvalidOffsetException}} error: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/OffsetIndex.scala#L218 > That code is called by {{LogSegment.recover}}: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L191 > Which is called in several places in {{Log.scala}}. Notably it's called four > times during recovery: > Thrice in Log.loadSegments > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L199 > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L204 > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L226 > and once in Log.recoverLog > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L268 > Of these, only the very last one has a {{catch}} for > {{InvalidOffsetException}}. When that catches the issue, it truncates the > whole log (not just this segment): > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L274 > to the start segment of the bad log segment. 
> However, this code can't be hit on recovery, because of the code paths in > {{loadSegments}} - they mean we'll never hit truncation here, as we always > throw this exception and that goes all the way to the toplevel exception > handler and crashes the JVM. > As {{Log.recoverLog}} is always called during recovery, I *think* a fix for > this is to move this crash recovery/truncate code inside a new method in > {{Log.scala}}, and call that instead of {{LogSegment.recover}} in each place. > That code should return the number of {{truncatedBytes}} like we do in > {{Log.recoverLog}} and then truncate the log. The callers will have to be > notified "stop iterating over files in the directory", likely via a return > value of {{truncatedBytes}} like {{Log.recoverLog}} does right now. > I'm happy working on a patch for this. I'm aware this recovery code is tricky > and important to get right. > I'm also curious (and currently don't have good theories as of yet) as to how > this log segment got into this state with non-monotonic offsets. This segment > is using gzip compression, and is under 0.9.0.1. The same bug with respect to > recovery exists in trunk, but I'm unsure if the new handling around > compressed messages (KIP-31) means the bug where non-monotonic offsets get > appended is still present in trunk. > As a production workaround, one can manually truncate that log folder > yourself (delete all .index/.log files including and after the one with the > bad offset). However, Kafka should (and can) handle this case well - with > replication we can truncate in broker startup. > stacktrace and error message: > {code} > pri=WARN t=pool-3-thread-4 at=Log Found a corrupted index file, > /$DIRECTORY/$TOPIC-22/14306536.index, deleting and rebuilding > index... 
> pri=ERROR t=main at=LogManager There was an error in one of the threads > during logs loading: kafka.common.InvalidOffsetException: Attempt to append > an offset (15000337) to position 111719 no larger than the last offset > appended (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index. > pri=FATAL t=main at=KafkaServer Fatal error during KafkaServer startup. > Prepare to shutdown kafka.common.InvalidOffsetException: Attempt to append an > offset (15000337) to position 111719 no larger than the last offset appended > (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index. > at > kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207) > at > kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197) > at > kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) > at
[jira] [Updated] (KAFKA-3955) Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to failed broker boot
[ https://issues.apache.org/jira/browse/KAFKA-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated KAFKA-3955: --- Affects Version/s: 0.10.1.1 > Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to > failed broker boot > > > Key: KAFKA-3955 > URL: https://issues.apache.org/jira/browse/KAFKA-3955 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.8.0, 0.8.1, 0.8.1.1, 0.8.2.0, 0.8.2.1, 0.8.2.2, > 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.1.1 >Reporter: Tom Crayford > > Hi, > I've found a bug impacting kafka brokers on startup after an unclean > shutdown. If a log segment is corrupt and has non-monotonic offsets (see the > appendix of this bug for a sample output from {{DumpLogSegments}}), then > {{LogSegment.recover}} throws an {{InvalidOffsetException}} error: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/OffsetIndex.scala#L218 > That code is called by {{LogSegment.recover}}: > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L191 > Which is called in several places in {{Log.scala}}. Notably it's called four > times during recovery: > Thrice in Log.loadSegments > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L199 > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L204 > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L226 > and once in Log.recoverLog > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L268 > Of these, only the very last one has a {{catch}} for > {{InvalidOffsetException}}. When that catches the issue, it truncates the > whole log (not just this segment): > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L274 > to the start segment of the bad log segment. 
> However, this code can't be hit on recovery, because of the code paths in > {{loadSegments}} - they mean we'll never hit truncation here, as we always > throw this exception and that goes all the way to the toplevel exception > handler and crashes the JVM. > As {{Log.recoverLog}} is always called during recovery, I *think* a fix for > this is to move this crash recovery/truncate code inside a new method in > {{Log.scala}}, and call that instead of {{LogSegment.recover}} in each place. > That code should return the number of {{truncatedBytes}} like we do in > {{Log.recoverLog}} and then truncate the log. The callers will have to be > notified "stop iterating over files in the directory", likely via a return > value of {{truncatedBytes}} like {{Log.recoverLog}} does right now. > I'm happy working on a patch for this. I'm aware this recovery code is tricky > and important to get right. > I'm also curious (and currently don't have good theories as of yet) as to how > this log segment got into this state with non-monotonic offsets. This segment > is using gzip compression, and is under 0.9.0.1. The same bug with respect to > recovery exists in trunk, but I'm unsure if the new handling around > compressed messages (KIP-31) means the bug where non-monotonic offsets get > appended is still present in trunk. > As a production workaround, one can manually truncate that log folder > yourself (delete all .index/.log files including and after the one with the > bad offset). However, Kafka should (and can) handle this case well - with > replication we can truncate in broker startup. > stacktrace and error message: > {code} > pri=WARN t=pool-3-thread-4 at=Log Found a corrupted index file, > /$DIRECTORY/$TOPIC-22/14306536.index, deleting and rebuilding > index... 
> pri=ERROR t=main at=LogManager There was an error in one of the threads > during logs loading: kafka.common.InvalidOffsetException: Attempt to append > an offset (15000337) to position 111719 no larger than the last offset > appended (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index. > pri=FATAL t=main at=KafkaServer Fatal error during KafkaServer startup. > Prepare to shutdown kafka.common.InvalidOffsetException: Attempt to append an > offset (15000337) to position 111719 no larger than the last offset appended > (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index. > at > kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207) > at > kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197) > at > kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) > at kafka.log.OffsetIndex.append(OffsetIndex.scala:197) > at kafka.log.LogSegment.recover(LogSegment.scala:188) >
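The production workaround described in the report (delete all .index/.log files including and after the one with the bad offset) can be scripted. Below is a hedged sketch of the selection step only, assuming Kafka's 20-digit zero-padded base-offset file naming; `SegmentPruner` is a hypothetical helper, and the actual deletion is deliberately left out:

```java
import java.util.List;
import java.util.stream.Collectors;

public class SegmentPruner {
    // Pure helper: given the file names in a partition directory, return the
    // .log/.index segment files whose base offset is at or after the corrupted
    // segment's base offset -- i.e. the files the manual workaround removes.
    static List<String> filesAtOrAfter(List<String> fileNames, long badBaseOffset) {
        return fileNames.stream()
            .filter(n -> n.matches("\\d{20}\\.(log|index)"))
            .filter(n -> Long.parseLong(n.substring(0, n.indexOf('.'))) >= badBaseOffset)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "00000000000000000000.log",
            "00000000000000000000.index",
            "00000000000014008931.log",
            "00000000000014008931.index");
        // 14008931 is the base offset of the corrupted index in the trace above.
        System.out.println(filesAtOrAfter(names, 14008931L));
    }
}
```

With replication, the data removed this way is re-fetched from the leader on restart, which is why the report argues the broker itself could safely truncate here.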
[jira] [Created] (KAFKA-4861) log.message.timestamp.type=LogAppendTime breaks Kafka based consumers
Dustin Cote created KAFKA-4861: -- Summary: log.message.timestamp.type=LogAppendTime breaks Kafka based consumers Key: KAFKA-4861 URL: https://issues.apache.org/jira/browse/KAFKA-4861 Project: Kafka Issue Type: Bug Components: consumer Affects Versions: 0.10.2.0 Reporter: Dustin Cote Assignee: Jason Gustafson Priority: Blocker Using 0.10.2 brokers with the property `log.message.timestamp.type=LogAppendTime` breaks all Kafka-based consumers for the cluster. The consumer will return: {code} [2017-03-07 15:25:10,215] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$) org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: The timestamp of the message is out of acceptable range. at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:535) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:508) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:764) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:745) at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:186) at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:149) at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:116) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:493) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:322) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:253) at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:334) at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286) at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030) at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) at kafka.consumer.NewShinyConsumer.(BaseConsumer.scala:55) at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69) at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50) at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} On the broker side you see: {code} [2017-03-07 15:25:20,216] INFO [GroupCoordinator 0]: Group console-consumer-73205 with generation 2 is now empty (kafka.coordinator.GroupCoordinator) [2017-03-07 15:25:20,217] ERROR [Group Metadata Manager on Broker 0]: Appending metadata message for group console-consumer-73205 generation 2 failed due to unexpected error: org.apache.kafka.common.errors.InvalidTimestampException (kafka.coordinator.GroupMetadataManager) [2017-03-07 15:25:20,218] WARN [GroupCoordinator 0]: Failed to write empty metadata for group console-consumer-73205: The timestamp of the message is out of acceptable range. (kafka.coordinator.GroupCoordinator) {code} Marking as a blocker since this appears to be a regression in that it doesn't happen on 0.10.1.1 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
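One possible mitigation (an assumption, not a confirmed fix for this regression) is to leave the broker-wide default alone and apply LogAppendTime per topic, since the broker-wide setting also governs the internal __consumer_offsets topic whose group-metadata appends fail above. The topic-level `message.timestamp.type` config exists as of 0.10:

```properties
# Broker config: keep the default so internal topics like __consumer_offsets
# still accept group-metadata timestamps (assumption: this sidesteps the
# InvalidTimestampException above).
log.message.timestamp.type=CreateTime
```

User topics that need append-time stamping could then be overridden individually, e.g. with kafka-configs.sh and `--add-config message.timestamp.type=LogAppendTime` (syntax assumed from the 0.10.x tooling).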
[jira] [Created] (KAFKA-4618) Enable clients to "re-bootstrap" in the event of a full cluster migration
Dustin Cote created KAFKA-4618: -- Summary: Enable clients to "re-bootstrap" in the event of a full cluster migration Key: KAFKA-4618 URL: https://issues.apache.org/jira/browse/KAFKA-4618 Project: Kafka Issue Type: Improvement Components: clients Affects Versions: 0.10.1.1 Reporter: Dustin Cote Priority: Minor Today, clients only bootstrap upon startup. This works well in most cases, but following up on the discussion [here|https://issues.apache.org/jira/browse/KAFKA-3068?focusedCommentId=15115702=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15115702] there are scenarios where you may want to "re-bootstrap" a client on the fly. One example scenario is if you have an active-passive failover cluster setup and a load balancer in front managing traffic. If you lose all of the brokers in the active cluster and need to immediately fail over to the other set of brokers, you would need to re-bootstrap, meaning clients would have to restart. Having the ability to go back through the bootstrapping process without a restart would be a nice improvement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
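Until such support exists in the clients, applications can approximate re-bootstrapping themselves by discarding the client object and constructing a new one against the failover addresses. A dependency-free sketch of that pattern (the `factory` stands in for something like `servers -> new KafkaConsumer<>(...)`; all names here are hypothetical):

```java
import java.util.List;
import java.util.function.Function;

public class ReBootstrapper<C> {
    private final Function<List<String>, C> factory; // builds a client from a bootstrap list
    private C client;

    public ReBootstrapper(Function<List<String>, C> factory, List<String> bootstrap) {
        this.factory = factory;
        this.client = factory.apply(bootstrap);
    }

    public C current() { return client; }

    // Called when every broker in the active cluster is unreachable: drop the
    // old client and bootstrap a fresh one against the failover cluster.
    public C reBootstrap(List<String> failoverServers) {
        client = factory.apply(failoverServers);
        return client;
    }

    public static void main(String[] args) {
        ReBootstrapper<String> rb = new ReBootstrapper<>(
            servers -> String.join(",", servers),
            List.of("active-1:9092", "active-2:9092"));
        System.out.println(rb.current());
        System.out.println(rb.reBootstrap(List.of("passive-1:9092")));
    }
}
```

With a real Kafka client the factory would also close the old instance before replacing it; this sketch omits resource cleanup for brevity.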
[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796240#comment-15796240 ] Dustin Cote commented on KAFKA-2394: [~gquintana] I agree it would be better, but the existing Kafka dependencies only include Log4J v1 without extras and I'd rather not muddy the waters here by updating dependencies too. There's another JIRA with the goal of updating Log4J that will hopefully render this one unnecessary if we can get that one into Kafka before the next big release. That's KAFKA-1368. It'd be good to get your input on that JIRA if you are looking to see the out of the box logging updated. > Use RollingFileAppender by default in log4j.properties > -- > > Key: KAFKA-2394 > URL: https://issues.apache.org/jira/browse/KAFKA-2394 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Dustin Cote >Priority: Minor > Labels: newbie > Fix For: 0.11.0.0 > > Attachments: log4j.properties.patch > > > The default log4j.properties bundled with Kafka uses ConsoleAppender and > DailyRollingFileAppender, which offer no protection to users from spammy > logging. In extreme cases (such as when issues like KAFKA-1461 are > encountered), the logs can exhaust the local disk space. This could be a > problem for Kafka adoption since new users are less likely to adjust the > logging properties themselves, and are more likely to have configuration > problems which result in log spam. > To fix this, we can use RollingFileAppender, which offers two settings for > controlling the maximum space that log files will use. > maxBackupIndex: how many backup files to retain > maxFileSize: the max size of each log file > One question is whether this change is a compatibility concern? The backup > strategy and filenames used by RollingFileAppender are different from those > used by DailyRollingFileAppender, so any tools which depend on the old format > will break. 
If we think this is a serious problem, one solution would be to > provide two versions of log4j.properties and add a flag to enable the new > one. Another solution would be to include the RollingFileAppender > configuration in the default log4j.properties, but commented out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
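The commented-out-by-default option from the issue could look like the following log4j 1.x fragment (a sketch: the appender name, path variable, and sizes mirror Kafka's bundled log4j.properties conventions but the exact values are illustrative):

```properties
# Uncomment to cap disk usage: at most 10 backups of 100MB each per log file.
#log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
#log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
#log4j.appender.kafkaAppender.MaxFileSize=100MB
#log4j.appender.kafkaAppender.MaxBackupIndex=10
#log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
#log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```

`MaxFileSize` and `MaxBackupIndex` are the two RollingFileAppender settings named in the issue; DailyRollingFileAppender has no equivalent size cap, which is the motivation for the switch.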
[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.
[ https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733442#comment-15733442 ] Dustin Cote commented on KAFKA-4477: It may also be helpful to grab the associated zookeeper logs. If the bad server is losing contact with zookeeper at the same time that it loses connection with the other brokers, things can end up in a weird state. > Node reduces its ISR to itself, and doesn't recover. Other nodes do not take > leadership, cluster remains sick until node is restarted. > -- > > Key: KAFKA-4477 > URL: https://issues.apache.org/jira/browse/KAFKA-4477 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.0 > Environment: RHEL7 > java version "1.8.0_66" > Java(TM) SE Runtime Environment (build 1.8.0_66-b17) > Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode) >Reporter: Michael Andre Pearce (IG) >Assignee: Apurva Mehta >Priority: Critical > Labels: reliability > Attachments: kafka.jstack > > > We have encountered a critical issue that has re-occured in different > physical environments. We haven't worked out what is going on. We do though > have a nasty work around to keep service alive. > We do have not had this issue on clusters still running 0.9.01. > We have noticed a node randomly shrinking for the partitions it owns the > ISR's down to itself, moments later we see other nodes having disconnects, > followed by finally app issues, where producing to these partitions is > blocked. > It seems only by restarting the kafka instance java process resolves the > issues. > We have had this occur multiple times and from all network and machine > monitoring the machine never left the network, or had any other glitches. > Below are seen logs from the issue. 
> Node 7: > [2016-12-01 07:01:28,112] INFO Partition > [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking > ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from > 1,2,7 to 7 (kafka.cluster.Partition) > All other nodes: > [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch > kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 > (kafka.server.ReplicaFetcherThread) > java.io.IOException: Connection to 7 was disconnected before the response was > read > All clients: > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. > After this occurs, we then suddenly see on the sick machine an increasing > amount of close_waits and file descriptors. > As a work around to keep service we are currently putting in an automated > process that tails and regex's for: and where new_partitions hit just itself > we restart the node. > "\[(?P.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for > partition \[.*\] from (?P.+) to (?P.+) > \(kafka.cluster.Partition\)" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (KAFKA-4351) Topic regex behavioral change with MirrorMaker new consumer
[ https://issues.apache.org/jira/browse/KAFKA-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote reopened KAFKA-4351: Looks like this got marked resolved accidentally when the PR was made. Reopening but cc'ing [~huxi_2b]. Thanks for picking it up. > Topic regex behavioral change with MirrorMaker new consumer > --- > > Key: KAFKA-4351 > URL: https://issues.apache.org/jira/browse/KAFKA-4351 > Project: Kafka > Issue Type: Bug > Components: tools >Affects Versions: 0.9.0.1 >Reporter: Dustin Cote >Assignee: huxi >Priority: Minor > > There is a behavioral change when you use MirrorMaker with the new consumer > versus the old consumer with respect to regex parsing of the input topic > list. If one combines a regex with a comma separated list for the topic list > with old consumer implementation of MirrorMaker, it works ok. Example: > {code} > --whitelist '.+(my|mine),topic,anothertopic' > {code} > If you pass this in with the new consumer implementation, no topics are > found. The workaround is to use something like: > {code} > --whitelist '.+(my|mine)|topic|anothertopic' > {code} > We should make an effort to be consistent between the two implementations as > it's a bit surprising when migrating. I've verified the problem exists on > 0.9.0.1 but I don't see at first any fix to the problem in later versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4351) Topic regex behavioral change with MirrorMaker new consumer
Dustin Cote created KAFKA-4351: -- Summary: Topic regex behavioral change with MirrorMaker new consumer Key: KAFKA-4351 URL: https://issues.apache.org/jira/browse/KAFKA-4351 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 0.9.0.1 Reporter: Dustin Cote Priority: Minor There is a behavioral change when you use MirrorMaker with the new consumer versus the old consumer with respect to regex parsing of the input topic list. If one combines a regex with a comma separated list for the topic list with old consumer implementation of MirrorMaker, it works ok. Example: {code} --whitelist '.+(my|mine),topic,anothertopic' {code} If you pass this in with the new consumer implementation, no topics are found. The workaround is to use something like: {code} --whitelist '.+(my|mine)|topic|anothertopic' {code} We should make an effort to be consistent between the two implementations as it's a bit surprising when migrating. I've verified the problem exists on 0.9.0.1 but I don't see at first any fix to the problem in later versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
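The behavioral difference comes down to Java regex semantics: when the whole whitelist is compiled as a single `java.util.regex.Pattern`, a comma is a literal character while `|` is alternation. A small illustration using the patterns from the report (topic names are made up):

```java
import java.util.regex.Pattern;

public class WhitelistDemo {
    public static void main(String[] args) {
        // Comma-separated form: only matches names that literally contain commas.
        Pattern commaStyle = Pattern.compile(".+(my|mine),topic,anothertopic");
        // Alternation form: matches any of the three alternatives.
        Pattern regexStyle = Pattern.compile(".+(my|mine)|topic|anothertopic");

        System.out.println(commaStyle.matcher("topic").matches());      // false
        System.out.println(regexStyle.matcher("topic").matches());      // true
        System.out.println(regexStyle.matcher("thisismine").matches()); // true
    }
}
```

Making the tool either split on commas before compiling (the old consumer's behavior, as described above) or document the pure-regex semantics would resolve the surprise when migrating.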
[jira] [Commented] (KAFKA-4320) Log compaction docs update
[ https://issues.apache.org/jira/browse/KAFKA-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592128#comment-15592128 ] Dustin Cote commented on KAFKA-4320: [~Aurijoy] great, welcome! First have a look at the how to contribute guide [http://kafka.apache.org/contributing]. Then once you've gotten set up, you can make a pull request on the github for kafka [https://github.com/apache/kafka]. It looks like this doesn't affect the docs in the trunk branch, but it should be fixed for 0.9 and 0.10 branches. Have a look here: [https://github.com/apache/kafka/blob/0.10.0/docs/design.html#L343]. Doc fixes are a great way to get involved, so feel free to fix any typos you might see as well. > Log compaction docs update > -- > > Key: KAFKA-4320 > URL: https://issues.apache.org/jira/browse/KAFKA-4320 > Project: Kafka > Issue Type: Bug > Components: documentation >Reporter: Dustin Cote >Priority: Minor > Labels: newbie > > The log compaction docs are out of date. At least the default is said to be > that log compaction is disabled which is not true as of 0.9.0.1. Probably > the whole section needs a once over to make sure it's in line with what is > currently there. This is the section: > [http://kafka.apache.org/documentation#design_compactionconfig] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4320) Log compaction docs update
[ https://issues.apache.org/jira/browse/KAFKA-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated KAFKA-4320: --- Labels: newbie (was: ) > Log compaction docs update > -- > > Key: KAFKA-4320 > URL: https://issues.apache.org/jira/browse/KAFKA-4320 > Project: Kafka > Issue Type: Bug > Components: documentation >Reporter: Dustin Cote >Priority: Minor > Labels: newbie > > The log compaction docs are out of date. At least the default is said to be > that log compaction is disabled which is not true as of 0.9.0.1. Probably > the whole section needs a once over to make sure it's in line with what is > currently there. This is the section: > [http://kafka.apache.org/documentation#design_compactionconfig] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4320) Log compaction docs update
Dustin Cote created KAFKA-4320: -- Summary: Log compaction docs update Key: KAFKA-4320 URL: https://issues.apache.org/jira/browse/KAFKA-4320 Project: Kafka Issue Type: Bug Components: documentation Reporter: Dustin Cote Priority: Minor The log compaction docs are out of date. At a minimum, the default is said to be that log compaction is disabled, which has not been true since 0.9.0.1. The whole section probably needs a once-over to make sure it's in line with what is currently there. This is the section: [http://kafka.apache.org/documentation#design_compactionconfig] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4207) Partitions stopped after a rapid restart of a broker
[ https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517727#comment-15517727 ] Dustin Cote commented on KAFKA-4207: Agreed, marking this one as a duplicate > Partitions stopped after a rapid restart of a broker > > > Key: KAFKA-4207 > URL: https://issues.apache.org/jira/browse/KAFKA-4207 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.9.0.1, 0.10.0.1 >Reporter: Dustin Cote > > Environment: > 4 Kafka brokers > 10,000 topics with one partition each, replication factor 3 > Partitions with 4KB data each > No data being produced or consumed > Scenario: > Initiate controlled shutdown on one broker > Interrupt controlled shutdown prior completion with a SIGKILL > Start a new broker with the same broker ID as broker that was just killed > immediately > Symptoms: > After starting the new broker, the other three brokers in the cluster will > see under replicated partitions forever for some partitions that are hosted > on the broker that was killed and restarted > Cause: > Today, the controller sends a StopReplica command for each replica hosted on > a broker that has initiated a controlled shutdown. For a large number of > replicas this can take awhile. When the broker that is doing the controlled > shutdown is killed, the StopReplica commands are queued up even though the > request queue to the broker is cleared. When the broker comes back online, > the StopReplica commands that were queued, get sent to the broker that just > started up. > CC: [~junrao] since he's familiar with the scenario seen here -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-4207) Partitions stopped after a rapid restart of a broker
[ https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote resolved KAFKA-4207. Resolution: Duplicate > Partitions stopped after a rapid restart of a broker > > > Key: KAFKA-4207 > URL: https://issues.apache.org/jira/browse/KAFKA-4207 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 0.9.0.1, 0.10.0.1 >Reporter: Dustin Cote > > Environment: > 4 Kafka brokers > 10,000 topics with one partition each, replication factor 3 > Partitions with 4KB data each > No data being produced or consumed > Scenario: > Initiate controlled shutdown on one broker > Interrupt controlled shutdown prior completion with a SIGKILL > Start a new broker with the same broker ID as broker that was just killed > immediately > Symptoms: > After starting the new broker, the other three brokers in the cluster will > see under replicated partitions forever for some partitions that are hosted > on the broker that was killed and restarted > Cause: > Today, the controller sends a StopReplica command for each replica hosted on > a broker that has initiated a controlled shutdown. For a large number of > replicas this can take awhile. When the broker that is doing the controlled > shutdown is killed, the StopReplica commands are queued up even though the > request queue to the broker is cleared. When the broker comes back online, > the StopReplica commands that were queued, get sent to the broker that just > started up. > CC: [~junrao] since he's familiar with the scenario seen here -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4207) Partitions stopped after a rapid restart of a broker
Dustin Cote created KAFKA-4207: -- Summary: Partitions stopped after a rapid restart of a broker Key: KAFKA-4207 URL: https://issues.apache.org/jira/browse/KAFKA-4207 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 0.10.0.1, 0.9.0.1 Reporter: Dustin Cote Environment: 4 Kafka brokers 10,000 topics with one partition each, replication factor 3 Partitions with 4KB data each No data being produced or consumed Scenario: Initiate controlled shutdown on one broker Interrupt the controlled shutdown prior to completion with a SIGKILL Immediately start a new broker with the same broker ID as the broker that was just killed Symptoms: After starting the new broker, the other three brokers in the cluster will see under-replicated partitions forever for some partitions that are hosted on the broker that was killed and restarted Cause: Today, the controller sends a StopReplica command for each replica hosted on a broker that has initiated a controlled shutdown. For a large number of replicas this can take a while. When the broker that is doing the controlled shutdown is killed, the StopReplica commands remain queued even though the request queue to the broker is cleared. When the broker comes back online, the StopReplica commands that were queued get sent to the broker that just started up. CC: [~junrao] since he's familiar with the scenario seen here -- This message was sent by Atlassian JIRA (v6.3.4#6332)
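The queued-StopReplica race described in this report can be reduced to a toy model (a Python sketch; the class and method names are illustrative and do not correspond to Kafka's actual controller code):

```python
from collections import defaultdict, deque

class ToyController:
    """Toy model of the race: commands queued under a broker *id* survive the
    broker process's death, so a restarted broker reusing the same id
    receives stale StopReplica commands meant for the old incarnation."""

    def __init__(self):
        self.queues = defaultdict(deque)  # broker id -> pending commands

    def controlled_shutdown(self, broker_id, replicas):
        # The controller enqueues one StopReplica per hosted replica,
        # which takes a while when there are thousands of replicas.
        for replica in replicas:
            self.queues[broker_id].append(("StopReplica", replica))

    def broker_killed(self, broker_id):
        # Nothing clears the pending commands keyed by broker id here,
        # which is the gist of the bug described above.
        pass

    def broker_started(self, broker_id):
        # The new incarnation drains whatever was left in the queue.
        delivered = list(self.queues[broker_id])
        self.queues[broker_id].clear()
        return delivered

controller = ToyController()
controller.controlled_shutdown(1, ["topic-a-0", "topic-b-0"])
controller.broker_killed(1)             # SIGKILL mid-shutdown
stale = controller.broker_started(1)    # same broker id, new process
print(len(stale))                       # 2: both stale StopReplicas arrive
```

Clearing the per-broker queue in `broker_killed` would be the analogous fix in this toy.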
[jira] [Commented] (KAFKA-4169) Calculation of message size is too conservative for compressed messages
[ https://issues.apache.org/jira/browse/KAFKA-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490387#comment-15490387 ] Dustin Cote commented on KAFKA-4169: Ah, good point. The user reporting this issue suggested pushing the check into the RecordAccumulator. That might be a better option. > Calculation of message size is too conservative for compressed messages > --- > > Key: KAFKA-4169 > URL: https://issues.apache.org/jira/browse/KAFKA-4169 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.10.0.1 >Reporter: Dustin Cote > > Currently the producer uses the uncompressed message size to check against > {{max.request.size}} even if a {{compression.type}} is defined. This can be > reproduced as follows: > {code} > # dd if=/dev/zero of=/tmp/outsmaller.dat bs=1024 count=1000 > # cat /tmp/out.dat | bin/kafka-console-producer --broker-list localhost:9092 > --topic tester --producer-property compression.type=gzip > {code} > The above code creates a file that is the same size as the default for > {{max.request.size}} and the added overhead of the message pushes the > uncompressed size over the limit. Compressing the message ahead of time > allows the message to go through. When the message is blocked, the following > exception is produced: > {code} > [2016-09-14 08:56:19,558] ERROR Error when sending message to topic tester > with key: null, value: 1048576 bytes with error: > (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) > org.apache.kafka.common.errors.RecordTooLargeException: The message is > 1048610 bytes when serialized which is larger than the maximum request size > you have configured with the max.request.size configuration. > {code} > For completeness, I have confirmed that the console producer is setting > {{compression.type}} properly by enabling DEBUG so this appears to be a > problem in the size estimate of the message itself. 
I would suggest we > compress before we serialize instead of the other way around to avoid this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4169) Calculation of message size is too conservative for compressed messages
Dustin Cote created KAFKA-4169: -- Summary: Calculation of message size is too conservative for compressed messages Key: KAFKA-4169 URL: https://issues.apache.org/jira/browse/KAFKA-4169 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.10.0.1 Reporter: Dustin Cote Currently the producer uses the uncompressed message size to check against {{max.request.size}} even if a {{compression.type}} is defined. This can be reproduced as follows: {code} # dd if=/dev/zero of=/tmp/out.dat bs=1024 count=1024 # cat /tmp/out.dat | bin/kafka-console-producer --broker-list localhost:9092 --topic tester --producer-property compression.type=gzip {code} The above code creates a file that is the same size as the default for {{max.request.size}}, and the added overhead of the message pushes the uncompressed size over the limit. Compressing the message ahead of time allows the message to go through. When the message is blocked, the following exception is produced: {code} [2016-09-14 08:56:19,558] ERROR Error when sending message to topic tester with key: null, value: 1048576 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) org.apache.kafka.common.errors.RecordTooLargeException: The message is 1048610 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration. {code} For completeness, I have confirmed that the console producer is setting {{compression.type}} properly by enabling DEBUG, so this appears to be a problem in the size estimate of the message itself. I would suggest we compress before we serialize, instead of the other way around, to avoid this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
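The gap between the size that is checked and the size that would actually hit the wire is easy to see outside Kafka. A short Python sketch (the 1048576-byte constant mirrors the producer's default {{max.request.size}}; everything else is illustrative):

```python
import gzip

MAX_REQUEST_SIZE = 1048576  # producer default max.request.size, in bytes

# A zero-filled payload right at the limit, like the dd repro above.
payload = b"\x00" * MAX_REQUEST_SIZE

compressed = gzip.compress(payload)

# The producer's check uses the uncompressed length, so this record (plus
# a few bytes of overhead) gets rejected, even though the gzipped bytes
# that would actually be sent are a tiny fraction of the limit.
print(len(payload) >= MAX_REQUEST_SIZE)    # True: fails the current check
print(len(compressed) < MAX_REQUEST_SIZE)  # True: would easily fit
```

Checking against the post-compression batch size (e.g. in the RecordAccumulator, as suggested in the comments) would let such a payload through.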
[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling
[ https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15473820#comment-15473820 ] Dustin Cote commented on KAFKA-3590: [~hachikuji] is going to take this one over for now. > KafkaConsumer fails with "Messages are rejected since there are fewer in-sync > replicas than required." when polling > --- > > Key: KAFKA-3590 > URL: https://issues.apache.org/jira/browse/KAFKA-3590 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.9.0.1 > Environment: JDK1.8 Ubuntu 14.04 >Reporter: Sergey Alaev >Assignee: Jason Gustafson > > KafkaConsumer.poll() fails with "Messages are rejected since there are fewer > in-sync replicas than required.". Isn't this message about minimum number of > ISR's when *sending* messages? > Stacktrace: > org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: > Messages are rejected since there are fewer in-sync replicas than required. > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > ~[kafka-clients-0.9.0.1.jar:na] > at > 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) > ~[kafka-clients-0.9.0.1.jar:na] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling
[ https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated KAFKA-3590: --- Status: Open (was: Patch Available) > KafkaConsumer fails with "Messages are rejected since there are fewer in-sync > replicas than required." when polling > --- > > Key: KAFKA-3590 > URL: https://issues.apache.org/jira/browse/KAFKA-3590 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.9.0.1 > Environment: JDK1.8 Ubuntu 14.04 >Reporter: Sergey Alaev >Assignee: Dustin Cote > > KafkaConsumer.poll() fails with "Messages are rejected since there are fewer > in-sync replicas than required.". Isn't this message about minimum number of > ISR's when *sending* messages? > Stacktrace: > org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: > Messages are rejected since there are fewer in-sync replicas than required. > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > ~[kafka-clients-0.9.0.1.jar:na] > 
at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) > ~[kafka-clients-0.9.0.1.jar:na] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling
[ https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated KAFKA-3590: --- Assignee: Jason Gustafson (was: Dustin Cote) > KafkaConsumer fails with "Messages are rejected since there are fewer in-sync > replicas than required." when polling > --- > > Key: KAFKA-3590 > URL: https://issues.apache.org/jira/browse/KAFKA-3590 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.9.0.1 > Environment: JDK1.8 Ubuntu 14.04 >Reporter: Sergey Alaev >Assignee: Jason Gustafson > > KafkaConsumer.poll() fails with "Messages are rejected since there are fewer > in-sync replicas than required.". Isn't this message about minimum number of > ISR's when *sending* messages? > Stacktrace: > org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: > Messages are rejected since there are fewer in-sync replicas than required. > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > 
~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) > ~[kafka-clients-0.9.0.1.jar:na] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (KAFKA-3129) Console producer issue when request-required-acks=0
[ https://issues.apache.org/jira/browse/KAFKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote reopened KAFKA-3129: Reopening per [~ijuma]'s recommendation as acks=0 in general still has a problem somewhere along the way that isn't fully understood. > Console producer issue when request-required-acks=0 > --- > > Key: KAFKA-3129 > URL: https://issues.apache.org/jira/browse/KAFKA-3129 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.9.0.0, 0.10.0.0 >Reporter: Vahid Hashemian >Assignee: Dustin Cote > Attachments: kafka-3129.mov, server.log.abnormal.txt, > server.log.normal.txt > > > I have been running a simple test case in which I have a text file > {{messages.txt}} with 1,000,000 lines (lines contain numbers from 1 to > 1,000,000 in ascending order). I run the console consumer like this: > {{$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test}} > Topic {{test}} is on 1 partition with a replication factor of 1. > Then I run the console producer like this: > {{$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < > messages.txt}} > Then the console starts receiving the messages. And about half the times it > goes all the way to 1,000,000. But, in other cases, it stops short, usually > at 999,735. > I tried running another console consumer on another machine and both > consumers behave the same way. I can't see anything related to this in the > logs. > I also ran the same experiment with a similar file of 10,000 lines, and am > getting a similar behavior. When the consumer does not receive all the 10,000 > messages it usually stops at 9,864. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-3129) Console producer issue when request-required-acks=0
[ https://issues.apache.org/jira/browse/KAFKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote resolved KAFKA-3129. Resolution: Fixed > Console producer issue when request-required-acks=0 > --- > > Key: KAFKA-3129 > URL: https://issues.apache.org/jira/browse/KAFKA-3129 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.9.0.0, 0.10.0.0 >Reporter: Vahid Hashemian >Assignee: Dustin Cote > Attachments: kafka-3129.mov, server.log.abnormal.txt, > server.log.normal.txt > > > I have been running a simple test case in which I have a text file > {{messages.txt}} with 1,000,000 lines (lines contain numbers from 1 to > 1,000,000 in ascending order). I run the console consumer like this: > {{$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test}} > Topic {{test}} is on 1 partition with a replication factor of 1. > Then I run the console producer like this: > {{$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < > messages.txt}} > Then the console starts receiving the messages. And about half the times it > goes all the way to 1,000,000. But, in other cases, it stops short, usually > at 999,735. > I tried running another console consumer on another machine and both > consumers behave the same way. I can't see anything related to this in the > logs. > I also ran the same experiment with a similar file of 10,000 lines, and am > getting a similar behavior. When the consumer does not receive all the 10,000 > messages it usually stops at 9,864. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4134) Transparently notify users of "Connection Refused" for client to broker connections
Dustin Cote created KAFKA-4134: -- Summary: Transparently notify users of "Connection Refused" for client to broker connections Key: KAFKA-4134 URL: https://issues.apache.org/jira/browse/KAFKA-4134 Project: Kafka Issue Type: Improvement Components: consumer, producer Affects Versions: 0.10.0.1 Reporter: Dustin Cote Assignee: Dustin Cote Priority: Minor Currently, Producers and Consumers log at the WARN level if the bootstrap server disconnects and if there is an unexpected exception in the network Selector. However, we log at DEBUG level if an IOException occurs in order to prevent spamming the user with every network hiccup. This has the side effect of users making initial connections to brokers not getting any feedback if the bootstrap server list is invalid. For example, if one starts the console producer or consumer up without any brokers running, nothing indicates messages are not being received until the socket timeout is hit. I propose we be more granular and log the ConnectException to let the user know their broker(s) are not reachable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
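The proposed split between "connection refused" and generic I/O noise might look like the following sketch (Python; the function and logger names are illustrative, not the client's actual code):

```python
import logging
import socket

log = logging.getLogger("client.network")

def try_connect(host: str, port: int) -> bool:
    """Attempt one connection, logging refused connections loudly.

    A refused connection usually means the bootstrap server list is wrong
    or no broker is running, so it is worth a WARN; other transient I/O
    errors stay at DEBUG to avoid spamming on every network hiccup.
    """
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except ConnectionRefusedError:
        log.warning("Connection to %s:%d refused; check your broker list",
                    host, port)
        return False
    except OSError as exc:
        log.debug("I/O error connecting to %s:%d: %s", host, port, exc)
        return False
```

The key point is only the ordering of the except clauses: the specific `ConnectionRefusedError` is caught (and surfaced) before the broad `OSError` swallows it at DEBUG.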
[jira] [Assigned] (KAFKA-3129) Console producer issue when request-required-acks=0
[ https://issues.apache.org/jira/browse/KAFKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote reassigned KAFKA-3129: -- Assignee: Dustin Cote (was: Neha Narkhede) > Console producer issue when request-required-acks=0 > --- > > Key: KAFKA-3129 > URL: https://issues.apache.org/jira/browse/KAFKA-3129 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.9.0.0, 0.10.0.0 >Reporter: Vahid Hashemian >Assignee: Dustin Cote > Attachments: kafka-3129.mov, server.log.abnormal.txt, > server.log.normal.txt > > > I have been running a simple test case in which I have a text file > {{messages.txt}} with 1,000,000 lines (lines contain numbers from 1 to > 1,000,000 in ascending order). I run the console consumer like this: > {{$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test}} > Topic {{test}} is on 1 partition with a replication factor of 1. > Then I run the console producer like this: > {{$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < > messages.txt}} > Then the console starts receiving the messages. And about half the times it > goes all the way to 1,000,000. But, in other cases, it stops short, usually > at 999,735. > I tried running another console consumer on another machine and both > consumers behave the same way. I can't see anything related to this in the > logs. > I also ran the same experiment with a similar file of 10,000 lines, and am > getting a similar behavior. When the consumer does not receive all the 10,000 > messages it usually stops at 9,864. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4092) retention.bytes should not be allowed to be less than segment.bytes
Dustin Cote created KAFKA-4092: -- Summary: retention.bytes should not be allowed to be less than segment.bytes Key: KAFKA-4092 URL: https://issues.apache.org/jira/browse/KAFKA-4092 Project: Kafka Issue Type: Improvement Components: log Reporter: Dustin Cote Assignee: Dustin Cote Priority: Minor Right now retention.bytes can be as small as the user wants but it doesn't really get acted on for the active segment if retention.bytes is smaller than segment.bytes. We shouldn't allow retention.bytes to be less than segment.bytes and validate that at startup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
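A startup validation along the lines proposed here could look like this sketch (Python; the function is illustrative, though -1 does mirror Kafka's "unlimited retention" sentinel):

```python
def validate_log_config(retention_bytes: int, segment_bytes: int) -> None:
    """Fail fast when retention.bytes can never trim the active segment.

    If retention.bytes is smaller than segment.bytes, size-based retention
    is effectively ignored for the active segment, so reject the
    combination at startup instead of silently keeping extra data.
    A retention_bytes of -1 means unlimited retention and is always valid.
    """
    if retention_bytes != -1 and retention_bytes < segment_bytes:
        raise ValueError(
            f"retention.bytes ({retention_bytes}) must be at least "
            f"segment.bytes ({segment_bytes})"
        )

validate_log_config(2_000_000_000, 1_000_000_000)  # fine: retention >= segment
validate_log_config(-1, 1_000_000_000)             # fine: unlimited retention
```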
[jira] [Commented] (KAFKA-3129) Console producer issue when request-required-acks=0
[ https://issues.apache.org/jira/browse/KAFKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439670#comment-15439670 ] Dustin Cote commented on KAFKA-3129: What I'm seeing is that we are faking the callback {code}org.apache.kafka.clients.producer.internals.Sender#handleProduceResponse{code} for the case where acks=0. This is a problem because the callback gets generated when we do {code}org.apache.kafka.clients.producer.internals.Sender#createProduceRequests{code} in the run loop, but the actual send happens a bit later. When close() comes in that window between createProduceRequests and the send, you get messages that are lost. Funny thing is that if you slow down the call stack a bit by turning on something like strace, the issue goes away, so it's hard to tell exactly which layer is buffering the requests. So my question is, do we want to risk a small performance hit for all producers to be able to guarantee that all messages with acks=0 actually make it out of the producer, knowing full well that they aren't going to be verified to have made it to the broker? I personally don't feel it's worth the extra locking complexity; it could be documented as a known durability issue when you aren't using durability settings. If we go that route, I feel like the console producer should have acks=1 by default. That way, users who are getting started with the built-in tools have a cursory durability guarantee and can tune for performance instead. What do you think [~ijuma] and [~vahid]? 
> Console producer issue when request-required-acks=0 > --- > > Key: KAFKA-3129 > URL: https://issues.apache.org/jira/browse/KAFKA-3129 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.9.0.0, 0.10.0.0 >Reporter: Vahid Hashemian >Assignee: Neha Narkhede > Attachments: kafka-3129.mov, server.log.abnormal.txt, > server.log.normal.txt > > > I have been running a simple test case in which I have a text file > {{messages.txt}} with 1,000,000 lines (lines contain numbers from 1 to > 1,000,000 in ascending order). I run the console consumer like this: > {{$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test}} > Topic {{test}} is on 1 partition with a replication factor of 1. > Then I run the console producer like this: > {{$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < > messages.txt}} > Then the console starts receiving the messages. And about half the times it > goes all the way to 1,000,000. But, in other cases, it stops short, usually > at 999,735. > I tried running another console consumer on another machine and both > consumers behave the same way. I can't see anything related to this in the > logs. > I also ran the same experiment with a similar file of 10,000 lines, and am > getting a similar behavior. When the consumer does not receive all the 10,000 > messages it usually stops at 9,864. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
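The window described in this comment, between creating the produce request (when the acks=0 callback effectively fires) and the actual socket write, can be modeled with a toy producer (a Python sketch; none of these names are Kafka's real internals):

```python
class ToyProducer:
    """Toy model of the acks=0 loss window."""

    def __init__(self):
        self.queued = []  # requests created but not yet written to the socket
        self.sent = []    # requests actually written

    def send(self, record):
        # With acks=0, the success callback effectively fires here,
        # before any network write has happened.
        self.queued.append(record)

    def poll(self):
        # The I/O loop writes queued requests some time later.
        self.sent.extend(self.queued)
        self.queued.clear()

    def close(self, flush=False):
        if flush:
            self.poll()      # drain the queue before shutting down
        self.queued.clear()  # anything still queued is silently lost

fast = ToyProducer()
for i in range(3):
    fast.send(i)
fast.close()             # close lands inside the window: records lost
print(fast.sent)         # []

safe = ToyProducer()
for i in range(3):
    safe.send(i)
safe.close(flush=True)   # draining first delivers everything
print(safe.sent)         # [0, 1, 2]
```

Slowing the caller down (as strace does) gives the real I/O loop a chance to run before close(), which matches the observation above that the issue disappears under tracing.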
[jira] [Resolved] (KAFKA-3993) Console producer drops data
[ https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote resolved KAFKA-3993. Resolution: Duplicate Marking this a duplicate of KAFKA-3129. The workaround here is to set acks=1 on the console producer, but acks=0 shouldn't mean that queued requests fail to make it out before close() finishes. I'll follow up on KAFKA-3129 to keep everything in one place. > Console producer drops data > --- > > Key: KAFKA-3993 > URL: https://issues.apache.org/jira/browse/KAFKA-3993 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > > The console producer drops data if the process exits too quickly. I > suspect that the shutdown hook does not call close() or something goes wrong > during that close(). > Here's a simple example to illustrate the issue: > {noformat} > export BOOTSTRAP_SERVERS=localhost:9092 > export TOPIC=bar > export MESSAGES=1 > ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 > --replication-factor 1 --topic "$TOPIC" \ > && echo "acks=all" > /tmp/producer.config \ > && echo "linger.ms=0" >> /tmp/producer.config \ > && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list > "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \ > && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" > --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC" > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4062) Require --print-data-log if --offsets-decoder is enabled for DumpLogOffsets
Dustin Cote created KAFKA-4062: -- Summary: Require --print-data-log if --offsets-decoder is enabled for DumpLogOffsets Key: KAFKA-4062 URL: https://issues.apache.org/jira/browse/KAFKA-4062 Project: Kafka Issue Type: Improvement Components: admin Reporter: Dustin Cote Assignee: Dustin Cote Priority: Minor When using the DumpLogOffsets tool, if you want to print out contents of __consumer_offsets, you would typically use --offsets-decoder as an option. This option doesn't actually do much without --print-data-log enabled, so we should just require it to prevent user errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
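As a sketch of the flag pairing the ticket asks to enforce, the invocation might look like the following. KAFKA_HOME and the segment path are hypothetical placeholders, and kafka.tools.DumpLogSegments is assumed to be the entry point the shell wrapper dispatches to:

```shell
# Hypothetical locations; point these at a real install and a real
# __consumer_offsets segment file before running against a cluster.
KAFKA_HOME=${KAFKA_HOME:-/opt/kafka}
SEGMENT=${SEGMENT:-/var/kafka-logs/__consumer_offsets-0/00000000000000000000.log}

# Per the ticket, --offsets-decoder does little without --print-data-log,
# so the two flags should always travel together:
CMD="$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files $SEGMENT --offsets-decoder --print-data-log"
echo "$CMD"
```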
[jira] [Commented] (KAFKA-4017) Return more helpful responses when misconfigured connectors are submitted
[ https://issues.apache.org/jira/browse/KAFKA-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407746#comment-15407746 ] Dustin Cote commented on KAFKA-4017: [~ewencp] yeah that's right. Super easy to reproduce as well, just try to submit a connector with a bad config syntactically. The actual curl I presented was: {code} curl -X POST -H "Content-Type: application/json" --data '{"name": "local-console-source", "config": {"connector.class":"org.apache.kafka.connect.file.FileStreamSourceConnector", "tasks.max":"1", "topic"="connect-test" }}' http://localhost:8083/connectors {code} > Return more helpful responses when misconfigured connectors are submitted > - > > Key: KAFKA-4017 > URL: https://issues.apache.org/jira/browse/KAFKA-4017 > Project: Kafka > Issue Type: Improvement > Components: KafkaConnect >Affects Versions: 0.10.0.0 >Reporter: Dustin Cote >Assignee: Ewen Cheslack-Postava > > Currently if a user submits a connector with a malformed configuration with > connect in distributed mode, the response is: > {code} > > > > Error 500 > > > HTTP ERROR: 500 > Problem accessing /connectors. Reason: > Request failed. 
> Powered by Jetty:// > > > {code} > If the user decides to then go look at the connect server side logging, they > can maybe parse the stack traces to find out what happened, but are at first > greeted by: > {code} > [2016-08-03 16:14:07,797] WARN /connectors > (org.eclipse.jetty.server.HttpChannel:384) > java.lang.NoSuchMethodError: > javax.servlet.http.HttpServletRequest.isAsyncStarted()Z > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:684) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at > org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:159) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.eclipse.jetty.server.Server.handle(Server.java:499) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) > at > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) > at java.lang.Thread.run(Thread.java:745) > {code} > It would be better if Connect can handle this scenario more gracefully and > make it more clear what the problem is 
even directly to the client. In the > example above, you can eventually locate the problem in the server logs as: > {code} > [2016-08-03 16:14:07,795] WARN (org.eclipse.jetty.servlet.ServletHandler:620) > javax.servlet.ServletException: > org.glassfish.jersey.server.ContainerException: > com.fasterxml.jackson.databind.JsonMappingException: Unexpected character > ('=' (code 61)): was expecting a colon to separate field name and value > at [Source: > org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream@20fb9cff; > line: 1, column: 147] (through reference chain: > org.apache.kafka.connect.runtime.rest.entities.CreateConnectorRequest["config"]) > at > org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489) > at > org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341) > at > org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) >
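The root cause buried in the trace above is the '=' in {{"topic"="connect-test"}}, where JSON requires a colon. A corrected submission can be validated locally before POSTing (python3 is an assumed dependency here; the Connect URL is the one from the report):

```shell
# Same payload as in the report, with ':' instead of '=' after "topic".
cat > /tmp/connector.json <<'EOF'
{"name": "local-console-source",
 "config": {"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "topic": "connect-test"}}
EOF
# Catch malformed JSON client-side instead of relying on the opaque 500:
python3 -m json.tool < /tmp/connector.json > /dev/null && echo "valid JSON"

# Then submit as before, reading the body from the validated file:
# curl -X POST -H "Content-Type: application/json" --data @/tmp/connector.json \
#   http://localhost:8083/connectors
```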
[jira] [Commented] (KAFKA-3766) Unhandled "not enough replicas" errors in SyncGroup
[ https://issues.apache.org/jira/browse/KAFKA-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399520#comment-15399520 ] Dustin Cote commented on KAFKA-3766: Marking this as a dupe of KAFKA-3590. I've tried to incorporate suggestion #2 in my PR over there. > Unhandled "not enough replicas" errors in SyncGroup > --- > > Key: KAFKA-3766 > URL: https://issues.apache.org/jira/browse/KAFKA-3766 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson > > Caught by [~ijuma]. We seem to be missing at least a couple error codes when > handling the append log response when writing group metadata to the offsets > topic in the SyncGroup handler. In particular, we are missing checks for > NOT_ENOUGH_REPLICAS and NOT_ENOUGH_REPLICAS_AFTER_APPEND. Currently these > errors are returned directly in the SyncGroup response and cause an exception > to be raised to the user. > There are two options to fix this problem: > 1. We can continue to return these error codes in the sync group response and > add a handler on the client side to retry. > 2. We can convert the errors on the server to something like > COORDINATOR_NOT_AVAILABLE, which will cause the client to retry with the > existing logic. > The second option seems a little nicer to avoid exposing the internal > implementation of the SyncGroup request (i.e. that we write group metadata to > a partition). It also has the nice side effect of fixing old clients > automatically when the server is upgraded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling
[ https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399518#comment-15399518 ] Dustin Cote commented on KAFKA-3590: Thanks [~ijuma], I didn't see that one. Updated the PR to move toward Jason's recommendation. > KafkaConsumer fails with "Messages are rejected since there are fewer in-sync > replicas than required." when polling > --- > > Key: KAFKA-3590 > URL: https://issues.apache.org/jira/browse/KAFKA-3590 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.9.0.1 > Environment: JDK1.8 Ubuntu 14.04 >Reporter: Sergey Alaev >Assignee: Dustin Cote > > KafkaConsumer.poll() fails with "Messages are rejected since there are fewer > in-sync replicas than required.". Isn't this message about minimum number of > ISR's when *sending* messages? > Stacktrace: > org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: > Messages are rejected since there are fewer in-sync replicas than required. 
> at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > ~[kafka-clients-0.9.0.1.jar:na] > at > 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890) > ~[kafka-clients-0.9.0.1.jar:na] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) > ~[kafka-clients-0.9.0.1.jar:na] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling
[ https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated KAFKA-3590: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling
[ https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote reassigned KAFKA-3590: -- Assignee: Dustin Cote -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling
[ https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated KAFKA-3590: --- Component/s: (was: clients) consumer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling
[ https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396213#comment-15396213 ] Dustin Cote commented on KAFKA-3590: [~salaev] it looks like this is occurring in a SyncGroup call, so it's probably due to the fact that the new consumer needs to write to the __consumer_offsets topic and can't, because __consumer_offsets isn't meeting the min ISR requirements for the cluster. The error message isn't very clear, so maybe we can start by improving that. I can take a shot at the pull request in the next couple of days. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
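Following the diagnosis in the comment above, one way to check the min ISR hypothesis on a 0.9.x cluster is to describe the internal offsets topic and compare its Isr column against the brokers' min.insync.replicas. The ZooKeeper address below is an assumed localhost default:

```shell
# Build the describe command for the internal offsets topic; run it against a
# live cluster and check whether the Isr list is shorter than the brokers'
# min.insync.replicas setting.
ZK=${ZK:-localhost:2181}
CMD="./bin/kafka-topics.sh --zookeeper $ZK --describe --topic __consumer_offsets"
echo "$CMD"
```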
[jira] [Assigned] (KAFKA-2932) Adjust importance level of Kafka Connect configs
[ https://issues.apache.org/jira/browse/KAFKA-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote reassigned KAFKA-2932: -- Assignee: Dustin Cote (was: Ewen Cheslack-Postava) > Adjust importance level of Kafka Connect configs > > > Key: KAFKA-2932 > URL: https://issues.apache.org/jira/browse/KAFKA-2932 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.9.0.0 >Reporter: Ewen Cheslack-Postava >Assignee: Dustin Cote > > Some of the configuration importance levels are out of whack, probably due to > the way they evolved over time. For example, the internal converter settings > are currently marked with high importance, but they are really an internal > implementation detail that the user usually shouldn't need to worry about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2932) Adjust importance level of Kafka Connect configs
[ https://issues.apache.org/jira/browse/KAFKA-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389722#comment-15389722 ] Dustin Cote commented on KAFKA-2932: [~ewencp] mind if I pick this one up? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389618#comment-15389618 ] Dustin Cote commented on KAFKA-2394: had to change the PR to come from a different branch so I could have my trunk branch back :) > Use RollingFileAppender by default in log4j.properties > -- > > Key: KAFKA-2394 > URL: https://issues.apache.org/jira/browse/KAFKA-2394 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Dustin Cote >Priority: Minor > Labels: newbie > Fix For: 0.11.0.0 > > Attachments: log4j.properties.patch > > > The default log4j.properties bundled with Kafka uses ConsoleAppender and > DailyRollingFileAppender, which offer no protection to users from spammy > logging. In extreme cases (such as when issues like KAFKA-1461 are > encountered), the logs can exhaust the local disk space. This could be a > problem for Kafka adoption since new users are less likely to adjust the > logging properties themselves, and are more likely to have configuration > problems which result in log spam. > To fix this, we can use RollingFileAppender, which offers two settings for > controlling the maximum space that log files will use. > maxBackupIndex: how many backup files to retain > maxFileSize: the max size of each log file > One question is whether this change is a compatibility concern? The backup > strategy and filenames used by RollingFileAppender are different from those > used by DailyRollingFileAppender, so any tools which depend on the old format > will break. If we think this is a serious problem, one solution would be to > provide two versions of log4j.properties and add a flag to enable the new > one. Another solution would be to include the RollingFileAppender > configuration in the default log4j.properties, but commented out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
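A minimal sketch of what the proposed default could look like, using RollingFileAppender with the two settings named in the description. The appender name, file path, and the 100MB/10 values are illustrative assumptions, not the contents of the attached patch:

```properties
log4j.rootLogger=INFO, kafkaAppender

log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
# Roll when the file reaches MaxFileSize; keep MaxBackupIndex old files, so
# disk usage is bounded by roughly MaxFileSize * (MaxBackupIndex + 1).
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=10
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```

Unlike DailyRollingFileAppender, this names backups server.log.1, server.log.2, and so on, which is the compatibility concern raised in the issue.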
[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327390#comment-15327390 ] Dustin Cote commented on KAFKA-2394: [~ewencp] yeah I think that's for the best. I wouldn't want to surprise anyone expecting the log naming scheme to stay the same with a change in a minor version. If you think that's too conservative of an outlook, I'll defer to your judgement :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325356#comment-15325356 ]

Dustin Cote commented on KAFKA-2394:
------------------------------------

[~ewencp] makes sense, I went ahead and updated the docs in the PR. There wasn't a section on upgrading to 0.11, so I created one assuming that'll be the next non-fixpack release to come out. Since this is a potentially incompatible change, I've set 0.11 as the target.

> Use RollingFileAppender by default in log4j.properties
> ------------------------------------------------------
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Dustin Cote
> Priority: Minor
> Labels: newbie
> Fix For: 0.11.0.0
> Attachments: log4j.properties.patch
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and
> DailyRollingFileAppender, which offer no protection to users from spammy
> logging. In extreme cases (such as when issues like KAFKA-1461 are
> encountered), the logs can exhaust the local disk space. This could be a
> problem for Kafka adoption since new users are less likely to adjust the
> logging properties themselves, and are more likely to have configuration
> problems which result in log spam.
>
> To fix this, we can use RollingFileAppender, which offers two settings for
> controlling the maximum space that log files will use:
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
>
> One question is whether this change is a compatibility concern. The backup
> strategy and filenames used by RollingFileAppender are different from those
> used by DailyRollingFileAppender, so any tools which depend on the old format
> will break. If we think this is a serious problem, one solution would be to
> provide two versions of log4j.properties and add a flag to enable the new
> one. Another solution would be to include the RollingFileAppender
> configuration in the default log4j.properties, but commented out.
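For reference, a minimal sketch of what such a RollingFileAppender configuration might look like; the appender name, file path, and size/backup values here are illustrative choices, not taken from the ticket or the attached patch:

```properties
# Hypothetical log4j.properties fragment using RollingFileAppender instead of
# DailyRollingFileAppender. Disk usage is bounded at roughly
# MaxFileSize * (MaxBackupIndex + 1) per appender.
log4j.rootLogger=INFO, kafkaAppender

log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
# Keep at most 10 backups of 100 MB each (example values only)
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=10
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```

Note the compatibility point from the description: rolled files become server.log.1, server.log.2, ... rather than the date-suffixed names DailyRollingFileAppender produces, so tooling that parses the old filenames would need updating.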
[jira] [Updated] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Cote updated KAFKA-2394:
-------------------------------
Fix Version/s: 0.11.0.0

> Use RollingFileAppender by default in log4j.properties
> ------------------------------------------------------
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Dustin Cote
> Priority: Minor
> Labels: newbie
> Fix For: 0.11.0.0
> Attachments: log4j.properties.patch
[jira] [Commented] (KAFKA-3802) log mtimes reset on broker restart
[ https://issues.apache.org/jira/browse/KAFKA-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319233#comment-15319233 ]

Dustin Cote commented on KAFKA-3802:
------------------------------------

I'm thinking that there's no partition reassignment happening here if it's a generic shutdown/startup. Can you provide any environmental details [~ottomata]? I think that would be helpful to pinpoint the problem. So far, I haven't been able to reproduce this issue, seemingly because I am missing something in the environment that others have.

> log mtimes reset on broker restart
> ----------------------------------
>
> Key: KAFKA-3802
> URL: https://issues.apache.org/jira/browse/KAFKA-3802
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.9.0.1
> Reporter: Andrew Otto
>
> Folks over in
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAO8=cz0ragjad1acx4geqcwj+rkd1gmdavkjwytwthkszfg...@mail.gmail.com%3E
> are commenting about this issue.
> In 0.9, any data log file that was on disk before the restart has its mtime
> modified to the time of the broker restart.
> This causes problems with log retention, as all the files then look like
> they contain recent data to Kafka. We use the default log retention of 7
> days, but if all the files are touched at the same time, this can cause us
> to retain up to 2 weeks of log data, which can fill up our disks.
> This happens *most* of the time, but seemingly not all. We have seen broker
> restarts where mtimes were not changed.
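One quick way to check for the behavior described above is to list segment mtimes before and after a restart; the KAFKA_LOG_DIR variable below is a placeholder for one of the broker's log.dirs entries, not anything Kafka itself defines:

```shell
#!/bin/sh
# List each Kafka .log segment with its modification time. Run before and
# after a broker restart: if every segment suddenly shows the restart time
# instead of its last-write time, mtime-based retention will keep all of
# them for a full retention period again.
# KAFKA_LOG_DIR is an assumed variable; point it at a broker log directory.
LOG_DIR="${KAFKA_LOG_DIR:-.}"

find "$LOG_DIR" -name '*.log' -exec stat -c '%y %n' {} \; | sort
```

(`stat -c` is the GNU coreutils form; on BSD/macOS the equivalent is `stat -f '%Sm %N'`.)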
[jira] [Comment Edited] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316992#comment-15316992 ]

Dustin Cote edited comment on KAFKA-2394 at 6/6/16 6:57 PM:
------------------------------------------------------------

[~ewencp] I only got positive feedback on the change from the mailing list, so it sounds like this is a desirable change? I think release notes for documentation would be sufficient based on that sentiment.

was (Author: cotedm):
[~ewencp] I only got positive feedback on the change from the mailing list, so it sounds like this is a desirable change?

> Use RollingFileAppender by default in log4j.properties
> ------------------------------------------------------
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Dustin Cote
> Priority: Minor
> Labels: newbie
> Attachments: log4j.properties.patch
[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316992#comment-15316992 ]

Dustin Cote commented on KAFKA-2394:
------------------------------------

[~ewencp] I only got positive feedback on the change from the mailing list, so it sounds like this is a desirable change?

> Use RollingFileAppender by default in log4j.properties
> ------------------------------------------------------
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Dustin Cote
> Priority: Minor
> Labels: newbie
> Attachments: log4j.properties.patch
[jira] [Assigned] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Cote reassigned KAFKA-2394:
----------------------------------
Assignee: Dustin Cote (was: jin xing)

> Use RollingFileAppender by default in log4j.properties
> ------------------------------------------------------
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Dustin Cote
> Priority: Minor
> Labels: newbie
> Attachments: log4j.properties.patch
[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312409#comment-15312409 ]

Dustin Cote commented on KAFKA-2394:
------------------------------------

Thanks [~ewencp]. I went ahead and sent out to users@ and dev@ and hopefully the impacted users will speak up :)

> Use RollingFileAppender by default in log4j.properties
> ------------------------------------------------------
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: jin xing
> Priority: Minor
> Labels: newbie
> Attachments: log4j.properties.patch
[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302150#comment-15302150 ]

Dustin Cote commented on KAFKA-2394:
------------------------------------

[~hachikuji] the attached PR removes references to the DailyRollingFileAppender because it's not just a risk when you have spammy logging: over the long term, the DailyRollingFileAppender has no cleanup policy that I know of by default. It also has known issues with [losing log messages|https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/DailyRollingFileAppender.html]. I think it's good to scrub DailyRollingFileAppender completely and either notify users of the logging format change or add the log4j extras to the dependency list to match the formatting. I prefer just changing the naming convention of the files but am certainly open to discussion. Thanks!

> Use RollingFileAppender by default in log4j.properties
> ------------------------------------------------------
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: jin xing
> Priority: Minor
> Labels: newbie
> Attachments: log4j.properties.patch
[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302098#comment-15302098 ]

Dustin Cote commented on KAFKA-2394:
------------------------------------

[~jinxing6...@126.com] I'll go ahead and take this one to move it forward. Please let me know if you end up having time to work on it. Thanks!

> Use RollingFileAppender by default in log4j.properties
> ------------------------------------------------------
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: jin xing
> Priority: Minor
> Labels: newbie
> Attachments: log4j.properties.patch
[jira] [Resolved] (KAFKA-2971) KAFKA - Not obeying log4j settings, DailyRollingFileAppender not rolling files
[ https://issues.apache.org/jira/browse/KAFKA-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Cote resolved KAFKA-2971.
--------------------------------
Resolution: Resolved

Resolving as superseded by KAFKA-2394. Thanks [~gwenshap].

> KAFKA - Not obeying log4j settings, DailyRollingFileAppender not rolling files
> ------------------------------------------------------------------------------
>
> Key: KAFKA-2971
> URL: https://issues.apache.org/jira/browse/KAFKA-2971
> Project: Kafka
> Issue Type: Bug
> Components: config, log
> Affects Versions: 0.8.2.2
> Environment: OS: Windows Server 2008 R2 Enterprise SP1
> log4j: 1.2.16
> Reporter: Damir Ban
> Assignee: Jay Kreps
> Fix For: 0.10.1.0
> Attachments: log4j.properties
>
> Per the settings in log4j it is expected that log files get rolled over
> periodically, but they are just getting filled until a restart of the
> service, when they are overwritten.
> As we have intermittent fatal failures, the customer restarts the service
> and we lose the information about the failure.
> We have tried different date patterns in the DailyRollingFileAppender, but
> no change.
> Attaching the log4j.properties
[jira] [Commented] (KAFKA-2971) KAFKA - Not obeying log4j settings, DailyRollingFileAppender not rolling files
[ https://issues.apache.org/jira/browse/KAFKA-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300699#comment-15300699 ]

Dustin Cote commented on KAFKA-2971:
------------------------------------

[~jkreps] and [~dban] I believe this issue is best addressed by moving to the much more stable RollingFileAppender as described in KAFKA-2394. Any objection to using the approach there and closing this one out?

> KAFKA - Not obeying log4j settings, DailyRollingFileAppender not rolling files
> ------------------------------------------------------------------------------
>
> Key: KAFKA-2971
> URL: https://issues.apache.org/jira/browse/KAFKA-2971
> Project: Kafka
> Issue Type: Bug
> Components: config, log
> Affects Versions: 0.8.2.2
> Reporter: Damir Ban
> Assignee: Jay Kreps
> Fix For: 0.10.1.0
> Attachments: log4j.properties
[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties
[ https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298296#comment-15298296 ]

Dustin Cote commented on KAFKA-2394:
------------------------------------

[~jinxing6...@126.com] are you able to make a pull request for this issue instead of attaching the patch to the JIRA? I think you've already been doing [the process|https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes#ContributingCodeChanges-PullRequest] on other JIRAs, and this one seems like an easy one for you to translate.

> Use RollingFileAppender by default in log4j.properties
> ------------------------------------------------------
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: jin xing
> Priority: Minor
> Labels: newbie
> Attachments: log4j.properties.patch
[jira] [Work started] (KAFKA-3683) Add file descriptor recommendation to ops guide
[ https://issues.apache.org/jira/browse/KAFKA-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on KAFKA-3683 started by Dustin Cote.
------------------------------------------

> Add file descriptor recommendation to ops guide
> -----------------------------------------------
>
> Key: KAFKA-3683
> URL: https://issues.apache.org/jira/browse/KAFKA-3683
> Project: Kafka
> Issue Type: Improvement
> Components: website
> Reporter: Dustin Cote
> Assignee: Dustin Cote
> Priority: Trivial
>
> The Ops section of the documentation says that the file descriptor limits
> are an important OS configuration to pay attention to, but offers no
> guidance on how to configure them.
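The kind of guidance the ticket asks for boils down to checking and raising the limits a broker inherits; a rough sketch (the broker PID path is a placeholder, and no specific limit value from the eventual docs change is assumed here):

```shell
#!/bin/sh
# Show the file descriptor limits that a process started from this shell
# (such as a Kafka broker) would inherit. A broker holds roughly one
# descriptor per log segment plus one per client connection, so default
# soft limits (often 1024) are easy to exhaust.
echo "soft fd limit: $(ulimit -Sn)"
echo "hard fd limit: $(ulimit -Hn)"

# To see how many descriptors a running broker currently holds
# (<broker-pid> is a placeholder):
#   ls /proc/<broker-pid>/fd | wc -l
```

Raising the limit persistently is distribution-specific (e.g. `/etc/security/limits.conf` on many Linux systems, or the systemd unit's `LimitNOFILE`).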
[jira] [Commented] (KAFKA-3666) Update new Consumer API links in the docs
[ https://issues.apache.org/jira/browse/KAFKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274155#comment-15274155 ]

Dustin Cote commented on KAFKA-3666:
------------------------------------

Leaving the GitHub link here since the PR didn't get pushed for some reason: https://github.com/apache/kafka/pull/1331

> Update new Consumer API links in the docs
> -----------------------------------------
>
> Key: KAFKA-3666
> URL: https://issues.apache.org/jira/browse/KAFKA-3666
> Project: Kafka
> Issue Type: Improvement
> Components: website
> Affects Versions: 0.9.0.0
> Reporter: Dustin Cote
> Assignee: Dustin Cote
> Priority: Trivial
>
> New Consumer API links in the docs point to the initial proposed design doc.
> This should go to the real API docs now.
[jira] [Updated] (KAFKA-3666) Update new Consumer API links in the docs
[ https://issues.apache.org/jira/browse/KAFKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Cote updated KAFKA-3666:
-------------------------------
Description: New Consumer API links in the docs point to the initial proposed design doc. This should go to the real API docs now.
(was: The ConsumerOffsetChecker is deprecated in 0.9 but the docs still use it for the example of how to check offsets. We should replace the example with the ConsumerGroupCommand starting in the 0.9 docs and going forward.)

> Update new Consumer API links in the docs
> -----------------------------------------
>
> Key: KAFKA-3666
> URL: https://issues.apache.org/jira/browse/KAFKA-3666
> Project: Kafka
> Issue Type: Improvement
> Components: website
> Affects Versions: 0.9.0.0
> Reporter: Dustin Cote
> Assignee: Dustin Cote
> Priority: Trivial
[jira] [Commented] (KAFKA-3666) Update offset checking example in the docs
[ https://issues.apache.org/jira/browse/KAFKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274090#comment-15274090 ]

Dustin Cote commented on KAFKA-3666:
------------------------------------

After looking at the docs in trunk, it looks like the ConsumerGroupCommand is already documented by the changes in KAFKA-1476. Updating the description here to remove the links to the new consumer API and replace them with links that point to the javadoc instead of the (apparently deprecated) design doc.

> Update offset checking example in the docs
> ------------------------------------------
>
> Key: KAFKA-3666
> URL: https://issues.apache.org/jira/browse/KAFKA-3666
> Project: Kafka
> Issue Type: Improvement
> Components: website
> Affects Versions: 0.9.0.0
> Reporter: Dustin Cote
> Assignee: Dustin Cote
> Priority: Trivial
>
> The ConsumerOffsetChecker is deprecated in 0.9 but the docs still use it for
> the example of how to check offsets. We should replace the example with the
> ConsumerGroupCommand starting in the 0.9 docs and going forward.
[jira] [Updated] (KAFKA-3666) Update new Consumer API links in the docs
[ https://issues.apache.org/jira/browse/KAFKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Cote updated KAFKA-3666:
-------------------------------
Summary: Update new Consumer API links in the docs (was: Update offset checking example in the docs)

> Update new Consumer API links in the docs
> -----------------------------------------
>
> Key: KAFKA-3666
> URL: https://issues.apache.org/jira/browse/KAFKA-3666
> Project: Kafka
> Issue Type: Improvement
> Components: website
> Affects Versions: 0.9.0.0
> Reporter: Dustin Cote
> Assignee: Dustin Cote
> Priority: Trivial
[jira] [Created] (KAFKA-3666) Update offset checking example in the docs
Dustin Cote created KAFKA-3666:
-------------------------------

Summary: Update offset checking example in the docs
Key: KAFKA-3666
URL: https://issues.apache.org/jira/browse/KAFKA-3666
Project: Kafka
Issue Type: Improvement
Components: website
Affects Versions: 0.9.0.0
Reporter: Dustin Cote
Assignee: Dustin Cote
Priority: Trivial

The ConsumerOffsetChecker is deprecated in 0.9 but the docs still use it for the example of how to check offsets. We should replace the example with the ConsumerGroupCommand starting in the 0.9 docs and going forward.
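For context on the swap described above, the replacement check looks roughly like the following; the group name and bootstrap address are placeholders, and both invocations require a Kafka installation and running cluster:

```shell
# Deprecated style the docs currently show (ConsumerOffsetChecker):
#   bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
#     --zookeeper localhost:2181 --group my-group

# Replacement using the ConsumerGroupCommand ("my-group" and the
# bootstrap server address are illustrative):
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-group
```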
[jira] [Commented] (KAFKA-2670) add sampling rate to MirrorMaker
[ https://issues.apache.org/jira/browse/KAFKA-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204620#comment-15204620 ]

Dustin Cote commented on KAFKA-2670:
------------------------------------

I agree with Joel, it's probably better to keep MirrorMaker simple, especially because the only way I can actually see to do this is to add another branch into the hot code path for processing events, which would marginally slow MirrorMaker down. I'm going to check out that interceptor API in some more detail; maybe it makes sense to add an example that would accomplish this same thing, for example intercepting every X number of events and /dev/null-ing them instead of letting them make it to the target cluster. When I have a better handle on that I will make a pull request.

> add sampling rate to MirrorMaker
> --------------------------------
>
> Key: KAFKA-2670
> URL: https://issues.apache.org/jira/browse/KAFKA-2670
> Project: Kafka
> Issue Type: Wish
> Components: tools
> Reporter: Christian Tramnitz
> Priority: Minor
>
> MirrorMaker could be used to copy data to different Kafka instances in
> different environments (i.e. from production to development or test), but
> often these are not at the same scale as production. A sampling rate could
> be introduced to MirrorMaker to define a ratio of data to be copied (per
> topic) to downstream instances. Of course this should be 1:1 (or 100%) by
> default.