[jira] [Created] (KAFKA-6417) plugin.path pointing at a plugin directory causes ClassNotFoundException

2018-01-02 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-6417:
--

 Summary: plugin.path pointing at a plugin directory causes 
ClassNotFoundException
 Key: KAFKA-6417
 URL: https://issues.apache.org/jira/browse/KAFKA-6417
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 1.0.0
Reporter: Dustin Cote


When using the {{plugin.path}} configuration for the Connect workers, the user 
is expected to specify a list containing the following per the docs:

{quote}
The list should consist of top level directories that include any combination 
of: a) directories immediately containing jars with plugins and their 
dependencies b) uber-jars with plugins and their dependencies c) directories 
immediately containing the package directory structure of classes of plugins 
and their dependencies 
{quote}

This means we would expect {{plugin.path=/usr/share/plugins}} for a structure 
like {{/usr/share/plugins/myplugin1}}, {{/usr/share/plugins/myplugin2}}, etc. 
However, if you specify {{plugin.path=/usr/share/plugins/myplugin1}}, the 
resulting behavior is that dependencies for {{myplugin1}} are not properly 
loaded. This causes a {{ClassNotFoundException}} that is not intuitive to 
debug. 
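For illustration, a minimal sketch of the expected layout (directory and plugin 
names are hypothetical):

{code}
# worker config: point plugin.path at the parent directory, not at the plugin itself
plugin.path=/usr/share/plugins

# expected layout (hypothetical plugin names):
# /usr/share/plugins/myplugin1/   <- contains the plugin jar plus its dependency jars
# /usr/share/plugins/myplugin2/
#
# setting plugin.path=/usr/share/plugins/myplugin1 instead causes the dependency
# jars to be missed, surfacing as a ClassNotFoundException at runtime
{code}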



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2017-12-12 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote resolved KAFKA-2394.

Resolution: Won't Do

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Fix For: 1.1.0
>
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern. The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.
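For reference, a minimal log4j.properties sketch of the RollingFileAppender 
approach described above (the file name and limits are illustrative, not the 
values ultimately shipped):

{code}
# illustrative only: cap each log file at 100MB and keep 10 backups
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=10
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
{code}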



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6107) SCRAM user add fails if Kafka has never been started

2017-10-23 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-6107:
--

 Summary: SCRAM user add fails if Kafka has never been started
 Key: KAFKA-6107
 URL: https://issues.apache.org/jira/browse/KAFKA-6107
 Project: Kafka
  Issue Type: Bug
  Components: tools, zkclient
Affects Versions: 0.11.0.0
Reporter: Dustin Cote
Priority: Minor


When trying to add a SCRAM user in ZooKeeper without ever having started 
Kafka, the kafka-configs tool does not handle it well. This is a common use 
case because starting a new cluster that uses SCRAM for inter-broker 
communication will generally run into this problem. Today, the workaround is 
to start Kafka, add the user, then restart Kafka. Here's how to 
reproduce:

1) Start ZooKeeper
2) Run 
{code}
bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 
'SCRAM-SHA-256=[iterations=8192,password=broker_pwd],SCRAM-SHA-512=[password=broker_pwd]'
 --entity-type users --entity-name broker
{code}

This will result in:
{code}
bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 
'SCRAM-SHA-256=[iterations=8192,password=broker_pwd],SCRAM-SHA-512=[password=broker_pwd]'
 --entity-type users --entity-name broker
Error while executing config command 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /config/changes/config_change_
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /config/changes/config_change_
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
at org.I0Itec.zkclient.ZkClient.create(ZkClient.java:528)
at 
org.I0Itec.zkclient.ZkClient.createPersistentSequential(ZkClient.java:444)
at kafka.utils.ZkPath.createPersistentSequential(ZkUtils.scala:1045)
at kafka.utils.ZkUtils.createSequentialPersistentPath(ZkUtils.scala:527)
at 
kafka.admin.AdminUtils$.kafka$admin$AdminUtils$$changeEntityConfig(AdminUtils.scala:600)
at 
kafka.admin.AdminUtils$.changeUserOrUserClientIdConfig(AdminUtils.scala:551)
at kafka.admin.AdminUtilities$class.changeConfigs(AdminUtils.scala:63)
at kafka.admin.AdminUtils$.changeConfigs(AdminUtils.scala:72)
at kafka.admin.ConfigCommand$.alterConfig(ConfigCommand.scala:101)
at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:68)
at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /config/changes/config_change_
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.I0Itec.zkclient.ZkConnection.create(ZkConnection.java:100)
at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:531)
at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:528)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:991)
... 11 more
{code}

The command does not explicitly report a failure, but it throws the exception 
above and returns an error.
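A possible workaround sketch, assuming the ZooKeeper CLI bundled with Kafka: 
pre-create the config change notification path that a broker would normally 
create on first startup (the paths are taken from the stack trace above; this 
is not an officially documented procedure and may vary by version):

{code}
# interactive session with the ZooKeeper CLI shipped with Kafka
bin/zookeeper-shell.sh localhost:2181
create /config ""
create /config/changes ""
# then re-run the kafka-configs command above
{code}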



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5994) Improve transparency of broker user ACL misconfigurations

2017-09-29 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5994:
--

 Summary: Improve transparency of broker user ACL misconfigurations
 Key: KAFKA-5994
 URL: https://issues.apache.org/jira/browse/KAFKA-5994
 Project: Kafka
  Issue Type: Improvement
  Components: security
Affects Versions: 0.10.2.1
Reporter: Dustin Cote


When the user for inter-broker communication is not a super user and ACLs are 
configured with allow.everyone.if.no.acl.found=false, the cluster will not 
serve data. This is extremely confusing to debug because there is no security 
negotiation problem or any indication of an error other than that no data can 
make it in or out of the broker. If one knew to look in the authorizer log, it 
would be more clear, but that didn't make it into my workflow at least. Here's 
an example of a problematic debugging scenario:

SASL_SSL, SSL, SASL_PLAINTEXT ports on the brokers
SASL user specified in `super.users`
SSL specified as the inter broker protocol

The only way I could figure out ACLs were an issue without gleaning it through 
configuration inspection was that controlled shutdown indicated that a cluster 
action had failed. 

It would be good if we could be more transparent about the failure here. 
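For illustration, a minimal server.properties sketch of the misconfiguration 
described above (listener ports and principal names are hypothetical): the SSL 
principal used for inter-broker traffic is not listed in super.users, so 
cluster actions are denied without any obvious error.

{code}
listeners=SASL_SSL://:9093,SSL://:9094,SASL_PLAINTEXT://:9095
security.inter.broker.protocol=SSL
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
allow.everyone.if.no.acl.found=false
# only the SASL principal is a super user; the broker's SSL certificate DN is not,
# so inter-broker requests over the SSL listener are denied by the authorizer
super.users=User:kafka-sasl-user
# one fix would be to also list the SSL principal, e.g.:
# super.users=User:kafka-sasl-user;User:CN=broker1.example.com,OU=ops,O=example
{code}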



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5885) NPE in ZKClient

2017-09-13 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5885:
--

 Summary: NPE in ZKClient
 Key: KAFKA-5885
 URL: https://issues.apache.org/jira/browse/KAFKA-5885
 Project: Kafka
  Issue Type: Bug
  Components: zkclient
Affects Versions: 0.10.2.1
Reporter: Dustin Cote


A null znode for a topic (how this happens isn't totally clear, but it's not 
the focus of this issue) can currently cause controller leader election to 
fail. When looking at the broker logging, you can see there is a 
NullPointerException emanating from the ZKClient:
{code}
[2017-09-11 00:00:21,441] ERROR Error while electing or becoming leader on 
broker 1010674 (kafka.server.ZookeeperLeaderElector)
kafka.common.KafkaException: Can't parse json string: null
at kafka.utils.Json$.liftedTree1$1(Json.scala:40)
at kafka.utils.Json$.parseFull(Json.scala:36)
at 
kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:704)
at 
kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:700)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.utils.ZkUtils.getReplicaAssignmentForTopics(ZkUtils.scala:700)
at 
kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:742)
at 
kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:333)
at 
kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:160)
at 
kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:85)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:154)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:154)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:154)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:153)
at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:825)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:72)
Caused by: java.lang.NullPointerException
{code}

Regardless of how a null topic znode ended up in ZooKeeper, we can probably 
handle this better, at least by printing the path up to the problematic znode 
in the log. The way this particular problem was resolved was by using the 
``kafka-topics`` command and seeing it persistently fail trying to read a 
particular topic with this same message. Then deleting the null znode allowed 
the leader election to complete.
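As a sketch of the manual resolution described above (the topic name is 
illustrative, and the commands assume the ZooKeeper CLI bundled with Kafka):

{code}
# kafka-topics fails repeatedly with the same "Can't parse json string: null" message
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic badtopic

# inspect the topic znode directly; it comes back with null data
bin/zookeeper-shell.sh localhost:2181
get /brokers/topics/badtopic

# removing the null znode (after confirming the topic is not needed)
# allowed controller leader election to complete
rmr /brokers/topics/badtopic
{code}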



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5718) Better document what LogAppendTime means

2017-08-09 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5718:
--

 Summary: Better document what LogAppendTime means
 Key: KAFKA-5718
 URL: https://issues.apache.org/jira/browse/KAFKA-5718
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.11.0.0
Reporter: Dustin Cote
Priority: Trivial


There isn't a good description of LogAppendTime in the documentation. It would 
be nice to add something like:

LogAppendTime is a timestamp taken at some point between when the partition 
leader receives the request and when it writes the message to its local log. 

There are two important distinctions that trip people up:
1) This timestamp is not when the consumer could have first consumed the 
message; consuming it additionally requires min.insync.replicas to have been satisfied.
2) This is not precisely when the leader wrote to its log; there can be delays 
along the path between receiving the request and writing to the log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5688) Add a modifier to the REST endpoint to only show errors

2017-08-01 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5688:
--

 Summary: Add a modifier to the REST endpoint to only show errors
 Key: KAFKA-5688
 URL: https://issues.apache.org/jira/browse/KAFKA-5688
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 0.11.0.0
Reporter: Dustin Cote
Priority: Minor


Today, the output of the REST endpoint workers use to validate configuration is 
pretty hard to read. It would be nice if we had a modifier for the /validate 
endpoint that only showed configuration errors. 
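As a stopgap sketch of the desired behavior, the full validate output can be 
filtered client-side (the connector class, worker address, and jq filter are 
illustrative assumptions, not an existing endpoint modifier):

{code}
# validate a (hypothetical) connector config and keep only entries that have errors
curl -s -X PUT -H "Content-Type: application/json" \
  --data '{"connector.class":"org.apache.kafka.connect.file.FileStreamSinkConnector","topics":"test"}' \
  http://localhost:8083/connector-plugins/FileStreamSinkConnector/config/validate \
  | jq '[.configs[] | select(.value.errors | length > 0)]'
{code}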



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5675) Possible worker_id duplication in Connect

2017-07-28 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5675:
--

 Summary: Possible worker_id duplication in Connect
 Key: KAFKA-5675
 URL: https://issues.apache.org/jira/browse/KAFKA-5675
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.10.2.1
Reporter: Dustin Cote
Priority: Minor


It's possible to set non-unique host/port combinations for workers via 
*rest.advertised.host.name* and *rest.advertised.port* (e.g. 
localhost:8083). While this isn't typically advisable, it can result in weird 
behavior for containerized deployments where localhost might end up being 
mapped to something that is externally facing. The worker_id today appears to 
be set to this host/port combination, so you end up with duplicate worker_ids 
that cause long rebalances, presumably because task assignment gets confused. 
It would be good to either change how the worker_id is generated or find a way 
to not let a worker start if a worker with an identical worker_id already exists. 
In the short term, we should document the requirement of unique advertised 
host/port combinations for workers to avoid debugging a somewhat tricky 
scenario.
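For illustration, a sketch of distributed worker configs that avoid the 
duplicate worker_id scenario (host names and ports are hypothetical):

{code}
# worker 1 (connect-distributed.properties)
rest.port=8083
rest.advertised.host.name=worker1.example.com
rest.advertised.port=8083

# worker 2: advertise a distinct, externally reachable host/port rather than
# letting both workers fall back to something like localhost:8083
rest.port=8083
rest.advertised.host.name=worker2.example.com
rest.advertised.port=8083
{code}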



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-3056) MirrorMaker with new consumer doesn't handle CommitFailedException

2017-06-15 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote resolved KAFKA-3056.

Resolution: Duplicate

> MirrorMaker with new consumer doesn't handle CommitFailedException
> --
>
> Key: KAFKA-3056
> URL: https://issues.apache.org/jira/browse/KAFKA-3056
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>
> MirrorMaker currently doesn't handle CommitFailedException when trying to 
> commit (it only handles WakeupException).
> I didn't test,  but it appears that if MirrorMaker tries to commit while 
> rebalancing is in progress, it will result in thread failure.  Probably not 
> what we want. I think the right thing is to log a message and call poll() 
> again to trigger a re-join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5327) Console Consumer should only poll for up to max messages

2017-05-25 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5327:
--

 Summary: Console Consumer should only poll for up to max messages
 Key: KAFKA-5327
 URL: https://issues.apache.org/jira/browse/KAFKA-5327
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Reporter: Dustin Cote
Priority: Minor


The ConsoleConsumer has a --max-messages flag that can be used to limit the 
number of messages consumed. However, the number of records actually fetched 
is governed by max.poll.records. This means you may see one message on the 
console, but your offset has moved forward by the default of 500, which is 
kind of counterintuitive. It would be good to only commit offsets for messages 
we have printed to the console.
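Until that changes, a possible workaround sketch is to cap the poll size to 
match --max-messages (the topic name is illustrative, and --consumer-property 
is assumed to be available in the console consumer version in use):

{code}
# only fetch (and therefore only commit) one record per poll
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic mytopic --max-messages 1 \
  --consumer-property max.poll.records=1
{code}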



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5225) StreamsResetter doesn't allow custom Consumer properties

2017-05-12 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5225:
--

 Summary: StreamsResetter doesn't allow custom Consumer properties
 Key: KAFKA-5225
 URL: https://issues.apache.org/jira/browse/KAFKA-5225
 Project: Kafka
  Issue Type: Bug
  Components: streams, tools
Affects Versions: 0.10.2.1
Reporter: Dustin Cote


The StreamsResetter doesn't let the user pass any configurations to the 
embedded consumer. This is a problem in secured environments because you can't 
configure the embedded consumer to talk to the cluster. The tool should take an 
approach similar to `kafka.admin.ConsumerGroupCommand`, which allows a config 
file to be passed on the command line for such operations.

cc [~mjsax]
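For comparison, the ConsumerGroupCommand approach referenced above looks 
roughly like this (the broker address, group name, and properties file contents 
are illustrative):

{code}
# client.properties (illustrative)
# security.protocol=SASL_SSL
# sasl.mechanism=SCRAM-SHA-256
# ssl.truststore.location=/etc/kafka/truststore.jks

bin/kafka-consumer-groups.sh --bootstrap-server broker1:9093 \
  --command-config client.properties \
  --describe --group my-streams-app
{code}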



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5148) Add configurable compression block size to the broker

2017-05-01 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5148:
--

 Summary: Add configurable compression block size to the broker
 Key: KAFKA-5148
 URL: https://issues.apache.org/jira/browse/KAFKA-5148
 Project: Kafka
  Issue Type: Improvement
  Components: compression
Reporter: Dustin Cote
 Fix For: 0.10.2.0


Similar to the discussion in KAFKA-3704, we should consider a configurable 
compression block size on the broker side. This is especially relevant 
considering the change in block size from 32KB to 1KB in the 0.10 release.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5138) MirrorMaker doesn't exit on send failure occasionally

2017-04-27 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5138:
--

 Summary: MirrorMaker doesn't exit on send failure occasionally
 Key: KAFKA-5138
 URL: https://issues.apache.org/jira/browse/KAFKA-5138
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.2.0
Reporter: Dustin Cote


MirrorMaker with abort.on.send.failure=true does not always exit if the 
producer closes. Here is the sequence of events.
First, we encounter a problem producing and force the producer to close:
{code}
[2017-04-10 07:17:25,137] ERROR Error when sending message to topic mytopicwith 
key: 20 bytes, value: 314 bytes with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 47 record(s) for 
mytopic-2: 30879 ms has passed since last append
[2017-04-10 07:17:25,170] INFO Closing producer due to send failure. 
(kafka.tools.MirrorMaker$)
[2017-04-10 07:17:25,170] INFO Closing the Kafka producer with timeoutMillis = 
0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] INFO Proceeding to force close the producer since 
pending requests could not be completed within timeout 0 ms. 
(org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] DEBUG The Kafka producer has closed. 
(org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] ERROR Error when sending message to topic mytopic 
with key: 20 bytes, value: 313 bytes with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 47 record(s) for 
mytopic-2: 30879 ms has passed since last append
[2017-04-10 07:17:25,170] INFO Closing producer due to send failure. 
(kafka.tools.MirrorMaker$)
[2017-04-10 07:17:25,170] INFO Closing the Kafka producer with timeoutMillis = 
0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] INFO Proceeding to force close the producer since 
pending requests could not be completed within timeout 0 ms. 
(org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:25,170] DEBUG The Kafka producer has closed. 
(org.apache.kafka.clients.producer.KafkaProducer)
{code}

All good there. Then, after about 15 seconds, the producer still can't be 
closed cleanly and so it is forcefully killed:
{code}
[2017-04-10 07:17:39,778] ERROR Error when sending message to topic 
mytopic.subscriptions with key: 70 bytes, value: null with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
java.lang.IllegalStateException: Producer is closed forcefully.
at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:522)
at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:502)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:147)
at java.lang.Thread.run(Unknown Source)
[2017-04-10 07:17:39,778] INFO Closing producer due to send failure. 
(kafka.tools.MirrorMaker$)
[2017-04-10 07:17:39,778] INFO Closing the Kafka producer with timeoutMillis = 
0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:39,778] INFO Proceeding to force close the producer since 
pending requests could not be completed within timeout 0 ms. 
(org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:39,778] DEBUG The Kafka producer has closed. 
(org.apache.kafka.clients.producer.KafkaProducer)
[2017-04-10 07:17:39,779] DEBUG Removed sensor with name connections-closed: 
(org.apache.kafka.common.metrics.Metrics)
{code}

After removing some metric sensors for a while, this happens:
{code}
[2017-04-10 07:17:39,780] DEBUG Removed sensor with name node-3.latency 
(org.apache.kafka.common.metrics.Metrics)
[2017-04-10 07:17:39,780] DEBUG Shutdown of Kafka producer I/O thread has 
completed. (org.apache.kafka.clients.producer.internals.Sender)
[2017-04-10 07:17:41,852] DEBUG Sending Heartbeat request for group 
mirror-maker-1491619052-teab1-1 to coordinator myhost1:9092 (id: 2147483643 
rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-04-10 07:17:41,953] DEBUG Received successful Heartbeat response for 
group mirror-maker-1491619052-teab1-1 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-04-10 07:17:44,875] DEBUG Sending Heartbeat request for group 
mirror-maker-1491619052-teab1-1 to coordinator myhost1:9092 (id: 2147483643 
rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
{code}

This heartbeating goes on for some time until:
{code}
[2017-04-10 07:19:57,392] DEBUG Received successful Heartbeat response for 
group mirror-maker-1491619052-teab1-1 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-04-10 07:19:57,994] DEBUG Connection with myhost1/123.123.321.321 
disconnected 

[jira] [Created] (KAFKA-5137) Controlled shutdown timeout message improvement

2017-04-27 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5137:
--

 Summary: Controlled shutdown timeout message improvement
 Key: KAFKA-5137
 URL: https://issues.apache.org/jira/browse/KAFKA-5137
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.10.2.0
Reporter: Dustin Cote
Priority: Minor


Currently, if you fail during controlled shutdown, you can get a message that 
says socket.timeout.ms has expired. This config doesn't actually exist on the 
broker. Instead, we should explicitly say whether we've hit 
controller.socket.timeout.ms or request.timeout.ms, as it's confusing to take 
action given the current message. I believe the relevant code is here:
https://github.com/apache/kafka/blob/0.10.2/core/src/main/scala/kafka/server/KafkaServer.scala#L428-L454

I'm also not sure if there's another timeout that could be hit here or another 
reason why IOException might be thrown. At the least we should call out those 
two configs instead of the non-existent one, but if we can direct users to the 
proper one that would be even better.
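For reference, a sketch of the two broker configs that do exist and could 
plausibly be in play here (the values shown are the defaults as I understand 
them; treat them as an assumption):

{code}
# server.properties
# socket timeout for the controller-to-broker channel used during controlled shutdown
controller.socket.timeout.ms=30000
# maximum time to wait for a request response before timing out
request.timeout.ms=30000
{code}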



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs

2017-04-24 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982151#comment-15982151
 ] 

Dustin Cote commented on KAFKA-5118:


[~huxi] ^

> Improve message for Kafka failed startup with non-Kafka data in data.dirs
> -
>
> Key: KAFKA-5118
> URL: https://issues.apache.org/jira/browse/KAFKA-5118
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.0
>Reporter: Dustin Cote
>Priority: Minor
>
> Today, if you try to start up a broker with some non-Kafka data in the 
> data.dirs, you end up with a cryptic message:
> {code}
> [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads 
> during logs loading: java.lang.StringIndexOutOfBoundsException: String index 
> out of range: -1 (kafka.log.LogManager) 
> [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1 
> {code}
> It'd be better if we could tell the user to look for non-Kafka data in the 
> data.dirs and print out the offending directory that caused the problem in 
> the first place.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs

2017-04-24 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982148#comment-15982148
 ] 

Dustin Cote commented on KAFKA-5118:


@huxi yes, this is where this one came from: directories that ended in -delete, 
presumably from topics that were deleted. I haven't checked whether it can 
happen in other scenarios; I just thought the error was pretty cryptic.

> Improve message for Kafka failed startup with non-Kafka data in data.dirs
> -
>
> Key: KAFKA-5118
> URL: https://issues.apache.org/jira/browse/KAFKA-5118
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.0
>Reporter: Dustin Cote
>Priority: Minor
>
> Today, if you try to start up a broker with some non-Kafka data in the 
> data.dirs, you end up with a cryptic message:
> {code}
> [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads 
> during logs loading: java.lang.StringIndexOutOfBoundsException: String index 
> out of range: -1 (kafka.log.LogManager) 
> [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1 
> {code}
> It'd be better if we could tell the user to look for non-Kafka data in the 
> data.dirs and print out the offending directory that caused the problem in 
> the first place.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs

2017-04-24 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5118:
--

 Summary: Improve message for Kafka failed startup with non-Kafka 
data in data.dirs
 Key: KAFKA-5118
 URL: https://issues.apache.org/jira/browse/KAFKA-5118
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 0.10.2.0
Reporter: Dustin Cote
Priority: Minor


Today, if you try to start up a broker with some non-Kafka data in the 
data.dirs, you end up with a cryptic message:
{code}
[2017-04-21 13:35:08,122] ERROR There was an error in one of the threads during 
logs loading: java.lang.StringIndexOutOfBoundsException: String index out of 
range: -1 (kafka.log.LogManager) 
[2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during 
KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) 
java.lang.StringIndexOutOfBoundsException: String index out of range: -1 
{code}

It'd be better if we could tell the user to look for non-Kafka data in the 
data.dirs and print out the offending directory that caused the problem in the 
first place.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5110) ConsumerGroupCommand error handling improvement

2017-04-24 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981086#comment-15981086
 ] 

Dustin Cote commented on KAFKA-5110:


[~vahid] I'd love to provide reproduction steps, but I can't seem to reproduce 
locally. This was seen on a cluster not under my control. If we could print 
some better info here as [~hachikuji]'s PR is doing, I think it would be 
instructive in seeing what the root cause may be.

> ConsumerGroupCommand error handling improvement
> ---
>
> Key: KAFKA-5110
> URL: https://issues.apache.org/jira/browse/KAFKA-5110
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.10.1.1
>Reporter: Dustin Cote
>Assignee: Jason Gustafson
>
> The ConsumerGroupCommand isn't handling partition errors properly. It throws 
> the following:
> {code}
> kafka-consumer-groups.sh --zookeeper 10.10.10.10:2181 --group mygroup 
> --describe
> GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
> Error while executing consumer group command empty.head
> java.lang.UnsupportedOperationException: empty.head
> at scala.collection.immutable.Vector.head(Vector.scala:193)
> at 
> kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$getLogEndOffset$1.apply(ConsumerGroupCommand.scala:197)
> at 
> kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$getLogEndOffset$1.apply(ConsumerGroupCommand.scala:194)
> at scala.Option.map(Option.scala:146)
> at 
> kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.getLogEndOffset(ConsumerGroupCommand.scala:194)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.kafka$admin$ConsumerGroupCommand$ConsumerGroupService$$describePartition(ConsumerGroupCommand.scala:125)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$$anonfun$describeTopicPartition$2.apply(ConsumerGroupCommand.scala:107)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$$anonfun$describeTopicPartition$2.apply(ConsumerGroupCommand.scala:106)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describeTopicPartition(ConsumerGroupCommand.scala:106)
> at 
> kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describeTopicPartition(ConsumerGroupCommand.scala:134)
> at 
> kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.kafka$admin$ConsumerGroupCommand$ZkConsumerGroupService$$describeTopic(ConsumerGroupCommand.scala:181)
> at 
> kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$describeGroup$1.apply(ConsumerGroupCommand.scala:166)
> at 
> kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$describeGroup$1.apply(ConsumerGroupCommand.scala:166)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:166)
> at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describe(ConsumerGroupCommand.scala:89)
> at 
> kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describe(ConsumerGroupCommand.scala:134)
> at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:68)
> at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5110) ConsumerGroupCommand error handling improvement

2017-04-21 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5110:
--

 Summary: ConsumerGroupCommand error handling improvement
 Key: KAFKA-5110
 URL: https://issues.apache.org/jira/browse/KAFKA-5110
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.10.1.1
Reporter: Dustin Cote
Assignee: Jason Gustafson


The ConsumerGroupCommand isn't handling partition errors properly. It throws 
the following:

{code}
kafka-consumer-groups.sh --zookeeper 10.10.10.10:2181 --group mygroup --describe
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
Error while executing consumer group command empty.head
java.lang.UnsupportedOperationException: empty.head
at scala.collection.immutable.Vector.head(Vector.scala:193)
at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$getLogEndOffset$1.apply(ConsumerGroupCommand.scala:197)
at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$getLogEndOffset$1.apply(ConsumerGroupCommand.scala:194)
at scala.Option.map(Option.scala:146)
at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.getLogEndOffset(ConsumerGroupCommand.scala:194)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.kafka$admin$ConsumerGroupCommand$ConsumerGroupService$$describePartition(ConsumerGroupCommand.scala:125)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$$anonfun$describeTopicPartition$2.apply(ConsumerGroupCommand.scala:107)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$$anonfun$describeTopicPartition$2.apply(ConsumerGroupCommand.scala:106)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describeTopicPartition(ConsumerGroupCommand.scala:106)
at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describeTopicPartition(ConsumerGroupCommand.scala:134)
at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.kafka$admin$ConsumerGroupCommand$ZkConsumerGroupService$$describeTopic(ConsumerGroupCommand.scala:181)
at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$describeGroup$1.apply(ConsumerGroupCommand.scala:166)
at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService$$anonfun$describeGroup$1.apply(ConsumerGroupCommand.scala:166)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:166)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describe(ConsumerGroupCommand.scala:89)
at kafka.admin.ConsumerGroupCommand$ZkConsumerGroupService.describe(ConsumerGroupCommand.scala:134)
at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:68)
at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5074) Transition to OnlinePartition without preferred leader in ISR fails

2017-04-14 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969426#comment-15969426
 ] 

Dustin Cote commented on KAFKA-5074:


Ah ok I got what you meant. The OfflinePartitionLeaderSelector is invoked for 
other partitions, but not the ones that stay offline. For example, a different 
partition:
{code}
[2017-04-12 16:27:34,270] INFO [OfflinePartitionLeaderSelector]: Selected new 
leader and ISR {"leader":9,"leader_epoch":2,"isr":[9,11]} for offline partition 
[topic2,28] (kafka.controller.OfflinePartitionLeaderSelector)
{code}

Also not seeing any entries like "Error while moving some partitions to the 
online state" in the controller logs.

> Transition to OnlinePartition without preferred leader in ISR fails
> ---
>
> Key: KAFKA-5074
> URL: https://issues.apache.org/jira/browse/KAFKA-5074
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.9.0.0
>Reporter: Dustin Cote
>
> Running 0.9.0.0, the controller can get into a state where it no longer is 
> able to elect a leader for an Offline partition. It's unclear how this state 
> is first achieved but in the steady state, this happens:
> -There are partitions with a leader of -1
> -The Controller repeatedly attempts a preferred leader election for these 
> partitions
> -The preferred leader election fails because the only replica in the ISR is 
> not the preferred leader
> The log cycle looks like this:
> {code}
> [2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica 
> leader election for partitions topic,1
> [2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]: 
> Invoking state change to OnlinePartition for partitions topic,1
> [2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]: 
> Current leader -1 for partition [topic,1] is not the preferred replica. 
> Trigerring preferred replica leader election 
> (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> [2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to 
> complete preferred replica leader election. Leader is -1 
> (kafka.controller.KafkaController)
> {code}
> It's not clear if this would happen on versions later than 0.9.0.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5074) Transition to OnlinePartition without preferred leader in ISR fails

2017-04-14 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969406#comment-15969406
 ] 

Dustin Cote commented on KAFKA-5074:


[~ijuma] I think the unexpected behavior here is that the leader is not 
selected from the ISR. Instead, the cycle of preferred leader election --> 
attempt transition to online --> failed preferred leader election --> preferred 
leader election repeats every 5 minutes until the controller is restarted.
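As a sketch of what restarting the controller amounts to operationally, one 
known technique is to force a controller re-election by deleting the controller 
znode (this is an assumption about the workaround, not something stated in this 
issue, and should be done with care):

{code}
# connect with the ZooKeeper CLI bundled with Kafka and force a controller re-election
bin/zookeeper-shell.sh localhost:2181
delete /controller
# another broker then wins the election and re-runs the partition state machine
{code}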

> Transition to OnlinePartition without preferred leader in ISR fails
> ---
>
> Key: KAFKA-5074
> URL: https://issues.apache.org/jira/browse/KAFKA-5074
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.9.0.0
>Reporter: Dustin Cote
>
> Running 0.9.0.0, the controller can get into a state where it no longer is 
> able to elect a leader for an Offline partition. It's unclear how this state 
> is first achieved but in the steady state, this happens:
> -There are partitions with a leader of -1
> -The Controller repeatedly attempts a preferred leader election for these 
> partitions
> -The preferred leader election fails because the only replica in the ISR is 
> not the preferred leader
> The log cycle looks like this:
> {code}
> [2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica 
> leader election for partitions topic,1
> [2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]: 
> Invoking state change to OnlinePartition for partitions topic,1
> [2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]: 
> Current leader -1 for partition [topic,1] is not the preferred replica. 
> Trigerring preferred replica leader election 
> (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> [2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to 
> complete preferred replica leader election. Leader is -1 
> (kafka.controller.KafkaController)
> {code}
> It's not clear if this would happen on versions later than 0.9.0.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5074) Transition to OnlinePartition without preferred leader in ISR fails

2017-04-14 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-5074:
--

 Summary: Transition to OnlinePartition without preferred leader in 
ISR fails
 Key: KAFKA-5074
 URL: https://issues.apache.org/jira/browse/KAFKA-5074
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 0.9.0.0
Reporter: Dustin Cote


Running 0.9.0.0, the controller can get into a state where it no longer is able 
to elect a leader for an Offline partition. It's unclear how this state is 
first achieved but in the steady state, this happens:
-There are partitions with a leader of -1
-The Controller repeatedly attempts a preferred leader election for these 
partitions
-The preferred leader election fails because the only replica in the ISR is not 
the preferred leader

The log cycle looks like this:
{code}
[2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica 
leader election for partitions topic,1
[2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]: 
Invoking state change to OnlinePartition for partitions topic,1
[2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]: 
Current leader -1 for partition [topic,1] is not the preferred replica. 
Trigerring preferred replica leader election 
(kafka.controller.PreferredReplicaPartitionLeaderSelector)
[2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to 
complete preferred replica leader election. Leader is -1 
(kafka.controller.KafkaController)
{code}

It's not clear if this would happen on versions later than 0.9.0.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-3955) Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to failed broker boot

2017-04-07 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960854#comment-15960854
 ] 

Dustin Cote commented on KAFKA-3955:


Adding 0.10.1.1 as this kind of behavior has been observed on that release as 
well.

> Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to 
> failed broker boot
> 
>
> Key: KAFKA-3955
> URL: https://issues.apache.org/jira/browse/KAFKA-3955
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1, 0.8.2.0, 0.8.2.1, 0.8.2.2, 
> 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.1.1
>Reporter: Tom Crayford
>
> Hi,
> I've found a bug impacting kafka brokers on startup after an unclean 
> shutdown. If a log segment is corrupt and has non-monotonic offsets (see the 
> appendix of this bug for a sample output from {{DumpLogSegments}}), then 
> {{LogSegment.recover}} throws an {{InvalidOffsetException}} error: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/OffsetIndex.scala#L218
> That code is called by {{LogSegment.recover}}: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L191
> Which is called in several places in {{Log.scala}}. Notably it's called four 
> times during recovery:
> Thrice in Log.loadSegments
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L199
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L204
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L226
> and once in Log.recoverLog
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L268
> Of these, only the very last one has a {{catch}} for 
> {{InvalidOffsetException}}. When that catches the issue, it truncates the 
> whole log (not just this segment): 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L274
>  to the start segment of the bad log segment.
> However, this code can't be hit on recovery, because of the code paths in 
> {{loadSegments}} - they mean we'll never hit truncation here, as we always 
> throw this exception and that goes all the way to the toplevel exception 
> handler and crashes the JVM.
> As {{Log.recoverLog}} is always called during recovery, I *think* a fix for 
> this is to move this crash recovery/truncate code inside a new method in 
> {{Log.scala}}, and call that instead of {{LogSegment.recover}} in each place. 
> That code should return the number of {{truncatedBytes}} like we do in 
> {{Log.recoverLog}} and then truncate the log. The callers will have to be 
> notified "stop iterating over files in the directory", likely via a return 
> value of {{truncatedBytes}} like {{Log.recoverLog}} does right now.
> I'm happy working on a patch for this. I'm aware this recovery code is tricky 
> and important to get right.
> I'm also curious (and currently don't have good theories as of yet) as to how 
> this log segment got into this state with non-monotonic offsets. This segment 
> is using gzip compression, and is under 0.9.0.1. The same bug with respect to 
> recovery exists in trunk, but I'm unsure if the new handling around 
> compressed messages (KIP-31) means the bug where non-monotonic offsets get 
> appended is still present in trunk.
> As a production workaround, one can manually truncate that log folder 
> yourself (delete all .index/.log files including and after the one with the 
> bad offset). However, kafka should (and can) handle this case well - with 
> replication we can truncate in broker startup.
> stacktrace and error message:
> {code}
> pri=WARN  t=pool-3-thread-4 at=Log Found a corrupted index file, 
> /$DIRECTORY/$TOPIC-22/14306536.index, deleting and rebuilding 
> index...
> pri=ERROR t=main at=LogManager There was an error in one of the threads 
> during logs loading: kafka.common.InvalidOffsetException: Attempt to append 
> an offset (15000337) to position 111719 no larger than the last offset 
> appended (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index.
> pri=FATAL t=main at=KafkaServer Fatal error during KafkaServer startup. 
> Prepare to shutdown kafka.common.InvalidOffsetException: Attempt to append an 
> offset (15000337) to position 111719 no larger than the last offset appended 
> (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index.
> at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
> at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> at 

[jira] [Updated] (KAFKA-3955) Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to failed broker boot

2017-04-07 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-3955:
---
Affects Version/s: 0.10.1.1

> Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to 
> failed broker boot
> 
>
> Key: KAFKA-3955
> URL: https://issues.apache.org/jira/browse/KAFKA-3955
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1, 0.8.2.0, 0.8.2.1, 0.8.2.2, 
> 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.1.1
>Reporter: Tom Crayford
>
> Hi,
> I've found a bug impacting kafka brokers on startup after an unclean 
> shutdown. If a log segment is corrupt and has non-monotonic offsets (see the 
> appendix of this bug for a sample output from {{DumpLogSegments}}), then 
> {{LogSegment.recover}} throws an {{InvalidOffsetException}} error: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/OffsetIndex.scala#L218
> That code is called by {{LogSegment.recover}}: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L191
> Which is called in several places in {{Log.scala}}. Notably it's called four 
> times during recovery:
> Thrice in Log.loadSegments
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L199
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L204
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L226
> and once in Log.recoverLog
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L268
> Of these, only the very last one has a {{catch}} for 
> {{InvalidOffsetException}}. When that catches the issue, it truncates the 
> whole log (not just this segment): 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L274
>  to the start segment of the bad log segment.
> However, this code can't be hit on recovery, because of the code paths in 
> {{loadSegments}} - they mean we'll never hit truncation here, as we always 
> throw this exception and that goes all the way to the toplevel exception 
> handler and crashes the JVM.
> As {{Log.recoverLog}} is always called during recovery, I *think* a fix for 
> this is to move this crash recovery/truncate code inside a new method in 
> {{Log.scala}}, and call that instead of {{LogSegment.recover}} in each place. 
> That code should return the number of {{truncatedBytes}} like we do in 
> {{Log.recoverLog}} and then truncate the log. The callers will have to be 
> notified "stop iterating over files in the directory", likely via a return 
> value of {{truncatedBytes}} like {{Log.recoverLog}} does right now.
> I'm happy working on a patch for this. I'm aware this recovery code is tricky 
> and important to get right.
> I'm also curious (and currently don't have good theories as of yet) as to how 
> this log segment got into this state with non-monotonic offsets. This segment 
> is using gzip compression, and is under 0.9.0.1. The same bug with respect to 
> recovery exists in trunk, but I'm unsure if the new handling around 
> compressed messages (KIP-31) means the bug where non-monotonic offsets get 
> appended is still present in trunk.
> As a production workaround, one can manually truncate that log folder 
> yourself (delete all .index/.log files including and after the one with the 
> bad offset). However, kafka should (and can) handle this case well - with 
> replication we can truncate in broker startup.
> stacktrace and error message:
> {code}
> pri=WARN  t=pool-3-thread-4 at=Log Found a corrupted index file, 
> /$DIRECTORY/$TOPIC-22/14306536.index, deleting and rebuilding 
> index...
> pri=ERROR t=main at=LogManager There was an error in one of the threads 
> during logs loading: kafka.common.InvalidOffsetException: Attempt to append 
> an offset (15000337) to position 111719 no larger than the last offset 
> appended (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index.
> pri=FATAL t=main at=KafkaServer Fatal error during KafkaServer startup. 
> Prepare to shutdown kafka.common.InvalidOffsetException: Attempt to append an 
> offset (15000337) to position 111719 no larger than the last offset appended 
> (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index.
> at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
> at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> at kafka.log.OffsetIndex.append(OffsetIndex.scala:197)
> at kafka.log.LogSegment.recover(LogSegment.scala:188)
>  

[jira] [Created] (KAFKA-4861) log.message.timestamp.type=LogAppendTime breaks Kafka based consumers

2017-03-07 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4861:
--

 Summary: log.message.timestamp.type=LogAppendTime breaks Kafka 
based consumers
 Key: KAFKA-4861
 URL: https://issues.apache.org/jira/browse/KAFKA-4861
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.10.2.0
Reporter: Dustin Cote
Assignee: Jason Gustafson
Priority: Blocker


Using 0.10.2 brokers with the property 
`log.message.timestamp.type=LogAppendTime` breaks all Kafka-based consumers for 
the cluster. The consumer will return:
{code}
[2017-03-07 15:25:10,215] ERROR Unknown error when running consumer:  
(kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: The 
timestamp of the message is out of acceptable range.
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:535)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:508)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:764)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:745)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:186)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:149)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:116)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:493)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:322)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:253)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:334)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
at kafka.consumer.NewShinyConsumer.(BaseConsumer.scala:55)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
{code}

On the broker side you see:
{code}
[2017-03-07 15:25:20,216] INFO [GroupCoordinator 0]: Group 
console-consumer-73205 with generation 2 is now empty 
(kafka.coordinator.GroupCoordinator)
[2017-03-07 15:25:20,217] ERROR [Group Metadata Manager on Broker 0]: Appending 
metadata message for group console-consumer-73205 generation 2 failed due to 
unexpected error: org.apache.kafka.common.errors.InvalidTimestampException 
(kafka.coordinator.GroupMetadataManager)
[2017-03-07 15:25:20,218] WARN [GroupCoordinator 0]: Failed to write empty 
metadata for group console-consumer-73205: The timestamp of the message is out 
of acceptable range. (kafka.coordinator.GroupCoordinator)
{code}

Marking as a blocker since this appears to be a regression in that it doesn't 
happen on 0.10.1.1



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4618) Enable clients to "re-bootstrap" in the event of a full cluster migration

2017-01-11 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4618:
--

 Summary: Enable clients to "re-bootstrap" in the event of a full 
cluster migration
 Key: KAFKA-4618
 URL: https://issues.apache.org/jira/browse/KAFKA-4618
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 0.10.1.1
Reporter: Dustin Cote
Priority: Minor


Today, clients only bootstrap upon startup. This works well in most cases, but 
following up on the discussion 
[here|https://issues.apache.org/jira/browse/KAFKA-3068?focusedCommentId=15115702=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15115702]
there are scenarios where you may want to "re-bootstrap" a client on the fly. 
One example scenario is an active-passive failover cluster setup with a load 
balancer in front managing traffic. If you lose all of the brokers in the 
active cluster and need to immediately fail over to the other set of brokers, 
you would need to re-bootstrap, meaning clients would have to restart. Having 
the ability to go back through the bootstrapping process without a restart 
would be a nice improvement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2017-01-03 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796240#comment-15796240
 ] 

Dustin Cote commented on KAFKA-2394:


[~gquintana] I agree it would be better, but the existing Kafka dependencies 
only include Log4J v1 without the extras, and I'd rather not muddy the waters 
here by updating dependencies too. There's another JIRA, KAFKA-1368, with the 
goal of updating Log4J that will hopefully render this one unnecessary if we can 
get it into Kafka before the next big release. It'd be good to get your input on 
that JIRA if you are looking to see the out-of-the-box logging updated.
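
For reference, a minimal log4j.properties sketch of the RollingFileAppender 
configuration being discussed (the file name and the size/backup values below 
are illustrative assumptions, not the values in the attached patch):
{code}
log4j.rootLogger=INFO, kafkaAppender

# RollingFileAppender caps disk usage: at most MaxBackupIndex backups of at most MaxFileSize each
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=/var/log/kafka/server.log
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=10
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
{code}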

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11.0.0
>
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.

2016-12-08 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733442#comment-15733442
 ] 

Dustin Cote commented on KAFKA-4477:


It may also be helpful to grab the associated zookeeper logs. If the bad server 
is losing contact with zookeeper at the same time that it loses connection with 
the other brokers, things can end up in a weird state.

> Node reduces its ISR to itself, and doesn't recover. Other nodes do not take 
> leadership, cluster remains sick until node is restarted.
> --
>
> Key: KAFKA-4477
> URL: https://issues.apache.org/jira/browse/KAFKA-4477
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.0
> Environment: RHEL7
> java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
>Reporter: Michael Andre Pearce (IG)
>Assignee: Apurva Mehta
>Priority: Critical
>  Labels: reliability
> Attachments: kafka.jstack
>
>
> We have encountered a critical issue that has re-occurred in different 
> physical environments. We haven't worked out what is going on. We do, though, 
> have a nasty workaround to keep the service alive. 
> We have not had this issue on clusters still running 0.9.0.1.
> We have noticed a node randomly shrinking the ISRs for the partitions it owns 
> down to just itself; moments later we see other nodes having disconnects, 
> followed finally by application issues where producing to these partitions is 
> blocked.
> It seems only restarting the Kafka Java process resolves the issue.
> We have had this occur multiple times, and from all network and machine 
> monitoring the machine never left the network or had any other glitches.
> Below are the logs seen from the issue.
> Node 7:
> [2016-12-01 07:01:28,112] INFO Partition 
> [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking 
> ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from 
> 1,2,7 to 7 (kafka.cluster.Partition)
> All other nodes:
> [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch 
> kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 7 was disconnected before the response was 
> read
> All clients:
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> After this occurs, we then suddenly see on the sick machine an increasing 
> amount of close_waits and file descriptors.
> As a workaround to keep the service up we are currently putting in an 
> automated process that tails the logs and matches the regex below; where 
> new_partitions is just the node itself, we restart the node. 
> "\[(?P.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for 
> partition \[.*\] from (?P.+) to (?P.+) 
> \(kafka.cluster.Partition\)"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-4351) Topic regex behavioral change with MirrorMaker new consumer

2016-10-28 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote reopened KAFKA-4351:


Looks like this got marked resolved accidentally when the PR was made.  
Reopening but cc'ing [~huxi_2b].  Thanks for picking it up.

> Topic regex behavioral change with MirrorMaker new consumer
> ---
>
> Key: KAFKA-4351
> URL: https://issues.apache.org/jira/browse/KAFKA-4351
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.9.0.1
>Reporter: Dustin Cote
>Assignee: huxi
>Priority: Minor
>
> There is a behavioral change when you use MirrorMaker with the new consumer 
> versus the old consumer with respect to regex parsing of the input topic 
> list.  If one combines a regex with a comma separated list for the topic list 
> with old consumer implementation of MirrorMaker, it works ok.  Example:
> {code}
> --whitelist '.+(my|mine),topic,anothertopic'
> {code}
> If you pass this in with the new consumer implementation, no topics are 
> found.  The workaround is to use something like:
> {code}
> --whitelist '.+(my|mine)|topic|anothertopic'
> {code}
> We should make an effort to be consistent between the two implementations as 
> it's a bit surprising when migrating. I've verified the problem exists on 
> 0.9.0.1 but I don't see at first any fix to the problem in later versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4351) Topic regex behavioral change with MirrorMaker new consumer

2016-10-27 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4351:
--

 Summary: Topic regex behavioral change with MirrorMaker new 
consumer
 Key: KAFKA-4351
 URL: https://issues.apache.org/jira/browse/KAFKA-4351
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.9.0.1
Reporter: Dustin Cote
Priority: Minor


There is a behavioral change when you use MirrorMaker with the new consumer 
versus the old consumer with respect to regex parsing of the input topic list.  
If one combines a regex with a comma separated list for the topic list with old 
consumer implementation of MirrorMaker, it works ok.  Example:
{code}
--whitelist '.+(my|mine),topic,anothertopic'
{code}
If you pass this in with the new consumer implementation, no topics are found.  
The workaround is to use something like:
{code}
--whitelist '.+(my|mine)|topic|anothertopic'
{code}

We should make an effort to be consistent between the two implementations, as 
it's a bit surprising when migrating. I've verified the problem exists on 
0.9.0.1, and at first glance I don't see any fix to the problem in later versions.
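
A minimal sketch of why this happens, under the assumption that the new consumer 
hands the whitelist string directly to java.util.regex.Pattern, so a comma is 
just a literal character rather than a topic-list separator:
{code}
import java.util.regex.Pattern;

public class WhitelistRegexCheck {
    public static void main(String[] args) {
        // Old-consumer style list combined with a regex: treated as one big regex by the new consumer
        Pattern commaList = Pattern.compile(".+(my|mine),topic,anothertopic");
        System.out.println(commaList.matcher("topic").matches());         // false
        System.out.println(commaList.matcher("anothertopic").matches());  // false

        // Workaround: express the whole whitelist as alternation
        Pattern alternation = Pattern.compile(".+(my|mine)|topic|anothertopic");
        System.out.println(alternation.matcher("topic").matches());        // true
        System.out.println(alternation.matcher("anothertopic").matches()); // true
    }
}
{code}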



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4320) Log compaction docs update

2016-10-20 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592128#comment-15592128
 ] 

Dustin Cote commented on KAFKA-4320:


[~Aurijoy] great, welcome! First have a look at the how to contribute guide 
[http://kafka.apache.org/contributing].  Then once you've gotten set up, you 
can make a pull request on the github for kafka 
[https://github.com/apache/kafka].

It looks like this doesn't affect the docs in the trunk branch, but it should 
be fixed for 0.9 and 0.10 branches.  Have a look here: 
[https://github.com/apache/kafka/blob/0.10.0/docs/design.html#L343].  Doc fixes 
are a great way to get involved, so feel free to fix any typos you might see as 
well.  

> Log compaction docs update
> --
>
> Key: KAFKA-4320
> URL: https://issues.apache.org/jira/browse/KAFKA-4320
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Reporter: Dustin Cote
>Priority: Minor
>  Labels: newbie
>
> The log compaction docs are out of date.  At least the default is said to be 
> that log compaction is disabled which is not true as of 0.9.0.1.  Probably 
> the whole section needs a once over to make sure it's in line with what is 
> currently there.  This is the section:
> [http://kafka.apache.org/documentation#design_compactionconfig]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4320) Log compaction docs update

2016-10-19 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-4320:
---
Labels: newbie  (was: )

> Log compaction docs update
> --
>
> Key: KAFKA-4320
> URL: https://issues.apache.org/jira/browse/KAFKA-4320
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Reporter: Dustin Cote
>Priority: Minor
>  Labels: newbie
>
> The log compaction docs are out of date.  At least the default is said to be 
> that log compaction is disabled which is not true as of 0.9.0.1.  Probably 
> the whole section needs a once over to make sure it's in line with what is 
> currently there.  This is the section:
> [http://kafka.apache.org/documentation#design_compactionconfig]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4320) Log compaction docs update

2016-10-19 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4320:
--

 Summary: Log compaction docs update
 Key: KAFKA-4320
 URL: https://issues.apache.org/jira/browse/KAFKA-4320
 Project: Kafka
  Issue Type: Bug
  Components: documentation
Reporter: Dustin Cote
Priority: Minor


The log compaction docs are out of date.  At least, they say the default is that 
log compaction is disabled, which is not true as of 0.9.0.1.  The whole section 
probably needs a once-over to make sure it's in line with the current behavior.  
This is the section:
[http://kafka.apache.org/documentation#design_compactionconfig]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4207) Partitions stopped after a rapid restart of a broker

2016-09-23 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517727#comment-15517727
 ] 

Dustin Cote commented on KAFKA-4207:


Agreed, marking this one as a duplicate

> Partitions stopped after a rapid restart of a broker
> 
>
> Key: KAFKA-4207
> URL: https://issues.apache.org/jira/browse/KAFKA-4207
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.9.0.1, 0.10.0.1
>Reporter: Dustin Cote
>
> Environment:
> 4 Kafka brokers
> 10,000 topics with one partition each, replication factor 3
> Partitions with 4KB data each
> No data being produced or consumed
> Scenario:
> Initiate controlled shutdown on one broker
> Interrupt controlled shutdown prior completion with a SIGKILL
> Start a new broker with the same broker ID as broker that was just killed 
> immediately
> Symptoms:
> After starting the new broker, the other three brokers in the cluster will 
> see under replicated partitions forever for some partitions that are hosted 
> on the broker that was killed and restarted
> Cause:
> Today, the controller sends a StopReplica command for each replica hosted on 
> a broker that has initiated a controlled shutdown.  For a large number of 
> replicas this can take awhile.  When the broker that is doing the controlled 
> shutdown is killed, the StopReplica commands are queued up even though the 
> request queue to the broker is cleared.  When the broker comes back online, 
> the StopReplica commands that were queued, get sent to the broker that just 
> started up.  
> CC: [~junrao] since he's familiar with the scenario seen here



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4207) Partitions stopped after a rapid restart of a broker

2016-09-23 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote resolved KAFKA-4207.

Resolution: Duplicate

> Partitions stopped after a rapid restart of a broker
> 
>
> Key: KAFKA-4207
> URL: https://issues.apache.org/jira/browse/KAFKA-4207
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.9.0.1, 0.10.0.1
>Reporter: Dustin Cote
>
> Environment:
> 4 Kafka brokers
> 10,000 topics with one partition each, replication factor 3
> Partitions with 4KB data each
> No data being produced or consumed
> Scenario:
> Initiate controlled shutdown on one broker
> Interrupt controlled shutdown prior completion with a SIGKILL
> Start a new broker with the same broker ID as broker that was just killed 
> immediately
> Symptoms:
> After starting the new broker, the other three brokers in the cluster will 
> see under replicated partitions forever for some partitions that are hosted 
> on the broker that was killed and restarted
> Cause:
> Today, the controller sends a StopReplica command for each replica hosted on 
> a broker that has initiated a controlled shutdown.  For a large number of 
> replicas this can take awhile.  When the broker that is doing the controlled 
> shutdown is killed, the StopReplica commands are queued up even though the 
> request queue to the broker is cleared.  When the broker comes back online, 
> the StopReplica commands that were queued, get sent to the broker that just 
> started up.  
> CC: [~junrao] since he's familiar with the scenario seen here



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4207) Partitions stopped after a rapid restart of a broker

2016-09-22 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4207:
--

 Summary: Partitions stopped after a rapid restart of a broker
 Key: KAFKA-4207
 URL: https://issues.apache.org/jira/browse/KAFKA-4207
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 0.10.0.1, 0.9.0.1
Reporter: Dustin Cote


Environment:
4 Kafka brokers
10,000 topics with one partition each, replication factor 3
Partitions with 4KB data each
No data being produced or consumed

Scenario:
Initiate controlled shutdown on one broker
Interrupt the controlled shutdown prior to completion with a SIGKILL
Immediately start a new broker with the same broker ID as the broker that was 
just killed

Symptoms:
After starting the new broker, the other three brokers in the cluster will see 
under-replicated partitions forever for some partitions that are hosted on the 
broker that was killed and restarted

Cause:
Today, the controller sends a StopReplica command for each replica hosted on a 
broker that has initiated a controlled shutdown.  For a large number of 
replicas this can take a while.  When the broker that is doing the controlled 
shutdown is killed, the StopReplica commands remain queued up even though the 
request queue to the broker is cleared.  When the broker comes back online, the 
StopReplica commands that were queued get sent to the broker that just started 
up.  

CC: [~junrao] since he's familiar with the scenario seen here




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4169) Calculation of message size is too conservative for compressed messages

2016-09-14 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490387#comment-15490387
 ] 

Dustin Cote commented on KAFKA-4169:


Ah, good point.  The user reporting this issue suggested pushing the check into 
the RecordAccumulator.  That might be a better option.

> Calculation of message size is too conservative for compressed messages
> ---
>
> Key: KAFKA-4169
> URL: https://issues.apache.org/jira/browse/KAFKA-4169
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.0.1
>Reporter: Dustin Cote
>
> Currently the producer uses the uncompressed message size to check against 
> {{max.request.size}} even if a {{compression.type}} is defined.  This can be 
> reproduced as follows:
> {code}
> # dd if=/dev/zero of=/tmp/outsmaller.dat bs=1024 count=1000
> # cat /tmp/out.dat | bin/kafka-console-producer --broker-list localhost:9092 
> --topic tester --producer-property compression.type=gzip
> {code}
> The above code creates a file that is the same size as the default for 
> {{max.request.size}} and the added overhead of the message pushes the 
> uncompressed size over the limit.  Compressing the message ahead of time 
> allows the message to go through.  When the message is blocked, the following 
> exception is produced:
> {code}
> [2016-09-14 08:56:19,558] ERROR Error when sending message to topic tester 
> with key: null, value: 1048576 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.RecordTooLargeException: The message is 
> 1048610 bytes when serialized which is larger than the maximum request size 
> you have configured with the max.request.size configuration.
> {code}
> For completeness, I have confirmed that the console producer is setting 
> {{compression.type}} properly by enabling DEBUG so this appears to be a 
> problem in the size estimate of the message itself.  I would suggest we 
> compress before we serialize instead of the other way around to avoid this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4169) Calculation of message size is too conservative for compressed messages

2016-09-14 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4169:
--

 Summary: Calculation of message size is too conservative for 
compressed messages
 Key: KAFKA-4169
 URL: https://issues.apache.org/jira/browse/KAFKA-4169
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.10.0.1
Reporter: Dustin Cote


Currently the producer uses the uncompressed message size to check against 
{{max.request.size}} even if a {{compression.type}} is defined.  This can be 
reproduced as follows:

{code}
# dd if=/dev/zero of=/tmp/out.dat bs=1024 count=1024

# cat /tmp/out.dat | bin/kafka-console-producer --broker-list localhost:9092 \
  --topic tester --producer-property compression.type=gzip
{code}

The above code creates a file that is the same size as the default for 
{{max.request.size}} and the added overhead of the message pushes the 
uncompressed size over the limit.  Compressing the message ahead of time allows 
the message to go through.  When the message is blocked, the following 
exception is produced:
{code}
[2016-09-14 08:56:19,558] ERROR Error when sending message to topic tester with 
key: null, value: 1048576 bytes with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.RecordTooLargeException: The message is 1048610 
bytes when serialized which is larger than the maximum request size you have 
configured with the max.request.size configuration.
{code}

For completeness, I have confirmed that the console producer is setting 
{{compression.type}} properly by enabling DEBUG so this appears to be a problem 
in the size estimate of the message itself.  I would suggest we compress before 
we serialize instead of the other way around to avoid this.
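
As a rough client-side sanity check in the meantime, one can estimate the 
gzip-compressed size of a payload before handing it to the producer. A minimal 
sketch (the threshold below simply mirrors the producer's default 
max.request.size and is an illustrative assumption):
{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class CompressedSizeCheck {
    // Returns the gzip-compressed length of the payload in bytes
    static int gzipSize(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(payload);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] value = new byte[1024 * 1024]; // 1 MiB of zeros, like the dd repro
        int maxRequestSize = 1048576;         // producer default
        System.out.println("uncompressed=" + value.length + " compressed=" + gzipSize(value));
        System.out.println("compressed form fits? " + (gzipSize(value) < maxRequestSize));
    }
}
{code}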



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-09-08 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15473820#comment-15473820
 ] 

Dustin Cote commented on KAFKA-3590:


[~hachikuji] is going to take this one over for now.

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Jason Gustafson
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-09-08 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-3590:
---
Status: Open  (was: Patch Available)

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-09-08 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-3590:
---
Assignee: Jason Gustafson  (was: Dustin Cote)

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Jason Gustafson
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-3129) Console producer issue when request-required-acks=0

2016-09-07 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote reopened KAFKA-3129:


Reopening per [~ijuma]'s recommendation as acks=0 in general still has a 
problem somewhere along the way that isn't fully understood.

> Console producer issue when request-required-acks=0
> ---
>
> Key: KAFKA-3129
> URL: https://issues.apache.org/jira/browse/KAFKA-3129
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.10.0.0
>Reporter: Vahid Hashemian
>Assignee: Dustin Cote
> Attachments: kafka-3129.mov, server.log.abnormal.txt, 
> server.log.normal.txt
>
>
> I have been running a simple test case in which I have a text file 
> {{messages.txt}} with 1,000,000 lines (lines contain numbers from 1 to 
> 1,000,000 in ascending order). I run the console consumer like this:
> {{$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test}}
> Topic {{test}} is on 1 partition with a replication factor of 1.
> Then I run the console producer like this:
> {{$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < 
> messages.txt}}
> Then the console starts receiving the messages. And about half the times it 
> goes all the way to 1,000,000. But, in other cases, it stops short, usually 
> at 999,735.
> I tried running another console consumer on another machine and both 
> consumers behave the same way. I can't see anything related to this in the 
> logs.
> I also ran the same experiment with a similar file of 10,000 lines, and am 
> getting a similar behavior. When the consumer does not receive all the 10,000 
> messages it usually stops at 9,864.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3129) Console producer issue when request-required-acks=0

2016-09-07 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote resolved KAFKA-3129.

Resolution: Fixed

> Console producer issue when request-required-acks=0
> ---
>
> Key: KAFKA-3129
> URL: https://issues.apache.org/jira/browse/KAFKA-3129
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.10.0.0
>Reporter: Vahid Hashemian
>Assignee: Dustin Cote
> Attachments: kafka-3129.mov, server.log.abnormal.txt, 
> server.log.normal.txt
>
>
> I have been running a simple test case in which I have a text file 
> {{messages.txt}} with 1,000,000 lines (lines contain numbers from 1 to 
> 1,000,000 in ascending order). I run the console consumer like this:
> {{$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test}}
> Topic {{test}} is on 1 partition with a replication factor of 1.
> Then I run the console producer like this:
> {{$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < 
> messages.txt}}
> Then the console starts receiving the messages. And about half the times it 
> goes all the way to 1,000,000. But, in other cases, it stops short, usually 
> at 999,735.
> I tried running another console consumer on another machine and both 
> consumers behave the same way. I can't see anything related to this in the 
> logs.
> I also ran the same experiment with a similar file of 10,000 lines, and am 
> getting a similar behavior. When the consumer does not receive all the 10,000 
> messages it usually stops at 9,864.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4134) Transparently notify users of "Connection Refused" for client to broker connections

2016-09-06 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4134:
--

 Summary: Transparently notify users of "Connection Refused" for 
client to broker connections
 Key: KAFKA-4134
 URL: https://issues.apache.org/jira/browse/KAFKA-4134
 Project: Kafka
  Issue Type: Improvement
  Components: consumer, producer 
Affects Versions: 0.10.0.1
Reporter: Dustin Cote
Assignee: Dustin Cote
Priority: Minor


Currently, Producers and Consumers log at the WARN level if the bootstrap 
server disconnects and if there is an unexpected exception in the network 
Selector.  However, we log at DEBUG level if an IOException occurs, in order to 
avoid spamming the user with every network hiccup.  This has the side effect 
that users making an initial connection to the brokers get no feedback if the 
bootstrap server list is invalid.  For example, if one starts the console 
producer or consumer without any brokers running, nothing indicates that 
messages are not being received until the socket timeout is hit.

I propose we be more granular and log the ConnectException to let the user know 
their broker(s) are not reachable.
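
A minimal sketch of the distinction being proposed (illustrative only, not the 
actual Selector code): treat a refused connection as a user-visible warning 
while keeping generic I/O hiccups at DEBUG.
{code}
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ConnectLoggingSketch {
    private static final Logger log = LoggerFactory.getLogger(ConnectLoggingSketch.class);

    static void connect(String host, int port) {
        try (SocketChannel channel = SocketChannel.open()) {
            channel.connect(new InetSocketAddress(host, port));
        } catch (ConnectException e) {
            // Broker not reachable at all -- surface this to the user
            log.warn("Connection to {}:{} refused; check bootstrap.servers", host, port, e);
        } catch (IOException e) {
            // Transient network hiccup -- keep quiet to avoid log spam
            log.debug("I/O error connecting to {}:{}", host, port, e);
        }
    }
}
{code}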



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3129) Console producer issue when request-required-acks=0

2016-08-29 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote reassigned KAFKA-3129:
--

Assignee: Dustin Cote  (was: Neha Narkhede)

> Console producer issue when request-required-acks=0
> ---
>
> Key: KAFKA-3129
> URL: https://issues.apache.org/jira/browse/KAFKA-3129
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.10.0.0
>Reporter: Vahid Hashemian
>Assignee: Dustin Cote
> Attachments: kafka-3129.mov, server.log.abnormal.txt, 
> server.log.normal.txt
>
>
> I have been running a simple test case in which I have a text file 
> {{messages.txt}} with 1,000,000 lines (lines contain numbers from 1 to 
> 1,000,000 in ascending order). I run the console consumer like this:
> {{$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test}}
> Topic {{test}} is on 1 partition with a replication factor of 1.
> Then I run the console producer like this:
> {{$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < 
> messages.txt}}
> Then the console starts receiving the messages. And about half the times it 
> goes all the way to 1,000,000. But, in other cases, it stops short, usually 
> at 999,735.
> I tried running another console consumer on another machine and both 
> consumers behave the same way. I can't see anything related to this in the 
> logs.
> I also ran the same experiment with a similar file of 10,000 lines, and am 
> getting a similar behavior. When the consumer does not receive all the 10,000 
> messages it usually stops at 9,864.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4092) retention.bytes should not be allowed to be less than segment.bytes

2016-08-26 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4092:
--

 Summary: retention.bytes should not be allowed to be less than 
segment.bytes
 Key: KAFKA-4092
 URL: https://issues.apache.org/jira/browse/KAFKA-4092
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Dustin Cote
Assignee: Dustin Cote
Priority: Minor


Right now retention.bytes can be as small as the user wants, but it doesn't 
really get acted on for the active segment if retention.bytes is smaller than 
segment.bytes.  We shouldn't allow retention.bytes to be less than 
segment.bytes, and we should validate that at startup.
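
For illustration, a broker-level configuration sketch that respects the proposed 
constraint (the specific sizes below are arbitrary assumptions):
{code}
# Proposed constraint: log.retention.bytes should be at least log.segment.bytes
# 1 GiB segments
log.segment.bytes=1073741824
# ~10 GiB retained per partition, comfortably larger than one segment
log.retention.bytes=10737418240
{code}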



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3129) Console producer issue when request-required-acks=0

2016-08-26 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439670#comment-15439670
 ] 

Dustin Cote commented on KAFKA-3129:


What I'm seeing is that we are faking the callback 
{code}org.apache.kafka.clients.producer.internals.Sender#handleProduceResponse{code}
 for the case where acks=0.  This is a problem because the callback gets 
generated when we do 
{code}org.apache.kafka.clients.producer.internals.Sender#createProduceRequests{code}
 in the run loop but the actual send happens a bit later.  When close() comes 
in that window between createProduceRequests and the send, you get messages 
that are lost.  The funny thing is that if you slow down the call stack a bit by 
turning on something like strace, the issue goes away, so it's hard to tell 
which layer exactly is buffering the requests.

So my question is: do we want to risk a small performance hit for all producers 
to be able to guarantee that all messages with acks=0 actually make it out of 
the producer, knowing full well that they aren't going to be verified to have 
made it to the broker?  I personally don't feel it's worth the extra locking 
complexity; this could instead be documented as a known durability issue when 
you aren't using durability settings.  If we go that route, I feel like the 
console producer should have acks=1 by default.  That way, users who are getting 
started with the built-in tools have a cursory durability guarantee and can tune 
for performance instead.  What do you think [~ijuma] and [~vahid]?
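
In the meantime, a minimal workaround sketch for the repro in this ticket is to 
run the console producer with acks=1 explicitly, using the tool's existing flag 
(whether this becomes the default is the open question above):
{code}
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test \
  --request-required-acks 1 < messages.txt
{code}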

> Console producer issue when request-required-acks=0
> ---
>
> Key: KAFKA-3129
> URL: https://issues.apache.org/jira/browse/KAFKA-3129
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.10.0.0
>Reporter: Vahid Hashemian
>Assignee: Neha Narkhede
> Attachments: kafka-3129.mov, server.log.abnormal.txt, 
> server.log.normal.txt
>
>
> I have been running a simple test case in which I have a text file 
> {{messages.txt}} with 1,000,000 lines (lines contain numbers from 1 to 
> 1,000,000 in ascending order). I run the console consumer like this:
> {{$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test}}
> Topic {{test}} is on 1 partition with a replication factor of 1.
> Then I run the console producer like this:
> {{$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < 
> messages.txt}}
> Then the console starts receiving the messages. And about half the times it 
> goes all the way to 1,000,000. But, in other cases, it stops short, usually 
> at 999,735.
> I tried running another console consumer on another machine and both 
> consumers behave the same way. I can't see anything related to this in the 
> logs.
> I also ran the same experiment with a similar file of 10,000 lines, and am 
> getting a similar behavior. When the consumer does not receive all the 10,000 
> messages it usually stops at 9,864.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3993) Console producer drops data

2016-08-26 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote resolved KAFKA-3993.

Resolution: Duplicate

Marking this a duplicate of KAFKA-3129.  The workaround here is to set acks=1 
on the console producer, but acks=0 shouldn't mean that requests fail to make it 
out of the queue before close() finishes.  I'll follow up on KAFKA-3129 to keep 
everything in one place.

> Console producer drops data
> ---
>
> Key: KAFKA-3993
> URL: https://issues.apache.org/jira/browse/KAFKA-3993
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>
> The console producer drops data when if the process exits too quickly.  I 
> suspect that the shutdown hook does not call close() or something goes wrong 
> during that close().
> Here's a simple to illustrate the issue:
> {noformat}
> export BOOTSTRAP_SERVERS=localhost:9092
> export TOPIC=bar
> export MESSAGES=1
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
> --replication-factor 1 --topic "$TOPIC" \
> && echo "acks=all" > /tmp/producer.config \
> && echo "linger.ms=0" >> /tmp/producer.config \
> && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
> "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
> && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
> --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4062) Require --print-data-log if --offsets-decoder is enabled for DumpLogOffsets

2016-08-18 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-4062:
--

 Summary: Require --print-data-log if --offsets-decoder is enabled 
for DumpLogOffsets
 Key: KAFKA-4062
 URL: https://issues.apache.org/jira/browse/KAFKA-4062
 Project: Kafka
  Issue Type: Improvement
  Components: admin
Reporter: Dustin Cote
Assignee: Dustin Cote
Priority: Minor


When using the DumpLogOffsets tool, if you want to print out contents of 
__consumer_offsets, you would typically use --offsets-decoder as an option.  
This option doesn't actually do much without --print-data-log enabled, so we 
should just require it to prevent user errors.
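
For context, a typical invocation sketch (assuming the kafka.tools.DumpLogSegments 
tool, which provides these flags; the segment path below is illustrative):
{code}
bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /var/lib/kafka/__consumer_offsets-0/00000000000000000000.log \
  --offsets-decoder --print-data-log
{code}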



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4017) Return more helpful responses when misconfigured connectors are submitted

2016-08-04 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407746#comment-15407746
 ] 

Dustin Cote commented on KAFKA-4017:


[~ewencp] yeah that's right.  It's super easy to reproduce as well: just try to 
submit a connector with a syntactically bad config.  The actual curl I 
presented was:
{code}
curl -X POST -H "Content-Type: application/json" --data '{"name": 
"local-console-source", "config": 
{"connector.class":"org.apache.kafka.connect.file.FileStreamSourceConnector", 
"tasks.max":"1", "topic"="connect-test" }}' http://localhost:8083/connectors
{code}
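
For contrast, a well-formed version of the same request (a colon instead of '=' 
for the "topic" field) at least gets past the JSON parsing error quoted below; 
the connector name and topic are just the ones from the repro:
{code}
curl -X POST -H "Content-Type: application/json" \
  --data '{"name": "local-console-source", "config": {"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector", "tasks.max": "1", "topic": "connect-test"}}' \
  http://localhost:8083/connectors
{code}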


> Return more helpful responses when misconfigured connectors are submitted
> -
>
> Key: KAFKA-4017
> URL: https://issues.apache.org/jira/browse/KAFKA-4017
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Dustin Cote
>Assignee: Ewen Cheslack-Postava
>
> Currently if a user submits a connector with a malformed configuration with 
> connect in distributed mode, the response is:
> {code}
> 
> 
> 
> Error 500 
> 
> 
> HTTP ERROR: 500
> Problem accessing /connectors. Reason:
> Request failed.
> Powered by Jetty://
> 
> 
> {code}
> If the user decides to then go look at the connect server side logging, they 
> can maybe parse the stack traces to find out what happened, but are at first 
> greeted by:
> {code}
> [2016-08-03 16:14:07,797] WARN /connectors 
> (org.eclipse.jetty.server.HttpChannel:384)
> java.lang.NoSuchMethodError: 
> javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:684)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at 
> org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:159)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:499)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
> at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> It would be better if Connect can handle this scenario more gracefully and 
> make it more clear what the problem is even directly to the client.  In the 
> example above, you can eventually locate the problem in the server logs as:
> {code}
> [2016-08-03 16:14:07,795] WARN  (org.eclipse.jetty.servlet.ServletHandler:620)
> javax.servlet.ServletException: 
> org.glassfish.jersey.server.ContainerException: 
> com.fasterxml.jackson.databind.JsonMappingException: Unexpected character 
> ('=' (code 61)): was expecting a colon to separate field name and value
>  at [Source: 
> org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream@20fb9cff;
>  line: 1, column: 147] (through reference chain: 
> org.apache.kafka.connect.runtime.rest.entities.CreateConnectorRequest["config"])
> at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
> at 
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
> at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
>  

[jira] [Commented] (KAFKA-3766) Unhandled "not enough replicas" errors in SyncGroup

2016-07-29 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399520#comment-15399520
 ] 

Dustin Cote commented on KAFKA-3766:


Marking this as a dupe of KAFKA-3590.  I've tried to incorporate suggestion #2 
in my PR over there.

> Unhandled "not enough replicas" errors in SyncGroup
> ---
>
> Key: KAFKA-3766
> URL: https://issues.apache.org/jira/browse/KAFKA-3766
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>
> Caught by [~ijuma]. We seem to be missing at least a couple error codes when 
> handling the append log response when writing group metadata to the offsets 
> topic in the SyncGroup handler. In particular, we are missing checks for 
> NOT_ENOUGH_REPLICAS and NOT_ENOUGH_REPLICAS_AFTER_APPEND. Currently these 
> errors are returned directly in the SyncGroup response and cause an exception 
> to be raised to the user.
> There are two options to fix this problem:
> 1. We can continue to return these error codes in the sync group response and 
> add a handler on the client side to retry.
> 2. We can convert the errors on the server to something like 
> COORDINATOR_NOT_AVAILABLE, which will cause the client to retry with the 
> existing logic.
> The second option seems a little nicer to avoid exposing the internal 
> implementation of the SyncGroup request (i.e. that we write group metadata to 
> a partition). It also has the nice side effect of fixing old clients 
> automatically when the server is upgraded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-29 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399518#comment-15399518
 ] 

Dustin Cote commented on KAFKA-3590:


Thanks [~ijuma], I didn't see that one.  Updated the PR to move toward Jason's 
recommendation.

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-29 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-3590:
---
Status: Patch Available  (was: Open)

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-27 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote reassigned KAFKA-3590:
--

Assignee: Dustin Cote

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-27 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-3590:
---
Component/s: (was: clients)
 consumer

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-27 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396213#comment-15396213
 ] 

Dustin Cote commented on KAFKA-3590:


[~salaev] it looks like this is occurring in a SyncGroup call so it's probably 
due to the fact that the new consumer needs to write to the __consumer_offsets 
topic and can't because __consumer_offsets isn't meeting the min ISR 
requirements for the cluster.  The error message isn't very clear, so maybe we 
can start by improving that.  I can take a shot at the pull request in the next 
couple of days.
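
For illustration only, and not the actual KAFKA-3590 patch: a sketch in the
spirit of the comment above, mapping the NOT_ENOUGH_REPLICAS error that comes
back on the SyncGroup response to a message that points at the
__consumer_offsets topic instead of the generic produce-side wording. The
class and method names here are hypothetical.

{code:java}
import org.apache.kafka.common.protocol.Errors;

// Hypothetical helper: translate a SyncGroup error code into a clearer message.
public class SyncGroupErrorMessages {

    public static String describe(Errors error) {
        if (error == Errors.NOT_ENOUGH_REPLICAS
                || error == Errors.NOT_ENOUGH_REPLICAS_AFTER_APPEND) {
            return "SyncGroup failed: the group coordinator could not write group "
                    + "metadata to the __consumer_offsets topic because it has fewer "
                    + "in-sync replicas than min.insync.replicas requires.";
        }
        return "Unexpected error from SyncGroup: " + error.message();
    }

    public static void main(String[] args) {
        // The situation from this ticket would then surface as the clearer text above.
        System.out.println(describe(Errors.NOT_ENOUGH_REPLICAS));
    }
}
{code}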

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required.". Isn't this message about minimum number of 
> ISR's when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2932) Adjust importance level of Kafka Connect configs

2016-07-22 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote reassigned KAFKA-2932:
--

Assignee: Dustin Cote  (was: Ewen Cheslack-Postava)

> Adjust importance level of Kafka Connect configs
> 
>
> Key: KAFKA-2932
> URL: https://issues.apache.org/jira/browse/KAFKA-2932
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Dustin Cote
>
> Some of the configuration importance levels are out of whack, probably due to 
> the way they evolved over time. For example, the internal converter settings 
> are currently marked with high importance, but they are really an internal 
> implementation detail that the user usually shouldn't need to worry about.
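
As background, a minimal sketch of how importance levels are declared through
Kafka's ConfigDef API, which is where an adjustment like the one proposed here
would land. The keys, importance choices, and documentation strings below are
illustrative assumptions, not the actual Connect worker definitions.

{code:java}
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

public class ImportanceSketch {

    public static ConfigDef configDef() {
        return new ConfigDef()
                // A setting every user has to think about: plausibly HIGH importance.
                .define("key.converter", Type.CLASS, Importance.HIGH,
                        "Converter class for keys read from or written to Kafka.")
                // A framework-internal detail: the kind of setting this ticket
                // suggests demoting to LOW importance.
                .define("internal.key.converter", Type.CLASS, Importance.LOW,
                        "Converter for framework-internal data; rarely changed by users.");
    }

    public static void main(String[] args) {
        configDef().configKeys().forEach((name, key) ->
                System.out.println(name + " -> " + key.importance));
    }
}
{code}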



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2932) Adjust importance level of Kafka Connect configs

2016-07-22 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389722#comment-15389722
 ] 

Dustin Cote commented on KAFKA-2932:


[~ewencp] mind if I pick this one up?

> Adjust importance level of Kafka Connect configs
> 
>
> Key: KAFKA-2932
> URL: https://issues.apache.org/jira/browse/KAFKA-2932
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> Some of the configuration importance levels are out of whack, probably due to 
> the way they evolved over time. For example, the internal converter settings 
> are currently marked with high importance, but they are really an internal 
> implementation detail that the user usually shouldn't need to worry about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-07-22 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389618#comment-15389618
 ] 

Dustin Cote commented on KAFKA-2394:


had to change the PR to come from a different branch so I could have my trunk 
branch back :)

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11.0.0
>
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.
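
Purely to illustrate the two settings named in the description (maxFileSize
and maxBackupIndex), a minimal sketch that wires up a RollingFileAppender
programmatically with the log4j 1.2 API; the file name, layout pattern, and
limits are arbitrary examples. In a bundled log4j.properties the same knobs
would be the appender's MaxFileSize and MaxBackupIndex properties.

{code:java}
import java.io.IOException;

import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;
import org.apache.log4j.RollingFileAppender;

public class RollingAppenderSketch {

    public static void main(String[] args) throws IOException {
        RollingFileAppender appender = new RollingFileAppender(
                new PatternLayout("[%d] %p %m (%c)%n"), "logs/server.log");
        // Cap the active file at 100 MB (log4j.properties: ...MaxFileSize=100MB).
        appender.setMaxFileSize("100MB");
        // Keep at most 10 rolled backups (log4j.properties: ...MaxBackupIndex=10).
        appender.setMaxBackupIndex(10);
        Logger.getRootLogger().addAppender(appender);

        Logger.getLogger(RollingAppenderSketch.class).info("rolling appender attached");
    }
}
{code}

With these two settings the total disk footprint is bounded by roughly
maxFileSize * (maxBackupIndex + 1) per appender, which is the protection the
DailyRollingFileAppender never offered.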



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-13 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327390#comment-15327390
 ] 

Dustin Cote commented on KAFKA-2394:


[~ewencp] yeah I think that's for the best.  I wouldn't want to surprise anyone 
expecting the log naming scheme to stay the same with a change in a minor 
version.  If you think that's too conservative an outlook, I'll defer to your 
judgement :)

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11.0.0
>
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-10 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325356#comment-15325356
 ] 

Dustin Cote commented on KAFKA-2394:


[~ewencp] makes sense, I went ahead and updated the docs in the PR.  There 
wasn't a section on upgrading to 0.11 so I created one, assuming that'll be the 
next non-fixpack release to come out.  Since this is a potentially incompatible 
change, I've set this to 0.11 as the target.

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11.0.0
>
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-10 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-2394:
---
Fix Version/s: 0.11.0.0

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11.0.0
>
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3802) log mtimes reset on broker restart

2016-06-07 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319233#comment-15319233
 ] 

Dustin Cote commented on KAFKA-3802:


I'm thinking that there's no partition reassignment happening here if it's a 
generic shutdown/startup.  Can you provide any environmental details 
[~ottomata]?  I think that would be helpful to pinpoint the problem.  So far, I 
haven't been able to reproduce this issue, seemingly because I am missing 
something in the environment that others have.

> log mtimes reset on broker restart
> --
>
> Key: KAFKA-3802
> URL: https://issues.apache.org/jira/browse/KAFKA-3802
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Otto
>
> Folks over in 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAO8=cz0ragjad1acx4geqcwj+rkd1gmdavkjwytwthkszfg...@mail.gmail.com%3E
>  are commenting about this issue.
> In 0.9, any data log file that was on disk before the broker restart has its 
> mtime modified to the time of the restart.
> This causes problems with log retention, as all the files then look to Kafka 
> like they contain recent data.  We use the default log retention of 7 days, 
> but if all the files are touched at the same time, this can cause us to 
> retain up to 2 weeks of log data, which can fill up our disks.
> This happens *most* of the time, but seemingly not all.  We have seen broker 
> restarts where mtimes were not changed.
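
To make the retention interaction concrete, a minimal sketch of the check that
a reset mtime defeats, assuming (as in 0.9) that time-based retention is keyed
off the segment file's last-modified time; this is illustrative, not broker
code, and the path in main() is made up.

{code:java}
import java.io.File;
import java.util.concurrent.TimeUnit;

public class RetentionCheck {

    // A segment is only eligible for time-based deletion once its last-modified
    // time falls outside the retention window, so touching every file on restart
    // restarts that clock and can roughly double how long data is kept.
    public static boolean eligibleForDeletion(File segment, long retentionMs, long nowMs) {
        return nowMs - segment.lastModified() > retentionMs;
    }

    public static void main(String[] args) {
        long retentionMs = TimeUnit.DAYS.toMillis(7);
        File segment = new File("/var/kafka-logs/mytopic-0/00000000000000000000.log");
        System.out.println(eligibleForDeletion(segment, retentionMs,
                System.currentTimeMillis()));
    }
}
{code}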



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-06 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316992#comment-15316992
 ] 

Dustin Cote edited comment on KAFKA-2394 at 6/6/16 6:57 PM:


[~ewencp] I only got positive feedback on the change from the mailing list, so 
sounds like this is a desirable change?  I think release notes for 
documentation would be sufficient based on that sentiment.


was (Author: cotedm):
[~ewencp] I only got positive feedback on the change from the mailing list, so 
sounds like this is a desirable change?

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-06 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316992#comment-15316992
 ] 

Dustin Cote commented on KAFKA-2394:


[~ewencp] I only got positive feedback on the change from the mailing list, so 
sounds like this is a desirable change?

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-02 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote reassigned KAFKA-2394:
--

Assignee: Dustin Cote  (was: jin xing)

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-02 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312409#comment-15312409
 ] 

Dustin Cote commented on KAFKA-2394:


Thanks [~ewencp].  I went ahead and sent out to users@ and dev@ and hopefully 
the impacted users will speak up :)

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: jin xing
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-05-26 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302150#comment-15302150
 ] 

Dustin Cote commented on KAFKA-2394:


[~hachikuji] the attached PR removes references to the DailyRollingFileAppender 
because it's not only a risk when you have spammy logging; over the long term, 
the DailyRollingFileAppender also has no default cleanup policy that I know of. 
It also has known issues with [losing log 
messages|https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/DailyRollingFileAppender.html].
I think it's best to scrub DailyRollingFileAppender completely and either 
notify users of the logging format change or add the log4j extras to the 
dependency list to match the old formatting.  I prefer just changing the naming 
convention of the files but am certainly open to discussion. Thanks!

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: jin xing
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-05-26 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302098#comment-15302098
 ] 

Dustin Cote commented on KAFKA-2394:


[~jinxing6...@126.com] I'll go ahead and take this one to move it forward.  
Please let me know if you end up having time to work on it.  Thanks!

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: jin xing
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2971) KAFKA - Not obeying log4j settings, DailyRollingFileAppender not rolling files

2016-05-25 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote resolved KAFKA-2971.

Resolution: Resolved

Resolving as superseded by KAFKA-2394. Thanks [~gwenshap].

> KAFKA - Not obeying log4j settings, DailyRollingFileAppender not rolling files
> --
>
> Key: KAFKA-2971
> URL: https://issues.apache.org/jira/browse/KAFKA-2971
> Project: Kafka
>  Issue Type: Bug
>  Components: config, log
>Affects Versions: 0.8.2.2
> Environment: OS: Windows Server 2008 R2 Enterprise SP1
> log4j: 1.2.16
>Reporter: Damir Ban
>Assignee: Jay Kreps
> Fix For: 0.10.1.0
>
> Attachments: log4j.properties
>
>
> Per the settings in log4j it is expected that log files get rolled over 
> periodically, but they just keep growing until the service is restarted, at 
> which point they are overwritten.
> Because we have intermittent fatal failures, the customer restarts the 
> service and we lose the information about the failure.
> We have tried different date patterns in the DailyRollingFileAppender, but no 
> change.
> Attaching the log4j.properties



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2971) KAFKA - Not obeying log4j settings, DailyRollingFileAppender not rolling files

2016-05-25 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300699#comment-15300699
 ] 

Dustin Cote commented on KAFKA-2971:


[~jkreps] and [~dban] I believe this issue is best addressed by moving to the 
much more stable RollingFileAppender as described in KAFKA-2394.  Any objection 
to using the approach there and closing this one out?

> KAFKA - Not obeying log4j settings, DailyRollingFileAppender not rolling files
> --
>
> Key: KAFKA-2971
> URL: https://issues.apache.org/jira/browse/KAFKA-2971
> Project: Kafka
>  Issue Type: Bug
>  Components: config, log
>Affects Versions: 0.8.2.2
> Environment: OS: Windows Server 2008 R2 Enterprise SP1
> log4j: 1.2.16
>Reporter: Damir Ban
>Assignee: Jay Kreps
> Fix For: 0.10.1.0
>
> Attachments: log4j.properties
>
>
> Per the settings in log4j it is expected that log files get rolled over 
> periodically, but they just keep growing until the service is restarted, at 
> which point they are overwritten.
> Because we have intermittent fatal failures, the customer restarts the 
> service and we lose the information about the failure.
> We have tried different date patterns in the DailyRollingFileAppender, but no 
> change.
> Attaching the log4j.properties



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-05-24 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298296#comment-15298296
 ] 

Dustin Cote commented on KAFKA-2394:


[~jinxing6...@126.com] are you able to make a pull request for this issue 
instead of attaching the patch to the JIRA?  I think you've already been doing 
[the 
process|https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes#ContributingCodeChanges-PullRequest]
 on other JIRAs, and this one seems like an easy one to translate for you.

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: jin xing
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3683) Add file descriptor recommendation to ops guide

2016-05-09 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3683 started by Dustin Cote.
--
> Add file descriptor recommendation to ops guide
> ---
>
> Key: KAFKA-3683
> URL: https://issues.apache.org/jira/browse/KAFKA-3683
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Trivial
>
> The Ops section of the documentation says that file descriptor limits are an 
> important OS configuration to pay attention to, but offers no guidance on how 
> to configure them.  
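
Not from the ticket itself, but one concrete piece of guidance the ops guide
could give: a small, JVM-specific sketch for checking how many descriptors a
broker (or any JVM) has open relative to its limit. It relies on the
Sun/Oracle-only UnixOperatingSystemMXBean, so it will not work on every JVM or
OS.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdUsage {

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
            // Compare current usage against the ulimit the process started with.
            System.out.printf("open fds: %d / max fds: %d%n",
                    unixOs.getOpenFileDescriptorCount(),
                    unixOs.getMaxFileDescriptorCount());
        } else {
            System.out.println("File descriptor counts not available on this JVM/OS.");
        }
    }
}
{code}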



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3666) Update new Consumer API links in the docs

2016-05-06 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274155#comment-15274155
 ] 

Dustin Cote commented on KAFKA-3666:


Leaving the github link here since the PR didn't get pushed for some reason 
https://github.com/apache/kafka/pull/1331

> Update new Consumer API links in the docs
> -
>
> Key: KAFKA-3666
> URL: https://issues.apache.org/jira/browse/KAFKA-3666
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Affects Versions: 0.9.0.0
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Trivial
>
> New Consumer API links in the docs point to the initial proposed design doc.  
> This should go to the real API docs now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3666) Update new Consumer API links in the docs

2016-05-06 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-3666:
---
Description: New Consumer API links in the docs point to the initial 
proposed design doc.  This should go to the real API docs now.  (was: The 
ConsumerOffsetChecker is deprecated in 0.9 but the docs still use it for the 
example of how to check offsets.  We should replace the example with the 
ConsumerGroupCommand starting in the 0.9 docs and going forward.  )

> Update new Consumer API links in the docs
> -
>
> Key: KAFKA-3666
> URL: https://issues.apache.org/jira/browse/KAFKA-3666
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Affects Versions: 0.9.0.0
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Trivial
>
> New Consumer API links in the docs point to the initial proposed design doc.  
> This should go to the real API docs now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3666) Update offset checking example in the docs

2016-05-06 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274090#comment-15274090
 ] 

Dustin Cote commented on KAFKA-3666:


After looking at the docs in trunk, it looks like the ConsumerGroupCommand is 
already documented by the changes in KAFKA-1476.  Updating the description here 
to repoint the new consumer API links at the javadoc instead of the (apparently 
deprecated) design doc.

> Update offset checking example in the docs
> --
>
> Key: KAFKA-3666
> URL: https://issues.apache.org/jira/browse/KAFKA-3666
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Affects Versions: 0.9.0.0
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Trivial
>
> The ConsumerOffsetChecker is deprecated in 0.9 but the docs still use it for 
> the example of how to check offsets.  We should replace the example with the 
> ConsumerGroupCommand starting in the 0.9 docs and going forward.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3666) Update new Consumer API links in the docs

2016-05-06 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-3666:
---
Summary: Update new Consumer API links in the docs  (was: Update offset 
checking example in the docs)

> Update new Consumer API links in the docs
> -
>
> Key: KAFKA-3666
> URL: https://issues.apache.org/jira/browse/KAFKA-3666
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Affects Versions: 0.9.0.0
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Trivial
>
> The ConsumerOffsetChecker is deprecated in 0.9 but the docs still use it for 
> the example of how to check offsets.  We should replace the example with the 
> ConsumerGroupCommand starting in the 0.9 docs and going forward.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3666) Update offset checking example in the docs

2016-05-06 Thread Dustin Cote (JIRA)
Dustin Cote created KAFKA-3666:
--

 Summary: Update offset checking example in the docs
 Key: KAFKA-3666
 URL: https://issues.apache.org/jira/browse/KAFKA-3666
 Project: Kafka
  Issue Type: Improvement
  Components: website
Affects Versions: 0.9.0.0
Reporter: Dustin Cote
Assignee: Dustin Cote
Priority: Trivial


The ConsumerOffsetChecker is deprecated in 0.9 but the docs still use it for 
the example of how to check offsets.  We should replace the example with the 
ConsumerGroupCommand starting in the 0.9 docs and going forward.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2670) add sampling rate to MirrorMaker

2016-03-21 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204620#comment-15204620
 ] 

Dustin Cote commented on KAFKA-2670:


I agree with Joel, it's probably better to keep the MirrorMaker simple.  
Especially because the only way I can actually see to do this is to add another 
branch into the hot code path for processing events, which would marginally 
slow the MirrorMaker.  I'm going to check out that interceptor API in some more 
detail; maybe it makes sense to add an example that would accomplish the same 
thing.  For example, intercept every X events and /dev/null them instead of 
letting them make it to the target cluster.  When I have a better handle on 
that I will make a pull request.
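
A rough sketch of the "forward only one record in every N" gate described
above; it deliberately does not implement any particular MirrorMaker handler or
interceptor interface, since the exact hook and its signature depend on the
Kafka version, and the class name is made up.

{code:java}
// Hypothetical sampling gate: callers forward a record only when shouldForward()
// returns true, dropping the rest to approximate a per-topic sampling ratio.
public class SamplingGate {

    private final int sampleEveryN;
    private long seen = 0;

    public SamplingGate(int sampleEveryN) {
        if (sampleEveryN < 1) {
            throw new IllegalArgumentException("sampleEveryN must be >= 1");
        }
        this.sampleEveryN = sampleEveryN;
    }

    // Returns true for exactly 1 out of every N calls.
    public synchronized boolean shouldForward() {
        return (seen++ % sampleEveryN) == 0;
    }
}
{code}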

> add sampling rate to MirrorMaker
> 
>
> Key: KAFKA-2670
> URL: https://issues.apache.org/jira/browse/KAFKA-2670
> Project: Kafka
>  Issue Type: Wish
>  Components: tools
>Reporter: Christian Tramnitz
>Priority: Minor
>
> MirrorMaker could be used to copy data to different Kafka instances in 
> different environments (e.g. from production to development or test), but 
> often these are not at the same scale as production. A sampling rate could be 
> introduced to MirrorMaker to define the ratio of data to be copied (per 
> topic) to downstream instances. Of course this should be 1:1 (or 100%) by 
> default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)