[jira] [Updated] (KAFKA-6151) kafka broker not starting ; WARN kafka.log.Log: Found a corrupted index file

2017-10-30 Thread Prince Paulson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prince Paulson updated KAFKA-6151:
--
Description: 
{code}
r...@4233106d-cceb-0517-ba7b-4917466b3ede.spherestage-kc.net:/home/SPHERESTAGE/pp023549
 ( koopa_kafkabroker01 )
00:26:30 # service kafka status
Kafka is not running


r...@4233106d-cceb-0517-ba7b-4917466b3ede.spherestage-kc.net:/var/log/kafka ( 
koopa_kafkabroker01 )
00:44:25 # service kafka start
Kafka successfully started
r...@4233106d-cceb-0517-ba7b-4917466b3ede.spherestage-kc.net:/var/log/kafka ( 
koopa_kafkabroker01 )
00:44:32 # service kafka status
Kafka is not running


r...@4233106d-cceb-0517-ba7b-4917466b3ede.spherestage-kc.net:/var/log/kafka ( 
koopa_kafkabroker01 )
00:44:39 # tail -500f /var/log/kafka/server.log

2017-10-30 00:44:33,389 INFO kafka.server.KafkaServer: starting
2017-10-30 00:44:33,392 INFO kafka.server.KafkaServer: Connecting to zookeeper 
on 
42334c1d-3c61-3f57-4a28-09df4a706fa2.spherestage-kc.net:2181,4233668f-b29f-6beb-faab-8a0eda3067ac.spherestage-kc.net:2181,4233c278-fd8c-42b3-2402-ef8be142058b.spherestage-kc.net:2181/kafka/koopa
2017-10-30 00:44:33,497 INFO kafka.server.KafkaServer: Created zookeeper path 
/kafka/koopa
2017-10-30 00:44:34,150 INFO kafka.log.LogManager: Loading logs.
2017-10-30 00:44:34,199 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.deveng_v500.koopa-deveng_v500-aeon-clinicalevent-MillenniumSpout.DEFERRED-0/03650694.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,725 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.1201dep_refrecord_person_4_4-CDH5.1201dep_refrecord_person_4_4-CDH5_updated_recordIds.DEFERRED-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,729 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.v500.koopa-v500-Personnel-Spout-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,733 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.solgm_refrecord_person_4_9-CDH5.solgm_refrecord_person_4_9-CDH5_updated_recordIds.DEFERRED-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,737 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/refrecord_logs-18/00405326.index, 
deleting and rebuilding index...
2017-10-30 00:44:34,841 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.deveng_v500.koopa-deveng_v500-PrivilegeDef-Spout-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,844 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.1501stpdep_refrecord_norm_financial.refRecord-1501stpdep-koopa-1501stpdep_refrecord_norm_financial-CacheUpdaterSpout.DEFERRED-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,847 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/refrecord_logs-1/00404938.index, 
deleting and rebuilding index...
2017-10-30 00:44:34,881 ERROR kafka.log.LogManager: There was an error in one 
of the threads during logs loading: java.lang.InternalError: a fault occurred 
in a recent unsafe memory access operation in compiled Java code
2017-10-30 00:44:34,881 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.1201dep_v500.koopa-1201dep_v500-aeon-foundations-MillenniumSpout.DEFERRED-0/00120966.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,882 FATAL kafka.server.KafkaServer: Fatal error during 
KafkaServer startup. Prepare to shutdown
java.lang.InternalError: a fault occurred in a recent unsafe memory access 
operation in compiled Java code
at 
org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:343)
at 
org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:167)
at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:150)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at 
kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:72)
at 
kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:64)
at 
kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
at kafka.utils.IteratorTemplate.next(IteratorTemplate.scala:38)
at kafka.log.LogSegment.recover(LogSegment.scala:186)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
at 

[jira] [Created] (KAFKA-6151) kafka broker not starting ; WARN kafka.log.Log: Found a corrupted index file

2017-10-30 Thread Prince Paulson (JIRA)
Prince Paulson created KAFKA-6151:
-

 Summary: kafka broker not starting ; WARN kafka.log.Log: Found a 
corrupted index file
 Key: KAFKA-6151
 URL: https://issues.apache.org/jira/browse/KAFKA-6151
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.9.0.1
Reporter: Prince Paulson
Priority: Critical


r...@4233106d-cceb-0517-ba7b-4917466b3ede.spherestage-kc.net:/home/SPHERESTAGE/pp023549
 ( koopa_kafkabroker01 )
00:26:30 # service kafka status
Kafka is not running


r...@4233106d-cceb-0517-ba7b-4917466b3ede.spherestage-kc.net:/var/log/kafka ( 
koopa_kafkabroker01 )
00:44:25 # service kafka start
Kafka successfully started
r...@4233106d-cceb-0517-ba7b-4917466b3ede.spherestage-kc.net:/var/log/kafka ( 
koopa_kafkabroker01 )
00:44:32 # service kafka status
Kafka is not running


r...@4233106d-cceb-0517-ba7b-4917466b3ede.spherestage-kc.net:/var/log/kafka ( 
koopa_kafkabroker01 )
00:44:39 # tail -500f /var/log/kafka/server.log

2017-10-30 00:44:33,389 INFO kafka.server.KafkaServer: starting
2017-10-30 00:44:33,392 INFO kafka.server.KafkaServer: Connecting to zookeeper 
on 
42334c1d-3c61-3f57-4a28-09df4a706fa2.spherestage-kc.net:2181,4233668f-b29f-6beb-faab-8a0eda3067ac.spherestage-kc.net:2181,4233c278-fd8c-42b3-2402-ef8be142058b.spherestage-kc.net:2181/kafka/koopa
2017-10-30 00:44:33,497 INFO kafka.server.KafkaServer: Created zookeeper path 
/kafka/koopa
2017-10-30 00:44:34,150 INFO kafka.log.LogManager: Loading logs.
2017-10-30 00:44:34,199 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.deveng_v500.koopa-deveng_v500-aeon-clinicalevent-MillenniumSpout.DEFERRED-0/03650694.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,725 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.1201dep_refrecord_person_4_4-CDH5.1201dep_refrecord_person_4_4-CDH5_updated_recordIds.DEFERRED-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,729 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.v500.koopa-v500-Personnel-Spout-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,733 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.solgm_refrecord_person_4_9-CDH5.solgm_refrecord_person_4_9-CDH5_updated_recordIds.DEFERRED-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,737 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/refrecord_logs-18/00405326.index, 
deleting and rebuilding index...
2017-10-30 00:44:34,841 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.deveng_v500.koopa-deveng_v500-PrivilegeDef-Spout-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,844 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.1501stpdep_refrecord_norm_financial.refRecord-1501stpdep-koopa-1501stpdep_refrecord_norm_financial-CacheUpdaterSpout.DEFERRED-0/.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,847 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/refrecord_logs-1/00404938.index, 
deleting and rebuilding index...
2017-10-30 00:44:34,881 ERROR kafka.log.LogManager: There was an error in one 
of the threads during logs loading: java.lang.InternalError: a fault occurred 
in a recent unsafe memory access operation in compiled Java code
2017-10-30 00:44:34,881 WARN kafka.log.Log: Found a corrupted index file, 
/storage/kafka/00/kafka_data/KEPLER.1201dep_v500.koopa-1201dep_v500-aeon-foundations-MillenniumSpout.DEFERRED-0/00120966.index,
 deleting and rebuilding index...
2017-10-30 00:44:34,882 FATAL kafka.server.KafkaServer: Fatal error during 
KafkaServer startup. Prepare to shutdown
java.lang.InternalError: a fault occurred in a recent unsafe memory access 
operation in compiled Java code
at 
org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:343)
at 
org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:167)
at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:150)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at 
kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:72)
at 
kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:64)
at 
kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
at kafka.utils.IteratorTemplate.next(IteratorTemplate.scala:38)
at 

[jira] [Commented] (KAFKA-4083) ReassignPartitionsCommand does not check replication factor

2017-10-30 Thread Simon Wan Wenli (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226284#comment-16226284
 ] 

Simon Wan Wenli commented on KAFKA-4083:


Is this patch still necessary? If so, I can resolve its conflicts.

> ReassignPartitionsCommand does not check replication factor
> ---
>
> Key: KAFKA-4083
> URL: https://issues.apache.org/jira/browse/KAFKA-4083
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, core
>Affects Versions: 0.8.2.0, 0.9.0.0, 0.10.0.0
> Environment: Linux 14.04 LTS, Scala 2.10
>Reporter: Simon Wan Wenli
>Assignee: Simon Wan Wenli
>Priority: Minor
>  Labels: newbie
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> ReassignPartitionsCommand does not check whether the replication factor of a 
> partition matches the existing replication factor, which can result in 
> different partitions of a topic having different numbers of replicas. More 
> specifically, TopicCommand will throw an exception with the message "All 
> partitions should have the same number of replicas." when the user tries to 
> add partitions. I think this loophole may also leave Kafka in a situation 
> from which it is difficult to recover.
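For illustration, a reassignment JSON of the following shape (topic name hypothetical) is accepted by the tool even though it leaves two partitions of the same topic with different replica counts:

{code}
{"version": 1,
 "partitions": [
   {"topic": "demo-topic", "partition": 0, "replicas": [1, 2, 3]},
   {"topic": "demo-topic", "partition": 1, "replicas": [1, 2]}
 ]}
{code}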





[jira] [Commented] (KAFKA-4083) ReassignPartitionsCommand does not check replication factor

2017-10-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226194#comment-16226194
 ] 

Ewen Cheslack-Postava commented on KAFKA-4083:
--

Presumably it is https://github.com/apache/kafka/pull/1779

> ReassignPartitionsCommand does not check replication factor
> ---
>
> Key: KAFKA-4083
> URL: https://issues.apache.org/jira/browse/KAFKA-4083
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, core
>Affects Versions: 0.8.2.0, 0.9.0.0, 0.10.0.0
> Environment: Linux 14.04 LTS, Scala 2.10
>Reporter: Simon Wan Wenli
>Assignee: Simon Wan Wenli
>Priority: Minor
>  Labels: newbie
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> ReassignPartitionsCommand does not check whether the replication factor of a 
> partition matches the existing replication factor, which can result in 
> different partitions of a topic having different numbers of replicas. More 
> specifically, TopicCommand will throw an exception with the message "All 
> partitions should have the same number of replicas." when the user tries to 
> add partitions. I think this loophole may also leave Kafka in a situation 
> from which it is difficult to recover.





[jira] [Commented] (KAFKA-6147) Error reading field 'api_versions': Error reading field 'max_version': java.nio.BufferUnderflowException

2017-10-30 Thread Sandro Simas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226119#comment-16226119
 ] 

Sandro Simas commented on KAFKA-6147:
-

OK, I found the problem. My process was starting another producer through the 
logback-kafka-appender library. That library was starting a producer with 
bootstrap.servers pointing to localhost:9092, but no broker exists at that 
address. After fixing the configuration, everything started working without any 
errors. I think this error message is very confusing.
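For reference, a sketch of the kind of appender configuration involved (element names follow the logback-kafka-appender documentation from memory and may differ by version; the topic name and broker address are placeholders):

{code:xml}
<appender name="kafka" class="com.github.danielwegener.logback.kafka.KafkaAppender">
  <topic>application-logs</topic>
  <!-- must point at a broker that actually exists, not an unused localhost:9092 -->
  <producerConfig>bootstrap.servers=broker1.example.com:9092</producerConfig>
  <!-- encoder, keying and delivery strategies omitted -->
</appender>
{code}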

> Error reading field 'api_versions': Error reading field 'max_version': 
> java.nio.BufferUnderflowException
> 
>
> Key: KAFKA-6147
> URL: https://issues.apache.org/jira/browse/KAFKA-6147
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.11.0.0
>Reporter: Sandro Simas
>Priority: Minor
>
> I'm getting the following error on my kafka client 0.11.0.1:
> {code:java}
> [2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught 
> error in kafka producer I/O thread: 
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
> 'api_versions': Error reading field 'max_version': 
> java.nio.BufferUnderflowException
> at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
> at 
> org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
> at 
> org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
> at 
> org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
> at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Another similar error:
> {code:java}
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
> 'api_versions': Error reading array of size 65546, only 10 bytes available
> at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
> at 
> org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
> at 
> org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
> at 
> org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
> at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
> previous data. This error appears suddenly even without producing messages.
> Although this error occurs, I can produce messages without any problems 
> afterwards. Could this be a network issue? I changed the server version to 
> 0.10.2 and 0.10.1, but the problem persists.





[jira] [Commented] (KAFKA-6150) Make Repartition Topics Transient

2017-10-30 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226082#comment-16226082
 ] 

Guozhang Wang commented on KAFKA-6150:
--

A couple of issues to consider here: 

1. Restoration: repartition topics are still used as input topics for a task 
(i.e. a sub-topology) when initializing the task to restore state. In this 
scenario we should not restore from repartition topics anymore but from source 
topics / changelog topics only. This means we should not reuse the repartition 
topics as changelog topics.

2. Reset Tooling: today when we reset the application state, we will delete the 
intermediate topic. Purging data and topic deletion should not conflict, but it 
is worth double-checking the admin client to make sure no async processing could 
result in race conditions.

3. Exactly-Once: when committing, we will write the offset as the currently 
consumed record's position + 1, indicating the next record to resume from. So 
deleting up to the currently consumed record's position should be fine.

> Make Repartition Topics Transient
> -
>
> Key: KAFKA-6150
> URL: https://issues.apache.org/jira/browse/KAFKA-6150
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>  Labels: operability
>
> Unlike changelog topics, repartition topics can be short-lived. Today users 
> have different ways to configure them with short retention, such as enforcing 
> a short retention period or using AppendTime for repartition topics. All of 
> these are cumbersome, and Streams should just do this for the users.
> One way to do it is to use the “purgeData” admin API (KIP-107) such that after 
> the offsets of the input topics are committed, if the input topics are 
> actually repartition topics, we purge the data immediately.





[jira] [Created] (KAFKA-6150) Make Repartition Topics Transient

2017-10-30 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-6150:


 Summary: Make Repartition Topics Transient
 Key: KAFKA-6150
 URL: https://issues.apache.org/jira/browse/KAFKA-6150
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Guozhang Wang
Assignee: Guozhang Wang


Unlike changelog topics, repartition topics can be short-lived. Today users have 
different ways to configure them with short retention, such as enforcing a short 
retention period or using AppendTime for repartition topics. All of these are 
cumbersome, and Streams should just do this for the users.

One way to do it is to use the “purgeData” admin API (KIP-107) such that after 
the offsets of the input topics are committed, if the input topics are actually 
repartition topics, we purge the data immediately.
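As a rough illustration of the purge step (not the actual Streams implementation; the Java AdminClient deleteRecords call shown here exposes the KIP-107 DeleteRecords API and is only available in later client versions, and the topic, partition and offset are hypothetical):

{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class PurgeRepartitionTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // After the consumed offsets of a repartition topic have been committed,
            // ask the brokers to delete all records below the committed offset.
            TopicPartition tp = new TopicPartition("my-app-some-repartition", 0); // hypothetical topic
            long committedOffset = 12345L;                                        // hypothetical offset
            Map<TopicPartition, RecordsToDelete> purge =
                    Collections.singletonMap(tp, RecordsToDelete.beforeOffset(committedOffset));
            admin.deleteRecords(purge).all().get();
        }
    }
}
{code}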





[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2017-10-30 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226013#comment-16226013
 ] 

Jun Rao commented on KAFKA-4084:


[~jeffwidman], the detailed perf results can be found in 
https://github.com/apache/kafka/pull/3427. One of the results is the following.

5 brokers, 25K topics, 1 partition, RF=2. 
Controlled shutdown time of a broker went down from 22 secs to 3 secs because 
of async ZK accesses and batching requests between the controller and the 
brokers.

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.
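For operators hitting this on large clusters, the workaround implied above is to disable the automatic rebalance in the broker configuration and trigger preferred replica elections manually (for example with the kafka-preferred-replica-election.sh tool) during quiet periods; a minimal sketch:

{code}
# server.properties (sketch): disable automatic preferred leader election
auto.leader.rebalance.enable=false
{code}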





[jira] [Commented] (KAFKA-6144) Allow state stores to serve stale reads during rebalance

2017-10-30 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226007#comment-16226007
 ] 

Matthias J. Sax commented on KAFKA-6144:


[~astubbs] Is this a duplicate of KAFKA-6031 ?

> Allow state stores to serve stale reads during rebalance
> 
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>
> Currently, when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large state stores can take a very 
> long time; for small state stores, even more than a few ms of unavailability 
> can be a deal breaker for microservice use cases.
> One workaround is to allow stale data to be read from the state stores when 
> the use case allows.
> Relates to KAFKA-6145 - Warm up new KS instances before migrating tasks - 
> potentially a two-phase rebalance





[jira] [Updated] (KAFKA-6147) Error reading field 'api_versions': Error reading field 'max_version': java.nio.BufferUnderflowException

2017-10-30 Thread Sandro Simas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Simas updated KAFKA-6147:

Description: 
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. This error appears suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems 
afterwards. Could this be a network issue? I changed the server version to 
0.10.2 and 0.10.1, but the problem persists.

  was:
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. This errors appears suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems after 
this error. This could be a network issue? I 

[jira] [Updated] (KAFKA-6147) Error reading field 'api_versions': Error reading field 'max_version': java.nio.BufferUnderflowException

2017-10-30 Thread Sandro Simas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Simas updated KAFKA-6147:

Description: 
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. This error appears suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems 
afterwards. Could this be a network issue? I changed the server version to 
0.10.2 and 0.10.1, but the problem persists.

  was:
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. This errors appears suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems after 
this error.
Downgrading the servers and clients 

[jira] [Created] (KAFKA-6149) LogCleanerManager should include topic partition name when warning of invalid cleaner offset

2017-10-30 Thread Ryan P (JIRA)
Ryan P created KAFKA-6149:
-

 Summary: LogCleanerManager should include topic partition name 
when warning of invalid cleaner offset 
 Key: KAFKA-6149
 URL: https://issues.apache.org/jira/browse/KAFKA-6149
 Project: Kafka
  Issue Type: Improvement
Reporter: Ryan P


The following message would be a lot more helpful if the topic partition name 
were included.


if (!isCompactAndDelete(log))
  warn(s"Resetting first dirty offset to log start offset $logStartOffset since the checkpointed offset $offset is invalid.")
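For illustration only, a possible form of the improved message (assuming the affected topic partition is in scope here as $topicPartition, which is an assumption, not the actual variable name):

if (!isCompactAndDelete(log))
  warn(s"Resetting first dirty offset of $topicPartition to log start offset $logStartOffset since the checkpointed offset $offset is invalid.")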







[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2017-10-30 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225870#comment-16225870
 ] 

Jeff Widman commented on KAFKA-4084:


[~junrao] any ballpark quantification to "much faster"? 

Are we talking 2x, 10x, or 100x faster?

When you say "batches the requests", I'm not sure what the batch size is: whether 
it does all changes as a single batch or splits them into multiple batches, so 
it's hard to guesstimate the expected performance impact.

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.





[jira] [Commented] (KAFKA-4499) Add "getAllKeys" API for querying windowed KTable stores

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225833#comment-16225833
 ] 

ASF GitHub Bot commented on KAFKA-4499:
---

GitHub user ConcurrencyPractitioner opened a pull request:

https://github.com/apache/kafka/pull/4162

[KAFKA-4499]  Add "getAllKeys" for querying windowed KTable stores 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ConcurrencyPractitioner/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4162


commit 5635d189bb1f0a71f831a44626c206f2360c2f5e
Author: RichardYuSTUG 
Date:   2017-10-30T22:07:27Z

[KAFKA-4499] KafkaKAFKA-4499




> Add "getAllKeys" API for querying windowed KTable stores
> 
>
> Key: KAFKA-4499
> URL: https://issues.apache.org/jira/browse/KAFKA-4499
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>  Labels: needs-kip
> Attachments: 4499-All-test-v1.patch, 4499-CachingWindowStore-v1.patch
>
>
> Currently, both {{KTable}} and windowed-{{KTable}} stores can be queried via 
> IQ feature. While {{ReadOnlyKeyValueStore}} (for {{KTable}} stores) provide 
> method {{all()}} to scan the whole store (ie, returns an iterator over all 
> stored key-value pairs), there is no similar API for {{ReadOnlyWindowStore}} 
> (for windowed-{{KTable}} stores).
> This limits the usage of a windowed store, because the user needs to know 
> what keys are stored in order the query it. It would be useful to provide 
> possible APIs like this (only a rough sketch):
>  - {{keys()}} returns all keys available in the store (maybe together with 
> available time ranges)
>  - {{all(long timeFrom, long timeTo)}} that returns all window for a specific 
> time range
>  - {{allLatest()}} that returns the latest window for each key
> Because this feature would require to scan multiple segments (ie, RockDB 
> instances) it would be quite inefficient with current store design. Thus, 
> this feature also required to redesign the underlying window store itself.
> Because this is a major change, a KIP 
> (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals)
>  is required. The KIP should cover the actual API design as well as the store 
> refactoring.
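As a rough sketch only (not an actual Kafka interface), the proposed scans could look roughly like this if added alongside ReadOnlyWindowStore:

{code:java}
import java.util.Iterator;

import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.state.KeyValueIterator;

// Hypothetical interface sketching the methods proposed above.
public interface WindowStoreScanSketch<K, V> {
    Iterator<K> keys();                                                // all keys present in the store
    KeyValueIterator<Windowed<K>, V> all(long timeFrom, long timeTo);  // all windows in a time range
    KeyValueIterator<Windowed<K>, V> allLatest();                      // latest window per key
}
{code}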





[jira] [Created] (KAFKA-6148) ERROR Commit of WorkerSinkTask{id=bigquery-connector-log-0} offsets threw an unexpected exception: (org.apache.kafka.connect.runtime.WorkerSinkTask:205) java.lang.ClassC

2017-10-30 Thread Eugene Burd (JIRA)
Eugene Burd created KAFKA-6148:
--

 Summary: ERROR Commit of 
WorkerSinkTask{id=bigquery-connector-log-0} offsets threw an unexpected 
exception:  (org.apache.kafka.connect.runtime.WorkerSinkTask:205) 
java.lang.ClassCastException: 
org.apache.kafka.clients.consumer.OffsetAndMetadata cannot be cast
 Key: KAFKA-6148
 URL: https://issues.apache.org/jira/browse/KAFKA-6148
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Eugene Burd


I am trying to run a com.wepay.kafka.connect.bigquery.BigQuerySinkConnector 
connector, but getting the following exception.  

[2017-10-30 21:48:49,007] ERROR WorkerSinkTask{id=bigquery-connector-log-0} 
Offset commit failed, rewinding to last committed offsets 
(org.apache.kafka.connect.runtime.WorkerSinkTask:311)
java.lang.ClassCastException: 
org.apache.kafka.clients.consumer.OffsetAndMetadata cannot be cast to 
org.apache.kafka.clients.consumer.OffsetAndMetadata
at 
com.wepay.kafka.connect.bigquery.BigQuerySinkTask.updateOffsets(BigQuerySinkTask.java:107)
at 
com.wepay.kafka.connect.bigquery.BigQuerySinkTask.flush(BigQuerySinkTask.java:96)
at org.apache.kafka.connect.sink.SinkTask.preCommit(SinkTask.java:117)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:305)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:164)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)
at 
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
[2017-10-30 21:48:49,012] ERROR Commit of 
WorkerSinkTask{id=bigquery-connector-log-0} offsets threw an unexpected 
exception:  (org.apache.kafka.connect.runtime.WorkerSinkTask:205)
java.lang.ClassCastException: 
org.apache.kafka.clients.consumer.OffsetAndMetadata cannot be cast to 
org.apache.kafka.clients.consumer.OffsetAndMetadata
at 
com.wepay.kafka.connect.bigquery.BigQuerySinkTask.updateOffsets(BigQuerySinkTask.java:107)
at 
com.wepay.kafka.connect.bigquery.BigQuerySinkTask.flush(BigQuerySinkTask.java:96)
at org.apache.kafka.connect.sink.SinkTask.preCommit(SinkTask.java:117)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:305)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:164)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)
at 
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

I have checked the version number of the kafka client in the plugin and in Kafka 
Connect itself and they are the same:
- kafka-clients-0.11.0.0.jar matches

I still suspect some kind of versioning issue. Do you have any advice? 

Thanks. 
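For context, a minimal standalone sketch of why such a cast can fail even though the class names match: two classloaders that each define org.apache.kafka.clients.consumer.OffsetAndMetadata produce two distinct Class objects, and casting between them throws ClassCastException. One common cause is the connector plugin bundling its own copy of kafka-clients. The jar path below is hypothetical:

{code:java}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderCastDemo {
    public static void main(String[] args) throws Exception {
        // Load the class through an isolated classloader (no parent), simulating
        // a plugin directory that ships its own copy of kafka-clients.
        URL pluginJar = new File("/path/to/plugin/kafka-clients-0.11.0.0.jar").toURI().toURL();
        try (URLClassLoader pluginLoader = new URLClassLoader(new URL[]{pluginJar}, null)) {
            Class<?> fromPlugin =
                    pluginLoader.loadClass("org.apache.kafka.clients.consumer.OffsetAndMetadata");
            Class<?> fromRuntime =
                    Class.forName("org.apache.kafka.clients.consumer.OffsetAndMetadata");
            // Same fully qualified name, but two distinct Class objects,
            // so a cast from one to the other fails at runtime.
            System.out.println(fromPlugin == fromRuntime); // prints: false
        }
    }
}
{code}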





[jira] [Resolved] (KAFKA-6139) error when loading plugins

2017-10-30 Thread Eugene Burd (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Burd resolved KAFKA-6139.

Resolution: Information Provided

The error was due to having the plugin directory set one level too low (pointing 
inside the plugin). 
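In other words (using the path from the log below), plugin.path should point at the directory that contains one subdirectory per plugin, not at a directory inside a single plugin:

{code}
# Connect worker config (sketch)
plugin.path=/usr/share/java

# Layout on disk: the whole plugin directory, connector jar plus its dependencies
/usr/share/java/kafka-connect-bigquery/kcbq-connector-1.0.0-SNAPSHOT.jar
/usr/share/java/kafka-connect-bigquery/...(dependency jars)...
{code}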

> error when loading plugins
> --
>
> Key: KAFKA-6139
> URL: https://issues.apache.org/jira/browse/KAFKA-6139
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Eugene Burd
>
> I am trying to run the Big Query connector for KafkaConnect 
> (https://github.com/wepay/kafka-connect-bigquery).  I have configured it 
> using a docker container, dropped the jar files for the connector in the 
> container and configured the plugin.path in the kafka connect config to point 
> to the directory with the jars.  
> Upon startup, Kafka is scanning the plugin folder and evaluating all the jar 
> files found.  It encounters the connector jar file, tests to see if it 
> extends a connector (in this case kcbq-connector-1.0.0-SNAPSHOT.jar), but 
> then it proceeds to create a new instance of the connector.  Code snippet 
> from DelegatingClassLoader.java below.  The problem is that the connector 
> class relies on other jar files that are in the plugin folder, but not in the 
> path.  This causes a class not found exception (below). 
> I suspect that when the plugin connector is actually executed, it is done through 
> an isolated context whose classpath is set to the plugin folder, so I think the 
> connector would actually work. But scanning for the connector fails before ever 
> getting there.
> I understand that you need a version of the connector, and hence why this code 
> is running, but I think the new instance created for the version check needs to 
> be done in a sandbox that supports the classpath of the plugin connector. 
> [2017-10-28 06:04:08,961] INFO Loading plugin from: 
> /usr/share/java/kafka-connect-bigquery/kcbq-connector-1.0.0-SNAPSHOT.jar 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:176)
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> com/google/cloud/bigquery/BigQuery
> at java.lang.Class.getDeclaredConstructors0(Native Method)
> at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
> at java.lang.Class.getConstructor0(Class.java:3075)
> at java.lang.Class.newInstance(Class.java:412)
> at 
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:242)
> at 
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:223)
> at 
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:198)
> at 
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:190)
> at 
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:150)
> at 
> org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:47)
> at 
> org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:63)
> Caused by: java.lang.ClassNotFoundException: 
> com.google.cloud.bigquery.BigQuery
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at 
> org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:62)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 11 more
> -
> private <T> Collection<PluginDesc<T>> getPluginDesc( 
> Reflections reflections,
> Class<T> klass,
> ClassLoader loader
> ) throws InstantiationException, IllegalAccessException {
> Set<Class<? extends T>> plugins = reflections.getSubTypesOf(klass);
> Collection<PluginDesc<T>> result = new ArrayList<>();
> for (Class<? extends T> plugin : plugins) {
> if (PluginUtils.isConcrete(plugin)) {
> // Temporary workaround until all the plugins are versioned.
> if (Connector.class.isAssignableFrom(plugin)) {
> result.add(
> new PluginDesc<>(
> plugin,
> ((Connector) plugin.newInstance()).version(),
> loader
> )
> );
> } else {
> result.add(new PluginDesc<>(plugin, "undefined", loader));
> }
> }
> }
> return result;
> }




[jira] [Updated] (KAFKA-6147) Error reading field 'api_versions': Error reading field 'max_version': java.nio.BufferUnderflowException

2017-10-30 Thread Sandro Simas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Simas updated KAFKA-6147:

Description: 
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. This error appears suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems 
afterwards.
Downgrading the servers and clients to version 0.10.1.1 works fine.

  was:
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. This errors appears suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems after 
this error.
Downgrading to 0.10 everythings works fine,


> Error reading field 

[jira] [Updated] (KAFKA-5647) Use async ZookeeperClient for Admin operations

2017-10-30 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-5647:
---
Description: Since we will be removing the ZK dependency in most of the 
admin clients, we only need to change the admin operations used on the server 
side. This includes converting AdminManager and the remaining usage of zkUtils 
in KafkaApis to use ZookeeperClient/KafkaZkClient.   (was: Since we will be 
removing the ZK dependency in most of the admin clients, we only need to change 
the admin operations used on the server side. This includes converting 
AdminManager to use Most of them will probably be covered in 
https://issues.apache.org/jira/browse/KAFKA-6073. )

> Use async ZookeeperClient for Admin operations
> --
>
> Key: KAFKA-5647
> URL: https://issues.apache.org/jira/browse/KAFKA-5647
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
> Fix For: 1.1.0
>
>
> Since we will be removing the ZK dependency in most of the admin clients, we 
> only need to change the admin operations used on the server side. This 
> includes converting AdminManager and the remaining usage of zkUtils in 
> KafkaApis to use ZookeeperClient/KafkaZkClient. 





[jira] [Updated] (KAFKA-5647) Use async ZookeeperClient for Admin operations

2017-10-30 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-5647:
---
Description: Since we will be removing the ZK dependency in most of the 
admin clients, we only need to change the admin operations used on the server 
side. This includes converting AdminManager to use Most of them will probably 
be covered in https://issues.apache.org/jira/browse/KAFKA-6073.   (was: Since 
we will be removing the ZK dependency in most of the admin clients, we only 
need to change the admin operations used on the server side. Most of them will 
probably be covered in https://issues.apache.org/jira/browse/KAFKA-6073. )

> Use async ZookeeperClient for Admin operations
> --
>
> Key: KAFKA-5647
> URL: https://issues.apache.org/jira/browse/KAFKA-5647
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
> Fix For: 1.1.0
>
>
> Since we will be removing the ZK dependency in most of the admin clients, we 
> only need to change the admin operations used on the server side. This 
> includes converting AdminManager to use Most of them will probably be covered 
> in https://issues.apache.org/jira/browse/KAFKA-6073. 





[jira] [Comment Edited] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2017-10-30 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225767#comment-16225767
 ] 

Jun Rao edited comment on KAFKA-4084 at 10/30/17 9:30 PM:
--

The auto leader balancing logic now uses the async ZK api 
(https://issues.apache.org/jira/browse/KAFKA-5642) and batches the requests 
from the controller to the brokers. So, the process should be much faster with 
many partitions. Closing this for now.


was (Author: junrao):
The auto leader balancing logic now uses the async ZK api and batches the 
requests from the controller to the brokers. So, the process should be much 
faster with many partitions. Closing this for now.

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts it causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances is in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.
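To make the suggested fix concrete, here is a minimal, hypothetical sketch of the throttling idea: keep only a bounded number of preferred leader elections in flight at a time. The names (maxInFlight, electPreferredLeader, isElectionComplete) are illustrative only and do not correspond to actual Kafka controller APIs.

{code:java}
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class ThrottledLeaderRebalance {
    private final int maxInFlight;                        // e.g. a hypothetical new broker config
    private final Queue<String> imbalanced = new ArrayDeque<>();
    private final Set<String> inFlight = new HashSet<>();

    public ThrottledLeaderRebalance(int maxInFlight) {
        this.maxInFlight = maxInFlight;
    }

    /** Called periodically by the auto-rebalance task. */
    public void tick() {
        // Reap elections that have completed.
        inFlight.removeIf(this::isElectionComplete);
        // Start new elections only while under the in-flight limit,
        // instead of moving all imbalanced partitions at once.
        while (inFlight.size() < maxInFlight && !imbalanced.isEmpty()) {
            String partition = imbalanced.poll();
            electPreferredLeader(partition);              // placeholder for the real controller call
            inFlight.add(partition);
        }
    }

    private boolean isElectionComplete(String partition) { return true; /* placeholder */ }
    private void electPreferredLeader(String partition) { /* placeholder */ }
}
{code}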



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2017-10-30 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-4084.

   Resolution: Fixed
Fix Version/s: 1.1.0

The auto leader balancing logic now uses the async ZK api and batches the 
requests from the controller to the brokers. So, the process should be much 
faster with many partitions. Closing this for now.

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts it causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances is in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-3018) Kafka producer hangs on producer.close() call if the producer topic contains single quotes in the topic name

2017-10-30 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao reassigned KAFKA-3018:
--

Assignee: (was: Jun Rao)

> Kafka producer hangs on producer.close() call if the producer topic contains 
> single quotes in the topic name
> 
>
> Key: KAFKA-3018
> URL: https://issues.apache.org/jira/browse/KAFKA-3018
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: kanav anand
>
> While creating a topic with quotes in the name throws an exception, trying to 
> close a producer configured with a topic name containing single quotes makes 
> the producer hang.
> It can be easily replicated and verified by setting topic.name for a producer 
> to a string containing single quotes.
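For reference, a minimal reproduction sketch along the lines described above, assuming a broker at localhost:9092; the topic name and serializers are illustrative. Per this report, the final close() call is the one expected to hang.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuotedTopicRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Topic name deliberately contains single quotes, as described in the report.
        producer.send(new ProducerRecord<>("'bad'topic'", "key", "value"));
        producer.close();   // reported to hang here
    }
}
{code}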



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6147) Error reading field 'api_versions': Error reading field 'max_version': java.nio.BufferUnderflowException

2017-10-30 Thread Sandro Simas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Simas updated KAFKA-6147:

Description: 
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. These errors appear suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems after 
this error.
Downgrading to 0.10, everything works fine.

  was:
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. These errors appear suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems after 
this error.
Downgrading to 0.10***, everything works fine.


> Error reading field 'api_versions': Error reading 

[jira] [Updated] (KAFKA-6147) Error reading field 'api_versions': Error reading field 'max_version': java.nio.BufferUnderflowException

2017-10-30 Thread Sandro Simas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Simas updated KAFKA-6147:

Description: 
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. These errors appear suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems after 
this error.
Downgrading to 0.10***, everything works fine.

  was:
I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. These errors appear suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems after 
this error.



> Error reading field 'api_versions': Error reading field 'max_version': 
> 

[jira] [Created] (KAFKA-6147) Error reading field 'api_versions': Error reading field 'max_version': java.nio.BufferUnderflowException

2017-10-30 Thread Sandro Simas (JIRA)
Sandro Simas created KAFKA-6147:
---

 Summary: Error reading field 'api_versions': Error reading field 
'max_version': java.nio.BufferUnderflowException
 Key: KAFKA-6147
 URL: https://issues.apache.org/jira/browse/KAFKA-6147
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.11.0.0
Reporter: Sandro Simas


I'm getting the following error on my kafka client 0.11.0.1:
{code:java}
[2017-10-30 19:18:30] ERROR o.a.k.c.producer.internals.Sender - Uncaught error 
in kafka producer I/O thread: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading field 'max_version': 
java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

Another similar error:
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 65546, only 10 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:163)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:54)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:560)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{code}

The server is also 0.11.0.1 and I'm running Kafka and ZooKeeper without any 
previous data. These errors appear suddenly even without producing messages.
Although this error occurs, I can produce messages without any problems after 
this error.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5212) Consumer ListOffsets request can starve group heartbeats

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225514#comment-16225514
 ] 

ASF GitHub Bot commented on KAFKA-5212:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4110


> Consumer ListOffsets request can starve group heartbeats
> 
>
> Key: KAFKA-5212
> URL: https://issues.apache.org/jira/browse/KAFKA-5212
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Richard Yu
> Fix For: 1.1.0
>
> Attachments: 5212.patch
>
>
> The consumer is not able to send heartbeats while it is awaiting a 
> ListOffsets response. Typically this is not a problem because ListOffsets 
> requests are handled quickly, but in the worst case if the request takes 
> longer than the session timeout, the consumer will fall out of the group.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-5212) Consumer ListOffsets request can starve group heartbeats

2017-10-30 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-5212.

   Resolution: Fixed
Fix Version/s: (was: 1.0.1)

Issue resolved by pull request 4110
[https://github.com/apache/kafka/pull/4110]

> Consumer ListOffsets request can starve group heartbeats
> 
>
> Key: KAFKA-5212
> URL: https://issues.apache.org/jira/browse/KAFKA-5212
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Richard Yu
> Fix For: 1.1.0
>
> Attachments: 5212.patch
>
>
> The consumer is not able to send heartbeats while it is awaiting a 
> ListOffsets response. Typically this is not a problem because ListOffsets 
> requests are handled quickly, but in the worst case if the request takes 
> longer than the session timeout, the consumer will fall out of the group.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2903) FileRecords.read doesn't handle size > sizeInBytes when start is not zero

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225460#comment-16225460
 ] 

ASF GitHub Bot commented on KAFKA-2903:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4158


> FileRecords.read doesn't handle size > sizeInBytes when start is not zero
> -
>
> Key: KAFKA-2903
> URL: https://issues.apache.org/jira/browse/KAFKA-2903
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0, 0.11.0.0, 1.0.0
>Reporter: Pengwei
>Assignee: Jay Kreps
> Fix For: 1.1.0
>
>
> now the code is:
> def read(position: Int, size: Int): FileMessageSet = {
>   new FileMessageSet(file,
> channel,
> start = this.start + position,
> end = math.min(this.start + position + size, sizeInBytes()))
> }
> if this.start is not 0, the end is only the FileMessageSet's size, not the 
> actual end position.
> the end parameter should be:
>  end = math.min(this.start + position + size, this.start + sizeInBytes())
> *Update* Problem still remains in FileRecords.
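The off-by-start problem described above can be illustrated with plain arithmetic. The snippet below is a standalone illustration (not the actual Kafka classes) showing how the buggy bound falls before the slice whenever start is non-zero.

{code:java}
public class SliceBounds {
    public static void main(String[] args) {
        int start = 100;          // slice begins at byte 100 of the underlying file
        int sizeInBytes = 50;     // the slice itself is 50 bytes long
        int position = 10;        // caller reads from offset 10 within the slice
        int size = 1000;          // caller asks for up to 1000 bytes

        // Buggy bound: clamped to the slice size alone, which is before the slice even starts.
        int buggyEnd = Math.min(start + position + size, sizeInBytes);          // = 50
        // Fixed bound: clamped to start + sizeInBytes(), the true end of the slice.
        int fixedEnd = Math.min(start + position + size, start + sizeInBytes);  // = 150

        System.out.println("buggy end = " + buggyEnd + ", fixed end = " + fixedEnd);
    }
}
{code}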



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6146) re-register the exist watch on PreferredReplicaElectionZNode after the preferred leader election completes

2017-10-30 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225449#comment-16225449
 ] 

Jun Rao commented on KAFKA-6146:


We can fix this in a way similar to https://github.com/apache/kafka/pull/4143

> re-register the exist watch on PreferredReplicaElectionZNode after the 
> preferred leader election completes 
> ---
>
> Key: KAFKA-6146
> URL: https://issues.apache.org/jira/browse/KAFKA-6146
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 1.1.0
>Reporter: Jun Rao
> Fix For: 1.1.0
>
>
> Currently, after the PreferredReplicaElectionZNode is removed, we don't 
> register the exists watcher on the path again. This means that future 
> preferred replica election events will be missed.
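Since ZooKeeper watches are one-shot, the general fix pattern is to re-register the exists watch every time it fires, including on NodeDeleted. The sketch below illustrates that pattern with the plain ZooKeeper client API; the znode path and error handling are simplified and illustrative, not the actual controller code.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class PreferredReplicaElectionWatcher implements Watcher {
    private static final String PATH = "/admin/preferred_replica_election"; // illustrative path
    private final ZooKeeper zk;

    public PreferredReplicaElectionWatcher(ZooKeeper zk) throws KeeperException, InterruptedException {
        this.zk = zk;
        zk.exists(PATH, this);                    // initial one-shot exists watch
    }

    @Override
    public void process(WatchedEvent event) {
        try {
            if (event.getType() == Event.EventType.NodeCreated) {
                // handle the new election request here ...
            }
            // Re-register after *every* event, including NodeDeleted, so the watch
            // survives the znode being removed when the election completes.
            zk.exists(PATH, this);
        } catch (KeeperException | InterruptedException e) {
            // real code would retry or propagate
        }
    }
}
{code}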



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6145) Warm up new KS instances before migrating tasks - potentially a two phase rebalance

2017-10-30 Thread Antony Stubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antony Stubbs updated KAFKA-6145:
-
Description: 
Currently when expanding the KS cluster, the new node's partitions will be 
unavailable during the rebalance, which for large states can take a very long 
time; for small state stores, even more than a few ms can be a deal breaker 
for micro service use cases.
One workaround would be to execute the rebalance in two phases:
1) start running state store building on the new node
2) once the state store is fully populated on the new node, only then rebalance 
the tasks - there will still be a rebalance pause, but would be greatly reduced

Relates to: KAFKA-6144 - Allow state stores to serve stale reads during 
rebalance

  was:
Currently when expanding the KS cluster, the new node's partitions will be 
unavailable during the rebalance, which for large states can take a very long 
time; for small state stores, even more than a few ms can be a deal breaker 
for micro service use cases.
One workaround would be to execute the rebalance in two phases:
1) start running state store building on the new node
2) once the state store is fully populated on the new node, only then rebalance 
the tasks - there will still be a rebalance pause, but would be greatly reduced


> Warm up new KS instances before migrating tasks - potentially a two phase 
> rebalance
> ---
>
> Key: KAFKA-6145
> URL: https://issues.apache.org/jira/browse/KAFKA-6145
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large states can take a very long 
> time; for small state stores, even more than a few ms can be a deal breaker 
> for micro service use cases.
> One workaround would be to execute the rebalance in two phases:
> 1) start running state store building on the new node
> 2) once the state store is fully populated on the new node, only then 
> rebalance the tasks - there will still be a rebalance pause, but would be 
> greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during 
> rebalance



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6144) Allow state stores to serve stale reads during rebalance

2017-10-30 Thread Antony Stubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antony Stubbs updated KAFKA-6144:
-
Description: 
Currently when expanding the KS cluster, the new node's partitions will be 
unavailable during the rebalance, which for large states can take a very long 
time; for small state stores, even more than a few ms can be a deal breaker 
for micro service use cases.

One workaround is to allow stale data to be read from the state stores when use 
case allows.

Relates to KAFKA-6145 - Warm up new KS instances before migrating tasks - 
potentially a two phase rebalance

  was:
Currently when expanding the KS cluster, the new node's partitions will be 
unavailable during the rebalance, which for large states can take a very long 
time; for small state stores, even more than a few ms can be a deal breaker 
for micro service use cases.

One workaround is to allow stale data to be read from the state stores when use 
case allows.


> Allow state stores to serve stale reads during rebalance
> 
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large states can take a very long 
> time; for small state stores, even more than a few ms can be a deal breaker 
> for micro service use cases.
> One workaround is to allow stale data to be read from the state stores when 
> use case allows.
> Relates to KAFKA-6145 - Warm up new KS instances before migrating tasks - 
> potentially a two phase rebalance



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6144) Allow state stores to serve stale reads during rebalance

2017-10-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225395#comment-16225395
 ] 

Ted Yu commented on KAFKA-6144:
---

Please prepare a KIP.

> Allow state stores to serve stale reads during rebalance
> 
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large states can take a very long 
> time; for small state stores, even more than a few ms can be a deal breaker 
> for micro service use cases.
> One workaround is to allow stale data to be read from the state stores when 
> use case allows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6145) Warm up new KS instances before migrating tasks - potentially a two phase rebalance

2017-10-30 Thread Antony Stubbs (JIRA)
Antony Stubbs created KAFKA-6145:


 Summary: Warm up new KS instances before migrating tasks - 
potentially a two phase rebalance
 Key: KAFKA-6145
 URL: https://issues.apache.org/jira/browse/KAFKA-6145
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Reporter: Antony Stubbs


Currently when expanding the KS cluster, the new node's partitions will be 
unavailable during the rebalance, which for large states can take a very long 
time; for small state stores, even more than a few ms can be a deal breaker 
for micro service use cases.
One workaround would be to execute the rebalance in two phases:
1) start running state store building on the new node
2) once the state store is fully populated on the new node, only then rebalance 
the tasks - there will still be a rebalance pause, but would be greatly reduced



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6144) Allow state stores to serve stale reads during rebalance

2017-10-30 Thread Antony Stubbs (JIRA)
Antony Stubbs created KAFKA-6144:


 Summary: Allow state stores to serve stale reads during rebalance
 Key: KAFKA-6144
 URL: https://issues.apache.org/jira/browse/KAFKA-6144
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Reporter: Antony Stubbs


Currently when expanding the KS cluster, the new node's partitions will be 
unavailable during the rebalance, which for large states can take a very long 
time; for small state stores, even more than a few ms can be a deal breaker 
for micro service use cases.

One workaround is to allow stale data to be read from the state stores when use 
case allows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6130) VerifiableConsumer with --max-messages doesn't exit

2017-10-30 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-6130.

   Resolution: Fixed
Fix Version/s: 1.1.0

> VerifiableConsumer with --max-messages doesn't exit
> ---
>
> Key: KAFKA-6130
> URL: https://issues.apache.org/jira/browse/KAFKA-6130
> Project: Kafka
>  Issue Type: Bug
>Reporter: Tom Bentley
>Assignee: Tom Bentley
>Priority: Minor
> Fix For: 1.1.0
>
>
> If I run {{kafka-verifiable-consumer.sh --max-messages=N}} I expect the tool 
> to consume N messages and then exit. It will actually consume as many 
> messages as are in the topic and then block.
> The problem is that although the max-messages limit will cause the loop in 
> onRecordsReceived() to break, the loop in run() will just call 
> onRecordsReceived() again.
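A condensed sketch of that control flow (not the actual VerifiableConsumer source): the inner loop needs to set a flag that the outer loop also checks, otherwise run() simply re-enters onRecordsReceived().

{code:java}
public class MaxMessagesLoop {
    private final int maxMessages;
    private int consumed = 0;
    private volatile boolean done = false;

    public MaxMessagesLoop(int maxMessages) { this.maxMessages = maxMessages; }

    void onRecordsReceived(int batchSize) {
        for (int i = 0; i < batchSize; i++) {
            consumed++;
            if (maxMessages >= 0 && consumed >= maxMessages) {
                done = true;      // without this flag, run() would just poll again
                break;
            }
        }
    }

    void run() {
        while (!done) {           // the reported bug: this condition never became false
            onRecordsReceived(poll());
        }
    }

    private int poll() { return 100; }   // stand-in for consumer.poll(...)
}
{code}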



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6130) VerifiableConsumer with --max-messages doesn't exit

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225313#comment-16225313
 ] 

ASF GitHub Bot commented on KAFKA-6130:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4157


> VerifiableConsumer with --max-messages doesn't exit
> ---
>
> Key: KAFKA-6130
> URL: https://issues.apache.org/jira/browse/KAFKA-6130
> Project: Kafka
>  Issue Type: Bug
>Reporter: Tom Bentley
>Assignee: Tom Bentley
>Priority: Minor
>
> If I run {{kafka-verifiable-consumer.sh --max-messages=N}} I expect the tool 
> to consume N messages and then exit. It will actually consume as many 
> messages as are in the topic and then block.
> The problem is that although the max-messages limit will cause the loop in 
> onRecordsReceived() to break, the loop in run() will just call 
> onRecordsReceived() again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6073) Use ZookeeperClient in KafkaApis

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225292#comment-16225292
 ] 

ASF GitHub Bot commented on KAFKA-6073:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4111


> Use ZookeeperClient in KafkaApis
> 
>
> Key: KAFKA-6073
> URL: https://issues.apache.org/jira/browse/KAFKA-6073
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 1.1.0
>Reporter: Jun Rao
>Assignee: Mickael Maison
> Fix For: 1.1.0
>
>
> We want to replace the usage of ZkUtils with ZookeeperClient in KafkaApis.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6143) VerifiableProducer & VerifiableConsumer need tests

2017-10-30 Thread Tom Bentley (JIRA)
Tom Bentley created KAFKA-6143:
--

 Summary: VerifiableProducer & VerifiableConsumer need tests
 Key: KAFKA-6143
 URL: https://issues.apache.org/jira/browse/KAFKA-6143
 Project: Kafka
  Issue Type: Bug
Reporter: Tom Bentley
Priority: Minor


The {{VerifiableProducer}} and {{VerifiableConsumer}} are used for system 
tests, but don't have any tests themselves. They should.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6142) Connect worker configurations and connector configurations should accept environment variables

2017-10-30 Thread Randall Hauch (JIRA)
Randall Hauch created KAFKA-6142:


 Summary: Connect worker configurations and connector 
configurations should accept environment variables
 Key: KAFKA-6142
 URL: https://issues.apache.org/jira/browse/KAFKA-6142
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 1.0.0
Reporter: Randall Hauch


Currently, when a worker or connector configuration is parsed, the values are 
used as-is without any kind of pre-processing before the value is used. It 
should be possible to define configuration properties such that string literal 
values or default values can use *_configuration variables_* that reference 
environment variables and/or system properties, and that these configuration 
variables are resolved/replaced before the configuration value is used.

I propose doing this enhancement in Kafka client's {{ConfigDef}} by adding a 
{{ConfigDef.Transformer}} interface:

{code:java}
/**
 * Transform the configuration value.
 */
public interface Transformer {
/**
 * Transform the configuration value.
 * @param name The name of the configuration
 * @param value The value of the configuration
 * @return the preprocessed value
 * @throws ConfigException if the value is invalid.
 */
Object apply(String name, Object value);
}
{code}

and then allowing {{Transformer}} implementations to be passed to 
{{ConfigDef.define(...)}} such that all existing signatures are maintained for 
backward compatibility. By default, the definition would use an identity 
transform that simply returns the value. The transformers would be called in 
{{ConfigDef.parseValue(...)}} before the {{parseType(...)}} method is called, 
and would also be called on the default value if one is provided.

Then, a {{ConfigDef.ReplaceSystemVariables}} implementation would be provided 
to look in {{String}} values for zero or more variables defined with this EBNF 
grammar:

{noformat}
'$' '{' varName { ',' varName } [ ':' defaultValue] '}'
{noformat}

where:
* {{varName}} is the name of a Java system property or {{env.}} followed by the 
name of an environment variable, and 
* {{defaultValue}} specifies the replacement value used when no environment 
variable or system property is found, and defaults to an empty string. 

The value of the first system property or environment variable resolved is then 
used to replace the variable expression. This implementation would have trace 
or debug level logging to describe what it is doing.

For example, the variable {{$\{env.KAFKA_HOME\}}} would replace the variable 
expression with the value of the {{KAFKA_HOME}} environment variable or a 
blank string if that variable doesn't exist. 

Likewise, the variable {{$\{foo.prop1,foo.prop2,env.MY_ENV_VAR:value\}}} would 
be replaced with the value of the {{foo.prop1}} system property if it exists, 
or with the value of the {{foo.prop2}} system property if it exists, or with 
the value of the {{MY_ENV_VAR}} environment variable if it exists, or {{value}} 
if none of the system properties exist.
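
To make the proposed behaviour concrete, below is a rough standalone sketch of the substitution logic for the grammar above. It is not the proposed {{ConfigDef.Transformer}} itself (which does not exist yet); the class and method names are illustrative.

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceSystemVariablesSketch {
    // Matches ${...} expressions; the body is parsed as var{,var}[:default].
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    public static String apply(String value) {
        Matcher m = VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);                 // e.g. "foo.prop1,env.MY_VAR:fallback"
            String defaultValue = "";
            int colon = body.indexOf(':');
            if (colon >= 0) {
                defaultValue = body.substring(colon + 1);
                body = body.substring(0, colon);
            }
            String replacement = defaultValue;
            for (String name : body.split(",")) {
                // "env." prefix means environment variable, otherwise a system property.
                String resolved = name.startsWith("env.")
                        ? System.getenv(name.substring(4))
                        : System.getProperty(name);
                if (resolved != null) { replacement = resolved; break; }
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("${env.KAFKA_HOME:/opt/kafka}/config"));
    }
}
{code}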




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6042) Kafka Request Handler deadlocks and brings down the cluster.

2017-10-30 Thread Ben Corlett (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225159#comment-16225159
 ] 

Ben Corlett commented on KAFKA-6042:


The new build was installed 14:10ish on the 23rd. It was restarted mid-day on 
the 29th.

> Kafka Request Handler deadlocks and brings down the cluster.
> 
>
> Key: KAFKA-6042
> URL: https://issues.apache.org/jira/browse/KAFKA-6042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 1.0.0
> Environment: kafka version: 0.11.0.1
> client versions: 0.8.2.1-0.10.2.1
> platform: aws (eu-west-1a)
> nodes: 36 x r4.xlarge
> disk storage: 2.5 tb per node (~73% usage per node)
> topics: 250
> number of partitions: 48k (approx)
> os: ubuntu 14.04
> jvm: Java(TM) SE Runtime Environment (build 1.8.0_131-b11), Java HotSpot(TM) 
> 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Ben Corlett
>Assignee: Rajini Sivaram
>Priority: Blocker
> Fix For: 1.0.0, 0.11.0.2
>
> Attachments: heapusage.png, thread_dump.txt.gz
>
>
> We have been experiencing a deadlock that happens on a consistent server 
> within our cluster. This happens multiple times a week currently. It first 
> started happening when we upgraded to 0.11.0.0. Sadly 0.11.0.1 failed to 
> resolve the issue.
> Sequence of events:
> At a seemingly random time broker 125 goes into a deadlock. As soon as it is 
> deadlocked it will remove all the ISRs for any partition it is the leader 
> for.
> [2017-10-10 00:06:10,061] INFO Partition [XX,24] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,073] INFO Partition [XX,974] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,079] INFO Partition [XX,64] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,081] INFO Partition [XX,21] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,084] INFO Partition [XX,12] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,085] INFO Partition [XX,61] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,086] INFO Partition [XX,53] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,088] INFO Partition [XX,27] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,090] INFO Partition [XX,182] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,091] INFO Partition [XX,16] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> 
> The other nodes fail to connect to the node 125 
> [2017-10-10 00:08:42,318] WARN [ReplicaFetcherThread-0-125]: Error in fetch 
> to broker 125, request (type=FetchRequest, replicaId=101, maxWait=500, 
> minBytes=1, maxBytes=10485760, fetchData={XX-94=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-22=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-58=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-11=(offset=78932482, 
> logStartOffset=50881481, maxBytes=1048576), XX-55=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-19=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-91=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-5=(offset=903857106, 
> logStartOffset=0, maxBytes=1048576), XX-80=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-88=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-34=(offset=308, 
> logStartOffset=308, maxBytes=1048576), XX-7=(offset=369990, 
> logStartOffset=369990, maxBytes=1048576), XX-0=(offset=57965795, 
> logStartOffset=0, maxBytes=1048576)}) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 125 was disconnected before the response 
> was read
> at 
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:93)
> at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:207)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:151)
> at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> As 

[jira] [Comment Edited] (KAFKA-6042) Kafka Request Handler deadlocks and brings down the cluster.

2017-10-30 Thread Ben Corlett (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225144#comment-16225144
 ] 

Ben Corlett edited comment on KAFKA-6042 at 10/30/17 3:33 PM:
--

It might be a memory leak. I'll keep an eye on it over the next few days.

!heapusage.png!


was (Author: corlettb):
It might be a memory leak. I'll keep an eye on it over the next few days.

!heapusage.png|thumbnail!

> Kafka Request Handler deadlocks and brings down the cluster.
> 
>
> Key: KAFKA-6042
> URL: https://issues.apache.org/jira/browse/KAFKA-6042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 1.0.0
> Environment: kafka version: 0.11.0.1
> client versions: 0.8.2.1-0.10.2.1
> platform: aws (eu-west-1a)
> nodes: 36 x r4.xlarge
> disk storage: 2.5 tb per node (~73% usage per node)
> topics: 250
> number of partitions: 48k (approx)
> os: ubuntu 14.04
> jvm: Java(TM) SE Runtime Environment (build 1.8.0_131-b11), Java HotSpot(TM) 
> 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Ben Corlett
>Assignee: Rajini Sivaram
>Priority: Blocker
> Fix For: 1.0.0, 0.11.0.2
>
> Attachments: heapusage.png, thread_dump.txt.gz
>
>
> We have been experiencing a deadlock that happens on a consistent server 
> within our cluster. This happens multiple times a week currently. It first 
> started happening when we upgraded to 0.11.0.0. Sadly 0.11.0.1 failed to 
> resolve the issue.
> Sequence of events:
> At a seemingly random time broker 125 goes into a deadlock. As soon as it is 
> deadlocked it will remove all the ISRs for any partition it is the leader 
> for.
> [2017-10-10 00:06:10,061] INFO Partition [XX,24] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,073] INFO Partition [XX,974] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,079] INFO Partition [XX,64] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,081] INFO Partition [XX,21] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,084] INFO Partition [XX,12] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,085] INFO Partition [XX,61] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,086] INFO Partition [XX,53] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,088] INFO Partition [XX,27] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,090] INFO Partition [XX,182] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,091] INFO Partition [XX,16] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> 
> The other nodes fail to connect to the node 125 
> [2017-10-10 00:08:42,318] WARN [ReplicaFetcherThread-0-125]: Error in fetch 
> to broker 125, request (type=FetchRequest, replicaId=101, maxWait=500, 
> minBytes=1, maxBytes=10485760, fetchData={XX-94=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-22=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-58=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-11=(offset=78932482, 
> logStartOffset=50881481, maxBytes=1048576), XX-55=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-19=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-91=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-5=(offset=903857106, 
> logStartOffset=0, maxBytes=1048576), XX-80=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-88=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-34=(offset=308, 
> logStartOffset=308, maxBytes=1048576), XX-7=(offset=369990, 
> logStartOffset=369990, maxBytes=1048576), XX-0=(offset=57965795, 
> logStartOffset=0, maxBytes=1048576)}) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 125 was disconnected before the response 
> was read
> at 
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:93)
> at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:207)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at 
> 

[jira] [Comment Edited] (KAFKA-6042) Kafka Request Handler deadlocks and brings down the cluster.

2017-10-30 Thread Ben Corlett (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225144#comment-16225144
 ] 

Ben Corlett edited comment on KAFKA-6042 at 10/30/17 3:33 PM:
--

It might be a memory leak. I'll keep an eye on it over the next few days.

!heapusage.jpg|thumbnail!


was (Author: corlettb):
heapusage

> Kafka Request Handler deadlocks and brings down the cluster.
> 
>
> Key: KAFKA-6042
> URL: https://issues.apache.org/jira/browse/KAFKA-6042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 1.0.0
> Environment: kafka version: 0.11.0.1
> client versions: 0.8.2.1-0.10.2.1
> platform: aws (eu-west-1a)
> nodes: 36 x r4.xlarge
> disk storage: 2.5 tb per node (~73% usage per node)
> topics: 250
> number of partitions: 48k (approx)
> os: ubuntu 14.04
> jvm: Java(TM) SE Runtime Environment (build 1.8.0_131-b11), Java HotSpot(TM) 
> 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Ben Corlett
>Assignee: Rajini Sivaram
>Priority: Blocker
> Fix For: 1.0.0, 0.11.0.2
>
> Attachments: heapusage.png, thread_dump.txt.gz
>
>
> We have been experiencing a deadlock that happens on a consistent server 
> within our cluster. This happens multiple times a week currently. It first 
> started happening when we upgraded to 0.11.0.0. Sadly 0.11.0.1 failed to 
> resolve the issue.
> Sequence of events:
> At a seemingly random time broker 125 goes into a deadlock. As soon as it is 
> deadlocked it will remove all the ISRs for any partition it is the leader 
> for.
> [2017-10-10 00:06:10,061] INFO Partition [XX,24] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,073] INFO Partition [XX,974] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,079] INFO Partition [XX,64] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,081] INFO Partition [XX,21] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,084] INFO Partition [XX,12] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,085] INFO Partition [XX,61] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,086] INFO Partition [XX,53] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,088] INFO Partition [XX,27] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,090] INFO Partition [XX,182] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,091] INFO Partition [XX,16] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> 
> The other nodes fail to connect to the node 125 
> [2017-10-10 00:08:42,318] WARN [ReplicaFetcherThread-0-125]: Error in fetch 
> to broker 125, request (type=FetchRequest, replicaId=101, maxWait=500, 
> minBytes=1, maxBytes=10485760, fetchData={XX-94=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-22=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-58=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-11=(offset=78932482, 
> logStartOffset=50881481, maxBytes=1048576), XX-55=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-19=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-91=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-5=(offset=903857106, 
> logStartOffset=0, maxBytes=1048576), XX-80=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-88=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-34=(offset=308, 
> logStartOffset=308, maxBytes=1048576), XX-7=(offset=369990, 
> logStartOffset=369990, maxBytes=1048576), XX-0=(offset=57965795, 
> logStartOffset=0, maxBytes=1048576)}) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 125 was disconnected before the response 
> was read
> at 
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:93)
> at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:207)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:151)
> at 
> 

[jira] [Commented] (KAFKA-6119) Silent Data Loss in Kafka011 Transactional Producer

2017-10-30 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225152#comment-16225152
 ] 

Guozhang Wang commented on KAFKA-6119:
--

It is merged into the 0.11.0 branch, but as Apurva said, we do not have a 
concrete date, if ever, for a 0.11.0.2 release.

> Silent Data Loss in Kafka011 Transactional Producer
> ---
>
> Key: KAFKA-6119
> URL: https://issues.apache.org/jira/browse/KAFKA-6119
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.11.0.0, 0.11.0.1
> Environment: openjdk version "1.8.0_144"
> OpenJDK Runtime Environment (Zulu 8.23.0.3-macosx) (build 1.8.0_144-b01)
> OpenJDK 64-Bit Server VM (Zulu 8.23.0.3-macosx) (build 25.144-b01, mixed mode)
>Reporter: Gary Y.
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: reliability
> Fix For: 1.0.0
>
>
> Kafka can lose data published by a transactional {{KafkaProducer}} under some 
> circumstances, i.e., data that should be committed atomically may not be 
> fully visible from a consumer with {{read_committed}} isolation level.
>  
> *Steps to reproduce:*
> # Set {{transaction.timeout.ms}} to a low value such as {{100}}
> # Publish two messages in one transaction to different partitions of a topic 
> with a sufficiently long time in-between the messages (e.g., 70 s).
> # Only the second message is visible with {{read_committed}} isolation level.
> See 
> https://github.com/GJL/kafka011-transactional-producer-bug-demo/blob/master/src/main/java/com/garyyao/App.java
>  for a full example. Detailed instructions can be found in the {{README.md}}: 
> https://github.com/GJL/kafka011-transactional-producer-bug-demo
> *Why is this possible?*
> Because the transaction timeout is set to a low value, the transaction will 
> be rolled back quickly after the first message is sent. Indeed, in the broker 
> the following logs could be found:
> {code}
> [2017-10-25 22:54:58,224] INFO [Transaction Coordinator 0]: Initialized 
> transactionalId test-producer-1508964897483 with producerId 5 and producer 
> epoch 0 on partition __transaction_state-10 
> (kafka.coordinator.transaction.TransactionCoordinator)
> [2017-10-25 22:55:24,260] INFO [Transaction Coordinator 0]: Completed 
> rollback ongoing transaction of transactionalId: test-producer-1508964897483 
> due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
> {code}
> After rollback, the second message is sent to a different partition than the 
> first message. 
> Upon, transaction commit, 
> {{org.apache.kafka.clients.producer.internals.TransactionManager}} may 
> enqueue the request {{addPartitionsToTransactionHandler}}:
> {code}
> private TransactionalRequestResult 
> beginCompletingTransaction(TransactionResult transactionResult) {
> if (!newPartitionsInTransaction.isEmpty())
> enqueueRequest(addPartitionsToTransactionHandler());
> EndTxnRequest.Builder builder = new 
> EndTxnRequest.Builder(transactionalId, producerIdAndEpoch.producerId,
> producerIdAndEpoch.epoch, transactionResult);
> EndTxnHandler handler = new EndTxnHandler(builder);
> enqueueRequest(handler);
> return handler.result;
> }
> {code}
> As can be seen, the condition is fulfilled if {{newPartitionsInTransaction}} 
> is non-empty. I suspect because the second message goes to a different 
> partition, this condition is satisfied.
> In {{KafkaApis.scala}}, I can see that {{handleAddPartitionToTxnRequest}} may 
> eventually call {{TransactionMetadata#prepareAddPartitions}}:
> {code}
>  def prepareAddPartitions(addedTopicPartitions: 
> immutable.Set[TopicPartition], updateTimestamp: Long): TxnTransitMetadata = {
> val newTxnStartTimestamp = state match {
>   case Empty | CompleteAbort | CompleteCommit => updateTimestamp
>   case _ => txnStartTimestamp
> }
> prepareTransitionTo(Ongoing, producerId, producerEpoch, txnTimeoutMs, 
> (topicPartitions ++ addedTopicPartitions).toSet,
>   newTxnStartTimestamp, updateTimestamp)
>   }
> {code}
> Note that the first argument {{newState}} of {{prepareTransitionTo}} is always 
> *Ongoing* here. I suspect that this puts the transaction, which should be 
> aborted, back to _Ongoing_.
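For convenience, a condensed sketch of the reproduction steps listed above (the linked repository has the full example). It assumes a broker at localhost:9092 and a topic named "demo" with at least two partitions; these names and the serializers are illustrative.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TxnTimeoutRepro {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("transactional.id", "demo-txn");
        props.put("transaction.timeout.ms", "100");   // step 1: very low transaction timeout

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("demo", 0, "k", "first"));   // partition 0
            Thread.sleep(70_000);                                           // step 2: exceed the timeout
            producer.send(new ProducerRecord<>("demo", 1, "k", "second"));  // partition 1
            producer.commitTransaction();
            // step 3: a read_committed consumer reportedly sees only "second".
        }
    }
}
{code}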



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-6042) Kafka Request Handler deadlocks and brings down the cluster.

2017-10-30 Thread Ben Corlett (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225144#comment-16225144
 ] 

Ben Corlett edited comment on KAFKA-6042 at 10/30/17 3:33 PM:
--

It might be a memory leak. I'll keep an eye on it over the next few days.

!heapusage.png|thumbnail!


was (Author: corlettb):
It might be a memory leak. I'll keep an eye on it over the next few days.

!heapusage.jpg|thumbnail!

> Kafka Request Handler deadlocks and brings down the cluster.
> 
>
> Key: KAFKA-6042
> URL: https://issues.apache.org/jira/browse/KAFKA-6042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 1.0.0
> Environment: kafka version: 0.11.0.1
> client versions: 0.8.2.1-0.10.2.1
> platform: aws (eu-west-1a)
> nodes: 36 x r4.xlarge
> disk storage: 2.5 tb per node (~73% usage per node)
> topics: 250
> number of partitions: 48k (approx)
> os: ubuntu 14.04
> jvm: Java(TM) SE Runtime Environment (build 1.8.0_131-b11), Java HotSpot(TM) 
> 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Ben Corlett
>Assignee: Rajini Sivaram
>Priority: Blocker
> Fix For: 1.0.0, 0.11.0.2
>
> Attachments: heapusage.png, thread_dump.txt.gz
>
>
> We have been experiencing a deadlock that happens on a consistent server 
> within our cluster. This happens multiple times a week currently. It first 
> started happening when we upgraded to 0.11.0.0. Sadly 0.11.0.1 failed to 
> resolve the issue.
> Sequence of events:
> At a seemingly random time broker 125 goes into a deadlock. As soon as it is 
> deadlocked it will remove all the ISRs for any partition it is the leader 
> for.
> [2017-10-10 00:06:10,061] INFO Partition [XX,24] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,073] INFO Partition [XX,974] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,079] INFO Partition [XX,64] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,081] INFO Partition [XX,21] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,084] INFO Partition [XX,12] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,085] INFO Partition [XX,61] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,086] INFO Partition [XX,53] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,088] INFO Partition [XX,27] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,090] INFO Partition [XX,182] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,091] INFO Partition [XX,16] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> 
> The other nodes fail to connect to the node 125 
> [2017-10-10 00:08:42,318] WARN [ReplicaFetcherThread-0-125]: Error in fetch 
> to broker 125, request (type=FetchRequest, replicaId=101, maxWait=500, 
> minBytes=1, maxBytes=10485760, fetchData={XX-94=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-22=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-58=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-11=(offset=78932482, 
> logStartOffset=50881481, maxBytes=1048576), XX-55=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-19=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-91=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-5=(offset=903857106, 
> logStartOffset=0, maxBytes=1048576), XX-80=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-88=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-34=(offset=308, 
> logStartOffset=308, maxBytes=1048576), XX-7=(offset=369990, 
> logStartOffset=369990, maxBytes=1048576), XX-0=(offset=57965795, 
> logStartOffset=0, maxBytes=1048576)}) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 125 was disconnected before the response 
> was read
> at 
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:93)
> at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:207)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at 
> 

[jira] [Updated] (KAFKA-6042) Kafka Request Handler deadlocks and brings down the cluster.

2017-10-30 Thread Ben Corlett (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Corlett updated KAFKA-6042:
---
Attachment: heapusage.png

heapusage

> Kafka Request Handler deadlocks and brings down the cluster.
> 
>
> Key: KAFKA-6042
> URL: https://issues.apache.org/jira/browse/KAFKA-6042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 1.0.0
> Environment: kafka version: 0.11.0.1
> client versions: 0.8.2.1-0.10.2.1
> platform: aws (eu-west-1a)
> nodes: 36 x r4.xlarge
> disk storage: 2.5 tb per node (~73% usage per node)
> topics: 250
> number of partitions: 48k (approx)
> os: ubuntu 14.04
> jvm: Java(TM) SE Runtime Environment (build 1.8.0_131-b11), Java HotSpot(TM) 
> 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Ben Corlett
>Assignee: Rajini Sivaram
>Priority: Blocker
> Fix For: 1.0.0, 0.11.0.2
>
> Attachments: heapusage.png, thread_dump.txt.gz
>
>
> We have been experiencing a deadlock that happens on a consistent server 
> within our cluster. This happens multiple times a week currently. It first 
> started happening when we upgraded to 0.11.0.0. Sadly 0.11.0.1 failed to 
> resolve the issue.
> Sequence of events:
> At a seemingly random time broker 125 goes into a deadlock. As soon as it is 
> deadlocked it will remove all the ISRs for any partition it is the leader 
> for.
> [2017-10-10 00:06:10,061] INFO Partition [XX,24] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,073] INFO Partition [XX,974] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,079] INFO Partition [XX,64] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,081] INFO Partition [XX,21] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,084] INFO Partition [XX,12] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,085] INFO Partition [XX,61] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,086] INFO Partition [XX,53] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,088] INFO Partition [XX,27] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,090] INFO Partition [XX,182] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,091] INFO Partition [XX,16] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> 
> The other nodes fail to connect to node 125. 
> [2017-10-10 00:08:42,318] WARN [ReplicaFetcherThread-0-125]: Error in fetch 
> to broker 125, request (type=FetchRequest, replicaId=101, maxWait=500, 
> minBytes=1, maxBytes=10485760, fetchData={XX-94=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-22=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-58=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-11=(offset=78932482, 
> logStartOffset=50881481, maxBytes=1048576), XX-55=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-19=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-91=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-5=(offset=903857106, 
> logStartOffset=0, maxBytes=1048576), XX-80=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-88=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-34=(offset=308, 
> logStartOffset=308, maxBytes=1048576), XX-7=(offset=369990, 
> logStartOffset=369990, maxBytes=1048576), XX-0=(offset=57965795, 
> logStartOffset=0, maxBytes=1048576)}) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 125 was disconnected before the response 
> was read
> at 
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:93)
> at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:207)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:151)
> at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> As node 125 removed all the ISRs as it was locking up, a failover for any 
> partition without 

[jira] [Updated] (KAFKA-4499) Add "getAllKeys" API for querying windowed KTable stores

2017-10-30 Thread Richard Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Yu updated KAFKA-4499:
--
Attachment: 4499-All-test-v1.patch

This is my current progress, although the tests for these methods are still in 
their infancy.

> Add "getAllKeys" API for querying windowed KTable stores
> 
>
> Key: KAFKA-4499
> URL: https://issues.apache.org/jira/browse/KAFKA-4499
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>  Labels: needs-kip
> Attachments: 4499-All-test-v1.patch, 4499-CachingWindowStore-v1.patch
>
>
> Currently, both {{KTable}} and windowed-{{KTable}} stores can be queried via 
> the IQ feature. While {{ReadOnlyKeyValueStore}} (for {{KTable}} stores) provides 
> the method {{all()}} to scan the whole store (i.e., returns an iterator over all 
> stored key-value pairs), there is no similar API for {{ReadOnlyWindowStore}} 
> (for windowed-{{KTable}} stores).
> This limits the usage of a windowed store, because the user needs to know 
> what keys are stored in order to query it. It would be useful to provide 
> possible APIs like this (only a rough sketch):
>  - {{keys()}} returns all keys available in the store (maybe together with 
> available time ranges)
>  - {{all(long timeFrom, long timeTo)}} that returns all windows for a specific 
> time range
>  - {{allLatest()}} that returns the latest window for each key
> Because this feature would require scanning multiple segments (i.e., RocksDB 
> instances), it would be quite inefficient with the current store design. Thus, 
> this feature also requires redesigning the underlying window store itself.
> Because this is a major change, a KIP 
> (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals)
>  is required. The KIP should cover the actual API design as well as the store 
> refactoring.
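
To make the proposal concrete, below is a minimal Java sketch of what such an extension of ReadOnlyWindowStore could look like. The interface name and exact signatures are hypothetical and only illustrate the three calls sketched in the description above; they are not an agreed KIP design.

{code}
import java.util.Set;

import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;

// Hypothetical extension of ReadOnlyWindowStore; names and signatures are
// illustrative only, not an agreed KIP design.
public interface ScannableWindowStore<K, V> extends ReadOnlyWindowStore<K, V> {

    // All keys currently present in the store.
    Set<K> keys();

    // Every window whose start timestamp falls within [timeFrom, timeTo].
    KeyValueIterator<Windowed<K>, V> all(long timeFrom, long timeTo);

    // Only the most recent window for each key.
    KeyValueIterator<Windowed<K>, V> allLatest();
}
{code}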



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5520) Extend Consumer Group Reset Offset tool for Stream Applications

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224958#comment-16224958
 ] 

ASF GitHub Bot commented on KAFKA-5520:
---

Github user jeqo closed the pull request at:

https://github.com/apache/kafka/pull/3831


> Extend Consumer Group Reset Offset tool for Stream Applications
> ---
>
> Key: KAFKA-5520
> URL: https://issues.apache.org/jira/browse/KAFKA-5520
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, tools
>Reporter: Jorge Quilcate
>  Labels: kip
> Fix For: 1.1.0
>
>
> KIP documentation: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-171+-+Extend+Consumer+Group+Reset+Offset+for+Stream+Application



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5520) Extend Consumer Group Reset Offset tool for Stream Applications

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224916#comment-16224916
 ] 

ASF GitHub Bot commented on KAFKA-5520:
---

GitHub user jeqo opened a pull request:

https://github.com/apache/kafka/pull/4159

KAFKA-5520: KIP-171 - Extend Consumer Group Reset Offset for Stream 
Application - Merge with KIP-198

KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-171+-+Extend+Consumer+Group+Reset+Offset+for+Stream+Application

Merge changes from KIP-198

Ref: https://github.com/apache/kafka/pull/3831

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeqo/kafka feature/kip-171

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4159






> Extend Consumer Group Reset Offset tool for Stream Applications
> ---
>
> Key: KAFKA-5520
> URL: https://issues.apache.org/jira/browse/KAFKA-5520
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, tools
>Reporter: Jorge Quilcate
>  Labels: kip
> Fix For: 1.1.0
>
>
> KIP documentation: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-171+-+Extend+Consumer+Group+Reset+Offset+for+Stream+Application



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4499) Add "getAllKeys" API for querying windowed KTable stores

2017-10-30 Thread Damian Guy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224906#comment-16224906
 ] 

Damian Guy commented on KAFKA-4499:
---

[~Yohan123] you can submit a patch as a [WIP] PR; then we will be able to 
comment on it directly on GitHub.

> Add "getAllKeys" API for querying windowed KTable stores
> 
>
> Key: KAFKA-4499
> URL: https://issues.apache.org/jira/browse/KAFKA-4499
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>  Labels: needs-kip
> Attachments: 4499-CachingWindowStore-v1.patch
>
>
> Currently, both {{KTable}} and windowed-{{KTable}} stores can be queried via 
> the IQ feature. While {{ReadOnlyKeyValueStore}} (for {{KTable}} stores) provides 
> the method {{all()}} to scan the whole store (i.e., returns an iterator over all 
> stored key-value pairs), there is no similar API for {{ReadOnlyWindowStore}} 
> (for windowed-{{KTable}} stores).
> This limits the usage of a windowed store, because the user needs to know 
> what keys are stored in order to query it. It would be useful to provide 
> possible APIs like this (only a rough sketch):
>  - {{keys()}} returns all keys available in the store (maybe together with 
> available time ranges)
>  - {{all(long timeFrom, long timeTo)}} that returns all windows for a specific 
> time range
>  - {{allLatest()}} that returns the latest window for each key
> Because this feature would require scanning multiple segments (i.e., RocksDB 
> instances), it would be quite inefficient with the current store design. Thus, 
> this feature also requires redesigning the underlying window store itself.
> Because this is a major change, a KIP 
> (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals)
>  is required. The KIP should cover the actual API design as well as the store 
> refactoring.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-3073) KafkaConnect should support regular expression for topics

2017-10-30 Thread Jeff Klukas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Klukas updated KAFKA-3073:
---
Fix Version/s: (was: 1.0.1)
   1.1.0

> KafkaConnect should support regular expression for topics
> -
>
> Key: KAFKA-3073
> URL: https://issues.apache.org/jira/browse/KAFKA-3073
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Jeff Klukas
> Fix For: 1.1.0
>
>
> KafkaConsumer supports either a list of topics or a pattern when subscribing. 
> KafkaConnect only supports a list of topics, which is not just more of a 
> hassle to configure - it also requires more maintenance.
> I suggest adding topics.pattern as a new configuration and letting users 
> choose between 'topics' and 'topics.pattern'.
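
As a rough illustration of the proposal, a sink connector configuration might then use a single regular expression instead of a hand-maintained topic list. Note that "topics.pattern" is the key proposed in this ticket, not an existing Connect setting, and the connector class name below is purely a placeholder.

{code}
import java.util.HashMap;
import java.util.Map;

public class TopicsPatternExample {
    public static void main(String[] args) {
        Map<String, String> sinkConfig = new HashMap<>();
        sinkConfig.put("name", "metrics-sink");
        // Hypothetical connector class used purely for illustration.
        sinkConfig.put("connector.class", "org.example.MySinkConnector");

        // Today: a fixed list that must be edited whenever a matching topic is added.
        // sinkConfig.put("topics", "metrics-app1,metrics-app2,metrics-app3");

        // Proposed: a single regular expression that also matches topics created later.
        sinkConfig.put("topics.pattern", "metrics-.*");

        System.out.println(sinkConfig);
    }
}
{code}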



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-3073) KafkaConnect should support regular expression for topics

2017-10-30 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224901#comment-16224901
 ] 

Ismael Juma commented on KAFKA-3073:


The soonest this can go in is 1.1.0 since it's a new feature.

> KafkaConnect should support regular expression for topics
> -
>
> Key: KAFKA-3073
> URL: https://issues.apache.org/jira/browse/KAFKA-3073
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Jeff Klukas
> Fix For: 1.0.1
>
>
> KafkaConsumer supports either a list of topics or a pattern when subscribing. 
> KafkaConnect only supports a list of topics, which is not just more of a 
> hassle to configure - it also requires more maintenance.
> I suggest adding topics.pattern as a new configuration and letting users 
> choose between 'topics' and 'topics.pattern'.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-3073) KafkaConnect should support regular expression for topics

2017-10-30 Thread Jeff Klukas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Klukas updated KAFKA-3073:
---
Fix Version/s: 1.0.1

> KafkaConnect should support regular expression for topics
> -
>
> Key: KAFKA-3073
> URL: https://issues.apache.org/jira/browse/KAFKA-3073
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Jeff Klukas
> Fix For: 1.0.1
>
>
> KafkaConsumer supports either a list of topics or a pattern when subscribing. 
> KafkaConnect only supports a list of topics, which is not just more of a 
> hassle to configure - it also requires more maintenance.
> I suggest adding topics.pattern as a new configuration and letting users 
> choose between 'topics' and 'topics.pattern'.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-3073) KafkaConnect should support regular expression for topics

2017-10-30 Thread Jeff Klukas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Klukas updated KAFKA-3073:
---
Labels:   (was: needs-kip)

> KafkaConnect should support regular expression for topics
> -
>
> Key: KAFKA-3073
> URL: https://issues.apache.org/jira/browse/KAFKA-3073
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Jeff Klukas
>
> KafkaConsumer supports either a list of topics or a pattern when subscribing. 
> KafkaConnect only supports a list of topics, which is not just more of a 
> hassle to configure - it also requires more maintenance.
> I suggest adding topics.pattern as a new configuration and letting users 
> choose between 'topics' and 'topics.pattern'.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-2995) in 0.9.0.0 Old Consumer's commitOffsets with specify partition can submit not exists topic and partition to zk

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-2995.
--
Resolution: Auto Closed

Closing inactive issue. The old consumer is no longer supported, please upgrade 
to the Java consumer whenever possible.

> in 0.9.0.0 Old Consumer's commitOffsets with specify partition can submit not 
> exists topic and partition to zk
> --
>
> Key: KAFKA-2995
> URL: https://issues.apache.org/jira/browse/KAFKA-2995
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Pengwei
>Assignee: Neha Narkhede
>
> In version 0.9.0.0, the old consumer's commit interface is as follows:
> def commitOffsets(offsetsToCommit: immutable.Map[TopicAndPartition, 
> OffsetAndMetadata], isAutoCommit: Boolean) {
> trace("OffsetMap: %s".format(offsetsToCommit))
> var retriesRemaining = 1 + (if (isAutoCommit) 0 else 
> config.offsetsCommitMaxRetries) // no retries for commits from auto-commit
> var done = false
> while (!done) {
>   val committed = offsetsChannelLock synchronized {
> // committed when we receive either no error codes or only 
> MetadataTooLarge errors
> if (offsetsToCommit.size > 0) {
>   if (config.offsetsStorage == "zookeeper") {
> offsetsToCommit.foreach { case (topicAndPartition, 
> offsetAndMetadata) =>
>   commitOffsetToZooKeeper(topicAndPartition, 
> offsetAndMetadata.offset)
> }
>   
> This interface does not validate the offsetsToCommit parameter: if 
> offsetsToCommit contains a topic or partition that does not exist in Kafka, 
> an entry will still be created under the /consumers/[group]/offsets/[nonexistent 
> topic] directory.
> Should we check that each topic and partition in offsetsToCommit actually 
> exists, or just check that it is contained in the topicRegistry or checkpointedZkOffsets?
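
A minimal sketch of the kind of guard being suggested, assuming the commit path has access to the set of partitions the consumer actually owns; the class, method, and parameter names here are hypothetical and are not the actual ZookeeperConsumerConnector code.

{code}
import java.util.Map;
import java.util.Set;

public final class OffsetCommitValidation {
    private OffsetCommitValidation() {}

    // Reject the commit if it references a topic/partition the consumer does not own,
    // so no stray /consumers/[group]/offsets/[topic] node is created in ZooKeeper.
    public static <TP, OM> void validate(Map<TP, OM> offsetsToCommit, Set<TP> ownedPartitions) {
        for (TP topicAndPartition : offsetsToCommit.keySet()) {
            if (!ownedPartitions.contains(topicAndPartition)) {
                throw new IllegalArgumentException(
                    "Commit request references unknown topic/partition: " + topicAndPartition);
            }
        }
    }
}
{code}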



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-2062) Sync Producer, Variable Message Length, Multiple Threads = Direct memory overuse

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-2062.
--
Resolution: Auto Closed

Closing inactive issue. The old producer is no longer supported, please upgrade 
to the Java producer whenever possible.

> Sync Producer, Variable Message Length, Multiple Threads = Direct memory 
> overuse
> 
>
> Key: KAFKA-2062
> URL: https://issues.apache.org/jira/browse/KAFKA-2062
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.1.1
>Reporter: Michael Braun
>Assignee: Jun Rao
>
> Using a synchronous producer with multiple threads each calling .send on the 
> single producer object, each thread ends up maintaining a thread-local direct 
> memory buffer. When message sizes vary (for instance, 
> 99% of messages are 1MB and 1% are 100MB), the buffers eventually seem to 
> expand to the largest size for all the threads, which can cause an out of memory - 
> direct buffer memory error:
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_67]
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
> ~[na:1.7.0_67]
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_67]
>   at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174) ~[na:1.7.0_67]
>   at sun.nio.ch.IOUtil.write(IOUtil.java:130) ~[na:1.7.0_67]
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannel.java:493) ~[na:1.7.0_67]
>   at java.nio.channels.SocketChannel.write(SocketChannel.java:493) 
> ~[na:1.7.0_67]
>   at 
> kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) 
> ~[kafka_2.10-0.8.1.1.jar:na]
>   at kafka.network.Send$class.writeCompletely(Transmission.scala:75) 
> ~[kafka_2.10-0.8.1.1.jar:na]
>   at 
> kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
>  ~[kafka_2.10-0.8.1.1.jar:na]
>   at kafka.network.BlockingChannel.send(BlockingChannel.scala:92) 
> ~[kafka_2.10-0.8.1.1.jar:na]
>   at kafka.producer.SyncProducer.liftedTree$1(SyncProducer.scala:72) 
> ~[kafka_2.10-0.8.1.1.jar:na]
>   at 
> kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
>  ~[kafka_2.10-0.8.1.1.jar:na]
>   at 
> kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:102)
>  ~[kafka_2.10-0.8.1.1.jar:na]
>   at 
> kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
>  ~[kafka_2.10-0.8.1.1.jar:na]
>   at 
> kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
>  ~[kafka_2.10-0.8.1.1.jar:na]
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) 
> ~[kafka_2.10-0.8.1.1.jar:na]
>   at kafka.producer.SyncProducer.send(SyncProducer.scala:100) 
> ~[kafka_2.10-0.8.1.1.jar:na]
>   at 
> kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:255)
>  [kafka_2.10-0.8.1.1.jar:na]
>   at 
> kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:106)
>  [kafka_2.10-0.8.1.1.jar:na]
>   at 
> kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$2.apply(DefaultEventHandler.scala:100)
>  [kafka_2.10-0.8.1.1.jar:na]
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>  [scala-library-2.10.1.jar:na]
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) 
> [scala-library-2.10.1.jar:na]
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) 
> [scala-library-2.10.1.jar:na]
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) 
> [scala-library-2.10.1.jar:na]
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) 
> [scala-library-2.10.1.jar:na]
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) 
> [scala-library-2.10.1.jar:na]
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>  [scala-library-2.10.1.jar:na]
>   at 
> kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:100)
>  [kafka_2.10-0.8.1.1.jar:na]
>   at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:72) 
> [kafka_2.10-0.8.1.1.jar:na]
>   at kafka.producer.Producer.send(Producer.scala:76) 
> [kafka_2.10-0.8.1.1.jar:na]
>   at kafka.javaapi.producer.Producer.send(Producer.scala:33) 
> [kafka_2.10-0.8.1.1.jar:na]
> 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-1958) ZookeeperConsumerConnector doesn't remove consumer node on shutdown.

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-1958.
--
Resolution: Auto Closed

Closing inactive issue. The old consumer is no longer supported, please upgrade 
to the Java consumer whenever possible.

> ZookeeperConsumerConnector doesn't remove consumer node on shutdown.
> 
>
> Key: KAFKA-1958
> URL: https://issues.apache.org/jira/browse/KAFKA-1958
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.1.1
>Reporter: Beletsky Andrey
>Assignee: Neha Narkhede
>  Labels: consumer, shutdown, zookeeper
>
> We use Kafka with ZooKeeper via the high-level consumer.
> There is a scheduled job that creates a consumer with a specific group, does 
> the necessary logic, and shuts down this consumer.
> +An issue:+
> Nobody deletes */consumers/myGroup/ids/myGroup__*, and after 
> several job runs there are a lot of dead consumer IDs under myGroup. I've hit 
> [an 
> issue|https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whysomeoftheconsumersinaconsumergroupneverreceiveanymessage?]
>  where the new consumer doesn't see a partition.
> We have started implementing an approach that removes the consumer nodes from ZooKeeper 
> manually after the consumer is shut down.
> I think a better way is to remove this node during 
> *ZookeeperConsumerConnector.shutdown()*.
> *P.S.:*
> If I missed something in your sources, please let me know.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-1415) Async producer.send can block forever if async.ProducerSendThread dies

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-1415.
--
Resolution: Auto Closed

Closing inactive issue. The old producer is no longer supported, please upgrade 
to the Java producer whenever possible.

> Async producer.send can block forever if async.ProducerSendThread dies
> --
>
> Key: KAFKA-1415
> URL: https://issues.apache.org/jira/browse/KAFKA-1415
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, producer 
>Affects Versions: 0.8.0
> Environment: kafka_2.9.2-0.8.0.jar
>Reporter: James Blackburn
>Assignee: Jun Rao
>
> We noticed that if something goes fundamentally wrong (in this case the jars 
> were replaced under a running Producer's feet) then async calls to 
> {{producer.send}} can lock up forever.
> I saw in the log file the following exception logged:
> {code}
> [2014-04-17 16:45:36,484] INFO Disconnecting from cn2:9092 
> (kafka.producer.SyncProducer)
> Exception in thread "ProducerSendThread-" java.lang.NoClassDefFoundError: 
> kafka/producer/async/ProducerSendThread$$anonfun$run$1
> at 
> kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:46)
> Caused by: java.lang.ClassNotFoundException: 
> kafka.producer.async.ProducerSendThread$$anonfun$run$1
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 1 more
> {code}
> However my application continued running. Jstack showed that the 
> producer.send calls had all locked up:
> {code}
> "SubscriberEventQueue0Executor-1" prio=10 tid=0x2aaab0a88000 nid=0x44f5 
> waiting on condition [0x44ac4000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x000790c47918> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:349)
>   at kafka.producer.Producer$$anonfun$asyncSend$1.apply(Producer.scala:98)
>   at kafka.producer.Producer$$anonfun$asyncSend$1.apply(Producer.scala:90)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:33)
>   at kafka.producer.Producer.asyncSend(Producer.scala:90)
>   at kafka.producer.Producer.send(Producer.scala:77)
>   - locked <0x000791768ee8> (a java.lang.Object)
>   at kafka.javaapi.producer.Producer.send(Producer.scala:33)
>   at com.mi.ahl.kafka.rmds2kafka.Bridge$1.onMarketData(Bridge.java:165)
>Locked ownable synchronizers:
>   - <0x000792205cd0> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "SubscriberEventQueue1Executor-2" prio=10 tid=0x2aaab0aa nid=0x4511 
> waiting for monitor entry [0x44dc7000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at kafka.producer.Producer.send(Producer.scala:71)
>   - waiting to lock <0x000791768ee8> (a java.lang.Object)
>   at kafka.javaapi.producer.Producer.send(Producer.scala:33)
>   at com.mi.ahl.kafka.rmds2kafka.Bridge$1.onMarketData(Bridge.java:165)
> "SubscriberEventQueue2Executor-3" prio=10 tid=0x2aaab0ab6800 nid=0x4512 
> waiting for monitor entry [0x44ec8000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at kafka.producer.Producer.send(Producer.scala:71)
>   - waiting to lock <0x000791768ee8> (a java.lang.Object)
>   at kafka.javaapi.producer.Producer.send(Producer.scala:33)
>   at com.mi.ahl.kafka.rmds2kafka.Bridge$1.onMarketData(Bridge.java:165)
> "SubscriberEventQueue3Executor-4" prio=10 tid=0x2aaab0ab8800 nid=0x4513 
> waiting for monitor entry [0x44fc9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at kafka.producer.Producer.send(Producer.scala:71)
>   - waiting to lock <0x000791768ee8> (a java.lang.Object)
>   at kafka.javaapi.producer.Producer.send(Producer.scala:33)
>   at com.mi.ahl.kafka.rmds2kafka.Bridge$1.onMarketData(Bridge.java:165)
> {code}
> *Expectation:*
> {{producer.send}} 

[jira] [Resolved] (KAFKA-1066) Reduce logging on producer connection failures

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-1066.
--
Resolution: Auto Closed

Closing inactive issue. The old producer is no longer supported

> Reduce logging on producer connection failures
> --
>
> Key: KAFKA-1066
> URL: https://issues.apache.org/jira/browse/KAFKA-1066
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.0
>Reporter: Jason Rosenberg
>Assignee: Jun Rao
>
> Below is a stack trace from a unit test, where a producer tries to send a 
> message, but no server is available.
> The exception/message logging seems to be inordinately verbose.  
> I'm thinking a simple change could be to not log full stack traces for simple 
> things like "Connection refused", etc.  Seems it would be fine to just log 
> the exception message in such cases.
> Also, the log levels could be tuned, such that things logged as ERROR 
> indicate that all possible retries have been attempted, rather than having it 
> be an ERROR for each step of the retry/failover process.  Thus, for a 
> redundant, clustered service, it should be considered normal that single 
> nodes will be unavailable (such as when we're doing a rolling restart of the 
> cluster, etc.).  It should only be an ERROR if all brokers/all replicas are 
> unavailable, etc.  This way, we can selectively set our log level to ERROR, 
> and have it be useful.
> This is from one of my unit tests.  I am using the default retry count (3) 
> here, but even if I reduced that, it seems this is a crazy amount of logging. 
>  (I've edited each stack trace to remove the portion of testing/calling code 
> leading into the Producer.send() call).
> Here's the code snippet that produced the logging below (and note, the server 
> was not available on the requested port).
>   KeyedMessage msg = new KeyedMessage(topic, 
> message);
>   producer.send(msg);
> Jason
> 599 [main] ERROR kafka.producer.SyncProducer  - Producer connection to 
> localhost:1025 unsuccessful
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.Net.connect0(Native Method)
>   at sun.nio.ch.Net.connect(Net.java:465)
>   at sun.nio.ch.Net.connect(Net.java:457)
>   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:639)
>   at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
>   at kafka.producer.SyncProducer.connect(SyncProducer.scala:146)
>   at 
> kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:161)
>   at 
> kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
>   at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
>   at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
>   at 
> kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
>   at 
> kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:69)
>   at kafka.utils.Utils$.swallow(Utils.scala:186)
>   at kafka.utils.Logging$class.swallowError(Logging.scala:105)
>   at kafka.utils.Utils$.swallowError(Utils.scala:45)
>   at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:69)
>   at kafka.producer.Producer.send(Producer.scala:74)
>   at kafka.javaapi.producer.Producer.send(Producer.scala:32)
>   ...
> 615 [main] WARN kafka.client.ClientUtils$  - Fetching topic metadata with 
> correlation id 0 for topics [Set(test-topic)] from broker 
> [id:0,host:localhost,port:1025] failed
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.Net.connect0(Native Method)
>   at sun.nio.ch.Net.connect(Net.java:465)
>   at sun.nio.ch.Net.connect(Net.java:457)
>   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:639)
>   at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
>   at kafka.producer.SyncProducer.connect(SyncProducer.scala:146)
>   at 
> kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:161)
>   at 
> kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
>   at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
>   at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
>   at 
> kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
>   at 
> kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:69)
>   at kafka.utils.Utils$.swallow(Utils.scala:186)
>   at kafka.utils.Logging$class.swallowError(Logging.scala:105)
>   at kafka.utils.Utils$.swallowError(Utils.scala:45)
>   at 
> 

[jira] [Resolved] (KAFKA-1006) Consumer loses messages of a new topic with auto.offset.reset = largest

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-1006.
--
Resolution: Auto Closed

Closing inactive issue. The old consumer is no longer supported, please upgrade 
to the Java consumer whenever possible.

> Consumer loses messages of a new topic with auto.offset.reset = largest
> ---
>
> Key: KAFKA-1006
> URL: https://issues.apache.org/jira/browse/KAFKA-1006
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Swapnil Ghike
>Assignee: Guozhang Wang
>  Labels: usability
>
> Consumer currently uses auto.offset.reset = largest by default. If a new 
> topic is created, the consumer's topic watcher is fired. The consumer will first 
> finish partition reassignment as part of the rebalance and then start consuming 
> from the tail of each partition. While the partition reassignment is in progress, 
> the server may have appended new messages to the new topic, and the consumer won't 
> consume those messages. Thus, multiple batches of messages may be lost when a 
> topic is newly created. 
> The fix is to start consuming from the earliest offset for newly created 
> topics.
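
Until such a fix lands, one common work-around with the old high-level consumer is to start from the smallest offset so the first batches written to a brand-new topic are not skipped. A minimal sketch, with placeholder ZooKeeper and group values:

{code}
import java.util.Properties;

public class EarliestResetExample {
    public static void main(String[] args) {
        // Sketch only; the connection and group values are placeholders.
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "example-group");
        // "smallest" in the old consumer ("earliest" in the Java consumer) makes a
        // consumer of a brand-new topic start from the beginning instead of the tail.
        props.put("auto.offset.reset", "smallest");
        System.out.println(props);
    }
}
{code}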



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-610) connect.timeout.ms seems to do the wrong thing in the producer

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-610.
-
Resolution: Auto Closed

Closing  inactive issue. The old producer is no longer supported.


> connect.timeout.ms seems to do the wrong thing in the producer
> --
>
> Key: KAFKA-610
> URL: https://issues.apache.org/jira/browse/KAFKA-610
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jay Kreps
>
> This setting used to control the socket connection timeout. This is also what 
> the documentation says:
> "the maximum time spent by kafka.producer.SyncProducer trying to connect to 
> the kafka broker. Once it elapses, the producer throws an ERROR and stops."
> But it doesn't look to me like this parameter is being used at all. The only 
> thing we do with it is check in a catch statement whether that much time has 
> elapsed and then throw an error. Since we haven't set the connection timeout, 
> this is silly.
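
For comparison, honouring such a setting would normally mean passing it to the connect call itself rather than only checking elapsed time afterwards. A small sketch with placeholder host, port, and timeout values, not the actual SyncProducer code:

{code}
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectTimeoutExample {
    public static void main(String[] args) throws Exception {
        int connectTimeoutMs = 5000; // what connect.timeout.ms is documented to mean
        try (Socket socket = new Socket()) {
            // The timeout bounds the connection attempt itself; on expiry a
            // SocketTimeoutException is thrown instead of blocking indefinitely.
            socket.connect(new InetSocketAddress("broker-host", 9092), connectTimeoutMs);
        }
    }
}
{code}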



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-599) SimpleConsumerShell ONLY connects to the first host in the broker-list string to fetch topic metadata

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-599.
-
Resolution: Auto Closed

The Scala consumers are no longer supported. 


> SimpleConsumerShell ONLY connects to the first host in the broker-list string 
> to fetch topic metadata
> -
>
> Key: KAFKA-599
> URL: https://issues.apache.org/jira/browse/KAFKA-599
> Project: Kafka
>  Issue Type: Bug
>Reporter: John Fung
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-435) Keep track of the transient test failure for Kafka-343 on Apache Jenkins

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-435.
-
Resolution: Cannot Reproduce

Closing inactive issue.

> Keep track of the transient test failure for Kafka-343 on Apache Jenkins
> 
>
> Key: KAFKA-435
> URL: https://issues.apache.org/jira/browse/KAFKA-435
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.8.0
>Reporter: Yang Ye
>Assignee: Yang Ye
>Priority: Minor
>  Labels: transient-unit-test-failure
>
> See: 
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-commits/201208.mbox/browser
> Error message:
> --
> [...truncated 3415 lines...]
> [2012-08-01 17:27:08,432] ERROR KafkaApi on Broker 0, error when processing 
> request (test_topic,0,-1,1048576)
> (kafka.server.KafkaApis:99)
> kafka.common.OffsetOutOfRangeException: offset -1 is out of range
>   at kafka.log.Log$.findRange(Log.scala:46)
>   at kafka.log.Log.read(Log.scala:265)
>   at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(KafkaApis.scala:377)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1$$anonfun$apply$21.apply(KafkaApis.scala:333)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1$$anonfun$apply$21.apply(KafkaApis.scala:332)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:332)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:328)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:32)
>   at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(KafkaApis.scala:328)
>   at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:272)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:59)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:38)
>   at java.lang.Thread.run(Thread.java:662)
> [2012-08-01 17:27:08,446] ERROR Closing socket for /67.195.138.9 because of 
> error (kafka.network.Processor:99)
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcher.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>   at kafka.utils.Utils$.read(Utils.scala:630)
>   at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>   at kafka.network.Processor.read(SocketServer.scala:296)
>   at kafka.network.Processor.run(SocketServer.scala:212)
>   at java.lang.Thread.run(Thread.java:662)
> [info] Test Passed: 
> testResetToEarliestWhenOffsetTooLow(kafka.integration.AutoOffsetResetTest)
> [info] Test Starting: 
> testResetToLatestWhenOffsetTooHigh(kafka.integration.AutoOffsetResetTest)
> [2012-08-01 17:27:09,203] ERROR KafkaApi on Broker 0, error when processing 
> request (test_topic,0,1,1048576)
> (kafka.server.KafkaApis:99)
> kafka.common.OffsetOutOfRangeException: offset 1 is out of range
>   at kafka.log.Log$.findRange(Log.scala:46)
>   at kafka.log.Log.read(Log.scala:265)
>   at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(KafkaApis.scala:377)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1$$anonfun$apply$21.apply(KafkaApis.scala:333)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1$$anonfun$apply$21.apply(KafkaApis.scala:332)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:332)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:328)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:32)
>   at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(KafkaApis.scala:328)
>   at 

[jira] [Resolved] (KAFKA-354) Refactor getter and setter API to conform to the new convention

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-354.
-
Resolution: Won't Fix

Closing inactive issue.

> Refactor getter and setter API to conform to the new convention
> ---
>
> Key: KAFKA-354
> URL: https://issues.apache.org/jira/browse/KAFKA-354
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.8.0
>Reporter: Neha Narkhede
>Assignee: Joe Stein
>  Labels: optimization
>
> We just agreed on a new convention for getter/setter APIs. It will be good to 
> refactor code to conform to that.
> > We can actually go with public vals or vars - there is not much point in
> > defining a custom getter/setter as that is redundant.
> >
> > For example:
> > - start with "val x"
> > - over time, we determine that it needs to be mutable - change it to "var
> > x"
> > - if you need something more custom (e.g., enforce constraints on the
> > values that you can assign) then we can add the custom setter
> >  private[this] var underlying: T = ...
> >  def  x = underlying
> >  def x_=(update: T)  { if (constraint satisfied) {underlying = update}
> > else {throw new Exception} }
> >
> > All of the above changes will be binary compatible since under the covers,
> > reads/assignments are all through getter/setter methods.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-407) Uncaught InputStream.close() exception in CompressionUtils.compress()

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-407.
-
Resolution: Won't Fix

This part of the code is removed. So closing the issue. 

> Uncaught InputStream.close() exception in CompressionUtils.compress()
> -
>
> Key: KAFKA-407
> URL: https://issues.apache.org/jira/browse/KAFKA-407
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.7.1
>Reporter: Lorenzo Alberton
>Priority: Minor
>
> In CompressionUtils.compress(), in this try/catch block:
> ==
> try {
>   cf.write(messageByteBuffer.array)
> } catch {
>   case e: IOException => error("Error while writing to the GZIP output 
> stream", e)
>   cf.close()
>   throw e
> } finally {
>   cf.close()
> }
> ==
> cf.close() might throw an IOException, which is not handled.
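
The usual remedy for this pattern is to make sure a failure in close() cannot mask the original write failure. A small Java-style sketch of such a helper, not the actual CompressionUtils code:

{code}
import java.io.Closeable;
import java.io.IOException;

public final class CloseUtils {
    private CloseUtils() {}

    // Swallow (but log) exceptions from close() so they never hide the original error.
    public static void closeQuietly(Closeable closeable) {
        if (closeable == null) {
            return;
        }
        try {
            closeable.close();
        } catch (IOException e) {
            System.err.println("Error while closing stream: " + e.getMessage());
        }
    }
}
{code}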



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (KAFKA-3917) Some __consumer_offsets replicas grow way too big

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reopened KAFKA-3917:


If you're running 0.10.2.0, then this is potentially KAFKA-5413, but I'll reopen until 
we can verify.

> Some __consumer_offsets replicas grow way too big
> -
>
> Key: KAFKA-3917
> URL: https://issues.apache.org/jira/browse/KAFKA-3917
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0
> Environment: Runs with Docker 1.10.1 in a container on 
> Linux 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64
>Reporter: Maxim Vladimirskiy
>  Labels: reliability
>
> We noticed that some replicas of partitions of the __consumer_offsets topic 
> grow way too big. Looking inside respective folders it became apparent that 
> old segments had not been cleaned up. Please see below example of disk usage 
> data for both affected and not affected partitions:
> Not affected partitions:
> Partition: 0  Leader: 2   Replicas: 2,3,4 Isr: 2,4,3
> 2: 49M
> 3: 49M
> 4: 49M
> Affected partitions:
> Partition: 10 Leader: 2   Replicas: 2,0,1 Isr: 1,2,0
> 0: 86M
> 1: 22G <<< too big!
> 2: 86M
> Partition: 38 Leader: 0   Replicas: 0,4,1 Isr: 1,0,4
> 0: 43M
> 1: 26G <<<  too big!
> 4: 26G <<<  too big!
> As you can see sometimes only one replica is affected, sometimes both 
> replicas are affected.
> When I try to restart a broker that has affected replicas it fails to start 
> with an exception that looks like this:
> [2016-06-28 23:15:20,441] ERROR There was an error in one of the threads 
> during logs loading: java.lang.IllegalArgumentException: requirement failed: 
> Corrupt index found, index file 
> (/var/kafka/__consumer_offsets-38/.index) has non-zero 
> size but the last offset is -676703869 and the base offset is 0 
> (kafka.log.LogManager)
> [2016-06-28 23:15:20,442] FATAL [Kafka Server 1], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.IllegalArgumentException: requirement failed: Corrupt index found, 
> index file (/var/kafka/__consumer_offsets-38/.index) has 
> non-zero size but the last offset is -676703869 and the base offset is 0
> at scala.Predef$.require(Predef.scala:233)
> at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:183)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.log.Log.loadSegments(Log.scala:183)
> at kafka.log.Log.<init>(Log.scala:67)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
> at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> After the content of the affected partition is deleted broker starts 
> successfully. 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-3917) Some __consumer_offsets replicas grow way too big

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3917:
---
Affects Version/s: (was: 0.8.2.2)
   0.10.2.0

> Some __consumer_offsets replicas grow way too big
> -
>
> Key: KAFKA-3917
> URL: https://issues.apache.org/jira/browse/KAFKA-3917
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0
> Environment: Runs with Docker 1.10.1 in a container on 
> Linux 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64
>Reporter: Maxim Vladimirskiy
>  Labels: reliability
>
> We noticed that some replicas of partitions of the __consumer_offsets topic 
> grow way too big. Looking inside respective folders it became apparent that 
> old segments had not been cleaned up. Please see below example of disk usage 
> data for both affected and not affected partitions:
> Not affected partitions:
> Partition: 0  Leader: 2   Replicas: 2,3,4 Isr: 2,4,3
> 2: 49M
> 3: 49M
> 4: 49M
> Affected partitions:
> Partition: 10 Leader: 2   Replicas: 2,0,1 Isr: 1,2,0
> 0: 86M
> 1: 22G <<< too big!
> 2: 86M
> Partition: 38 Leader: 0   Replicas: 0,4,1 Isr: 1,0,4
> 0: 43M
> 1: 26G <<<  too big!
> 4: 26G <<<  too big!
> As you can see sometimes only one replica is affected, sometimes both 
> replicas are affected.
> When I try to restart a broker that has affected replicas it fails to start 
> with an exception that looks like this:
> [2016-06-28 23:15:20,441] ERROR There was an error in one of the threads 
> during logs loading: java.lang.IllegalArgumentException: requirement failed: 
> Corrupt index found, index file 
> (/var/kafka/__consumer_offsets-38/.index) has non-zero 
> size but the last offset is -676703869 and the base offset is 0 
> (kafka.log.LogManager)
> [2016-06-28 23:15:20,442] FATAL [Kafka Server 1], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.IllegalArgumentException: requirement failed: Corrupt index found, 
> index file (/var/kafka/__consumer_offsets-38/.index) has 
> non-zero size but the last offset is -676703869 and the base offset is 0
> at scala.Predef$.require(Predef.scala:233)
> at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:183)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.log.Log.loadSegments(Log.scala:183)
> at kafka.log.Log.<init>(Log.scala:67)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
> at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> After the content of the affected partition is deleted broker starts 
> successfully. 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2580) Kafka Broker keeps file handles open for all log files (even if its not written to/read from)

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2580:
---
Affects Version/s: (was: 0.8.2.1)
   1.0.0
   0.10.0.0
   0.11.0.0

> Kafka Broker keeps file handles open for all log files (even if its not 
> written to/read from)
> -
>
> Key: KAFKA-2580
> URL: https://issues.apache.org/jira/browse/KAFKA-2580
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Vinoth Chandar
>
> We noticed this in one of our clusters where we stage logs for a longer 
> amount of time. It appears that the Kafka broker keeps file handles open even 
> for non-active (not written to or read from) files. (In fact, there are some 
> threads going back to 2013: 
> http://grokbase.com/t/kafka/users/132p65qwcn/keeping-logs-forever) 
> Needless to say, this is a problem and forces us to either artificially bump 
> up the ulimit (it's already at 100K) or expand the cluster (even if we have 
> sufficient IO and everything). 
> Filing this ticket, since I couldn't find anything similar. Very interested to 
> know if there are plans to address this (given how Samza's changelog topic is 
> meant to be a persistent large-state use case).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2588) ReplicaManager partitionCount metric should actually be replicaCount

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2588:
---
Affects Version/s: (was: 0.8.2.1)
   1.0.0
   0.10.0.0
   0.11.0.0

> ReplicaManager partitionCount metric should actually be replicaCount
> 
>
> Key: KAFKA-2588
> URL: https://issues.apache.org/jira/browse/KAFKA-2588
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>  Labels: needs-kip
>
> The metrics "partitionCount" in the ReplicaManager actually represents the 
> count of replicas. 
> As an example if I have a cluster with 1 topic with 1 partitions and a 
> replication factor of 3. The metric (aggregated) would show a value of 3. 
> There is a metric called "LeaderCount" that actually represents the 
> "partitionCount". In my example above the metric (aggregated) would show a 
> value of 1. 
> We do need to consider compatibility of consuming systems. I think the most 
> simple change would be to:
> - Adjust the "partitionCount" metric to be the same value as "LeaderCount"
> - Add a "replicaCount" metric which contains the values "partitionCount" does 
> today
> - Leave "LeaderCount" in for compatibility
> Documentation will need to be updated as well. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2588) ReplicaManager partitionCount metric should actually be replicaCount

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2588:
---
Labels: needs-kip  (was: )

> ReplicaManager partitionCount metric should actually be replicaCount
> 
>
> Key: KAFKA-2588
> URL: https://issues.apache.org/jira/browse/KAFKA-2588
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>  Labels: needs-kip
>
> The metrics "partitionCount" in the ReplicaManager actually represents the 
> count of replicas. 
> As an example if I have a cluster with 1 topic with 1 partitions and a 
> replication factor of 3. The metric (aggregated) would show a value of 3. 
> There is a metric called "LeaderCount" that actually represents the 
> "partitionCount". In my example above the metric (aggregated) would show a 
> value of 1. 
> We do need to consider compatibility of consuming systems. I think the most 
> simple change would be to:
> - Adjust the "partitionCount" metric to be the same value as "LeaderCount"
> - Add a "replicaCount" metric which contains the values "partitionCount" does 
> today
> - Leave "LeaderCount" in for compatibility
> Documentation will need to be updated as well. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2588) ReplicaManager partitionCount metric should actually be replicaCount

2017-10-30 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224791#comment-16224791
 ] 

Ismael Juma commented on KAFKA-2588:


A KIP would be required for this.

> ReplicaManager partitionCount metric should actually be replicaCount
> 
>
> Key: KAFKA-2588
> URL: https://issues.apache.org/jira/browse/KAFKA-2588
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>  Labels: needs-kip
>
> The metrics "partitionCount" in the ReplicaManager actually represents the 
> count of replicas. 
> As an example if I have a cluster with 1 topic with 1 partitions and a 
> replication factor of 3. The metric (aggregated) would show a value of 3. 
> There is a metric called "LeaderCount" that actually represents the 
> "partitionCount". In my example above the metric (aggregated) would show a 
> value of 1. 
> We do need to consider compatibility of consuming systems. I think the most 
> simple change would be to:
> - Adjust the "partitionCount" metric to be the same value as "LeaderCount"
> - Add a "replicaCount" metric which contains the values "partitionCount" does 
> today
> - Leave "LeaderCount" in for compatibility
> Documentation will need to be updated as well. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2607) Review `Time` interface and its usage

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2607:
---
Affects Version/s: (was: 0.8.2.2)
   1.0.0
   0.11.0.0

> Review `Time` interface and its usage
> -
>
> Key: KAFKA-2607
> URL: https://issues.apache.org/jira/browse/KAFKA-2607
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.11.0.0, 1.0.0
>Reporter: Ismael Juma
>  Labels: newbie
>
> Two of `Time` interface's methods are `milliseconds` and `nanoseconds` which 
> are implemented in `SystemTime` as follows:
> {code}
> @Override
> public long milliseconds() {
> return System.currentTimeMillis();
> }
> @Override
> public long nanoseconds() {
> return System.nanoTime();
> }
> {code}
> The issue with this interface is that it makes it seem that the difference is 
> about the unit (`ms` versus `ns`) whereas it's much more than that:
> https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
> We should probably change the names of the methods and review our usage to 
> see if we're using the right one in the various places.
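To make the wall-clock versus monotonic distinction concrete, here is a minimal standalone sketch (not from the ticket) using the two JDK calls the interface wraps:

{code}
public class ClockDemo {
    public static void main(String[] args) throws InterruptedException {
        // System.currentTimeMillis() is a wall-clock reading: it can jump if NTP
        // or an operator adjusts the system clock, so it answers "what time is it".
        long wallStart = System.currentTimeMillis();

        // System.nanoTime() is a monotonic reading: it only answers "how much time
        // has passed" and is the right choice for timeouts and elapsed-time metrics.
        long monoStart = System.nanoTime();

        Thread.sleep(100);

        System.out.println("wall-clock delta ms: " + (System.currentTimeMillis() - wallStart));
        System.out.println("monotonic delta ms:  " + (System.nanoTime() - monoStart) / 1_000_000);
    }
}
{code}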



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-2616) Improve Kafka client exceptions

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-2616.

Resolution: Auto Closed

BlockingChannel is only used by the Scala clients, which are no longer 
supported. Please upgrade to the Java clients whenever possible.


> Improve Kafka client exceptions
> ---
>
> Key: KAFKA-2616
> URL: https://issues.apache.org/jira/browse/KAFKA-2616
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.8.2.1
>Reporter: Hurshal Patel
>Priority: Minor
>
> Any sort of network failure results in a {{java.nio.ClosedChannelException}} 
> which is bubbled up from {{kafka.network.BlockingChannel}}. 
> Displaying such an exception to a user with little knowledge about Kafka can 
> be more confusing than informative. A better user experience for the Kafka 
> consumer would be to throw a more appropriately named exception when a 
> {{ClosedChannelException}} is encountered.
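For illustration, a minimal sketch of what the ticket asks for: wrap the low-level NIO exception in something that names the actual failure. The exception class here is hypothetical, not a class from the Kafka codebase.

{code}
import java.nio.channels.ClosedChannelException;

// Hypothetical exception name, chosen only to illustrate the suggestion.
class KafkaConnectionLostException extends RuntimeException {
    KafkaConnectionLostException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ChannelReader {
    void receive(String host, int port) {
        try {
            // ... read from the blocking channel here ...
            throw new ClosedChannelException(); // stand-in for a dropped connection
        } catch (ClosedChannelException e) {
            // Tell the user what actually happened instead of surfacing a bare NIO exception.
            throw new KafkaConnectionLostException(
                    "Lost connection to broker " + host + ":" + port, e);
        }
    }
}
{code}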



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2605) Replace `catch: Throwable` clauses with `NonFatal` or `NonControl`

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2605:
---
Affects Version/s: (was: 0.8.2.2)
   1.0.0
   0.11.0.0

> Replace `catch: Throwable` clauses with `NonFatal` or `NonControl`
> --
>
> Key: KAFKA-2605
> URL: https://issues.apache.org/jira/browse/KAFKA-2605
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.11.0.0, 1.0.0
>Reporter: Ismael Juma
>Assignee: jin xing
>  Labels: newbie
>
> The Kafka codebase includes a number of instances where we do `catch t: 
> Throwable` where we should really be doing `catch NonFatal(t)` or `catch 
> NonControl(t)` where `NonFatal` is part of the standard library and 
> `NonControl` is something like:
> {code}
> object NonControl {
>   def apply(t: Throwable): Boolean = t match {
>     case _: ControlThrowable => false
>     case _ => true
>   }
>   def unapply(t: Throwable): Option[Throwable] = if (apply(t)) Some(t) else None
> }
> {code}
> We can also use `NonControl` to replace cases like (it's more concise and has 
> the same behaviour):
> {code}
>   case e: ControlThrowable => throw e
>   case e: Throwable => ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-2636) Producer connectivity obscured connection failure logging

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-2636.

Resolution: Not A Problem

The producer should throw a TimeoutException to the callback or if `Future.get` 
is called.
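For illustration, a minimal sketch of where that failure becomes visible with the Java producer; the broker address and topic are placeholders:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerErrorVisibility {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "unreachable-host:9092"); // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Option 1: the callback receives the failure (e.g. a TimeoutException).
            producer.send(new ProducerRecord<>("test-topic", "key", "value"),
                    (RecordMetadata metadata, Exception exception) -> {
                        if (exception != null) {
                            System.err.println("send failed: " + exception);
                        }
                    });

            // Option 2: Future.get surfaces the same failure as an ExecutionException.
            try {
                producer.send(new ProducerRecord<>("test-topic", "key", "value")).get();
            } catch (Exception e) {
                System.err.println("send failed: " + e);
            }
        }
    }
}
{code}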

> Producer connectivity obscured connection failure logging
> -
>
> Key: KAFKA-2636
> URL: https://issues.apache.org/jira/browse/KAFKA-2636
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1, 0.8.2.2
> Environment: Windows 8 running java implementation of Kafka Producer
>Reporter: Jason Kania
>
> The Kafka Producer does not generate a visible exception when a connection 
> cannot be made. Instead, DEBUG settings are required to observe the problem, as 
> shown below:
> [2015-10-12 21:23:20,335] DEBUG Error connecting to node 0 at 
> 482f4769eed1:9092: (org.apache.kafka.clients.NetworkClient)
> java.io.IOException: Can't resolve address: 482f4769eed1:9092
>   at org.apache.kafka.common.network.Selector.connect(Selector.java:138)
>   at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:417)
>   at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:116)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:165)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.nio.channels.UnresolvedAddressException
>   at sun.nio.ch.Net.checkAddress(Unknown Source)
>   at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
>   at org.apache.kafka.common.network.Selector.connect(Selector.java:135)
>   ... 5 more
> [2015-10-12 21:23:20,358] DEBUG Initiating connection to node 0 at 
> 482f4769eed1:9092. (org.apache.kafka.clients.NetworkClient)
> Secondly, the errors do not identify the node by IP address, making error 
> investigation more difficult, especially when learning to use Kafka.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2561) Optionally support OpenSSL for SSL/TLS

2017-10-30 Thread jaikiran pai (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224784#comment-16224784
 ] 

jaikiran pai commented on KAFKA-2561:
-

I just came across this JIRA, so I thought I would update it with my own recent 
experiments with OpenSSL (Java 8) and Kafka. For those interested, I got some 
performance numbers for OpenSSL (backed by the WildFly OpenSSL Java bindings) and have 
detailed them in my blog[1]. Later this week, I plan to rerun the same thing 
with Java 9 and see how it performs.

[1] https://jaitechwriteups.blogspot.com/2017/10/kafka-with-openssl.html


> Optionally support OpenSSL for SSL/TLS 
> ---
>
> Key: KAFKA-2561
> URL: https://issues.apache.org/jira/browse/KAFKA-2561
> Project: Kafka
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Ismael Juma
>
> JDK's `SSLEngine` is unfortunately a bit slow (KAFKA-2431 covers this in more 
> detail). We should consider supporting OpenSSL for SSL/TLS. Initial 
> experiments on my laptop show that it performs a lot better:
> {code}
> start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, 
> nMsg.sec, config
> 2015-09-21 14:41:58:245, 2015-09-21 14:47:02:583, 28610.2295, 94.0081, 
> 3000, 98574.6111, Java 8u60/server auth JDK 
> SSLEngine/TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
> 2015-09-21 14:38:24:526, 2015-09-21 14:40:19:941, 28610.2295, 247.8900, 
> 3000, 259931.5514, Java 8u60/server auth 
> OpenSslEngine/TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> 2015-09-21 14:49:03:062, 2015-09-21 14:50:27:764, 28610.2295, 337.7751, 
> 3000, 354182.9000, Java 8u60/plaintext
> {code}
> Extracting the throughput figures:
> * JDK SSLEngine: 94 MB/s
> * OpenSSL SSLEngine: 247 MB/s
> * Plaintext: 337 MB/s (code from trunk, so no zero-copy due to KAFKA-2517)
> In order to get these figures, I used Netty's `OpenSslEngine` by hacking 
> `SSLFactory` to use Netty's `SslContextBuilder` and made a few changes to 
> `SSLTransportLayer` in order to work around differences in behaviour between 
> `OpenSslEngine` and JDK's SSLEngine (filed 
> https://github.com/netty/netty/issues/4235 and 
> https://github.com/netty/netty/issues/4238 upstream).
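For reference, a hedged sketch of the mechanism described above: building an OpenSSL-backed engine through Netty's `SslContextBuilder`. It assumes netty-tcnative is on the classpath, and the certificate/key paths are placeholders.

{code}
import java.io.File;
import javax.net.ssl.SSLEngine;
import io.netty.buffer.ByteBufAllocator;
import io.netty.handler.ssl.OpenSsl;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;

public class OpenSslEngineDemo {
    public static void main(String[] args) throws Exception {
        // Requires the netty-tcnative OpenSSL bindings on the classpath.
        System.out.println("OpenSSL available: " + OpenSsl.isAvailable());

        // Placeholder certificate and key files.
        SslContext ctx = SslContextBuilder
                .forServer(new File("server.crt"), new File("server.key"))
                .sslProvider(SslProvider.OPENSSL)
                .build();

        // This engine is what would be swapped in where the JDK SSLEngine is used today.
        SSLEngine engine = ctx.newEngine(ByteBufAllocator.DEFAULT);
        System.out.println("Engine class: " + engine.getClass().getName());
    }
}
{code}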



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-2662) Make ConsumerIterator thread-safe for multiple threads in different Kafka groups

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-2662.

Resolution: Auto Closed

The Scala consumers are no longer supported. Please upgrade to the Java 
consumer whenever possible.

> Make ConsumerIterator thread-safe for multiple threads in different Kafka 
> groups
> 
>
> Key: KAFKA-2662
> URL: https://issues.apache.org/jira/browse/KAFKA-2662
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.8.2.1
>Reporter: Andrew Pennebaker
>Assignee: Neha Narkhede
>
> The API for obtaining a ConsumerIterator requires a group parameter, implying 
> that ConsumerIterators are thread-safe, as long as each thread is in a 
> different Kafka group. However, in practice, attempting to call hasNext() on 
> ConsumerIterators for a thread in one group, and for a thread in another 
> group, results in an InvalidStateException.
> In the future, can we please make ConsumerIterator thread-safe, for a common 
> use case of one consumer thread per group?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-2725) high level consumer rebalances with auto-commit disabled should throw an exception

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-2725.

Resolution: Auto Closed

The Scala consumers are no longer supported. If this still applies to the Java 
consumer, please file a new issue.

> high level consumer rebalances with auto-commit disabled should throw an 
> exception
> --
>
> Key: KAFKA-2725
> URL: https://issues.apache.org/jira/browse/KAFKA-2725
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.8.2.1
> Environment: Experienced on Java running in linux
>Reporter: Cliff Rhyne
>
> Auto-commit is a very resilient mode.  Drops in zookeeper sessions due to JVM 
> garbage collection, network, rebalance or other interference are handled 
> gracefully within the kafka client.
> Systems still can drop due to unexpected gc or network behavior.  My proposal 
> is to handle this drop better when auto-commit is turned off:
> - If a rebalance or similar occurs (which causes the offset to get reverted in 
> the client), check and see if the client was assigned back to the same 
> partition or a different one.  If it's the same partition, find the place 
> last consumed (it doesn't do this today for us).  This is to make a graceful 
> recovery.
> - If the partition assignment changes (which can mean duplicate data is 
> getting processed), throw an exception back to the application code.  This 
> lets the application code handle this exception case with respect to the work 
> it's doing (which might be transactional).  Failing "silently" (yes, it's still 
> getting logged) is very dangerous in our situation.
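With the Java consumer, the "assignment changed" case the reporter wants surfaced can at least be observed via a rebalance listener; a minimal sketch (placeholder broker, group and topic names), not the old high-level consumer API:

{code}
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RebalanceAware {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "example-group");           // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Commit or flush in-flight work here before the partitions move away.
                    System.out.println("revoked: " + partitions);
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Compare with the previous assignment and let the application decide
                    // whether a changed assignment should be treated as an error, as proposed above.
                    System.out.println("assigned: " + partitions);
                }
            });
            consumer.poll(1000); // joins the group and triggers the callbacks above
        }
    }
}
{code}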



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2787) Refactor gradle build

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2787:
---
Affects Version/s: (was: 0.8.2.2)

> Refactor gradle build
> -
>
> Key: KAFKA-2787
> URL: https://issues.apache.org/jira/browse/KAFKA-2787
> Project: Kafka
>  Issue Type: Task
>  Components: build
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Minor
>
> The build files are quite large with a lot of duplication and overlap. This 
> could lead to mistakes, reduce readability and functionality, and hinder 
> future changes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2898) Kafka producer config incorrectly warns that serializer.* isn't a known config

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2898:
---
Affects Version/s: (was: 0.8.2.1)
   1.0.0
   0.10.0.0
   0.11.0.0

> Kafka producer config incorrectly warns that serializer.* isn't a known config
> --
>
> Key: KAFKA-2898
> URL: https://issues.apache.org/jira/browse/KAFKA-2898
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Guillaume Grossetie
>Assignee: Lee Dongjin
>Priority: Minor
>
> When using "serializer.encoding", "key.serializer.encoding" or 
> "value.serializer.encoding" to configure the encoding of the serializer, 
> Kafka producer incorrectly warns:
> {code}
> config.put("serializer.encoding", serializerEncoding);
> config.put("key.serializer.encoding", keySerializerEncoding);
> config.put("value.serializer.encoding", valueSerializerEncoding);
> new KafkaProducer<>(config);
> {code}
> {code}
> [main] WARN  o.a.k.c.producer.ProducerConfig - The configuration 
> serializer.encoding = null was supplied but isn't a known config. 
> [main] WARN  o.a.k.c.producer.ProducerConfig - The configuration 
> value.serializer.encoding = null was supplied but isn't a known config. 
> [main] WARN  o.a.k.c.producer.ProducerConfig - The configuration 
> key.serializer.encoding = null was supplied but isn't a known config. 
> {code}
> With the following code I get:
> {code}
> Map serializerConfig = new HashMap<>();
> serializerConfig.put("serializer.encoding", serializerEncoding);
> serializerConfig.put("key.serializer.encoding", keySerializerEncoding);
> serializerConfig.put("value.serializer.encoding", valueSerializerEncoding);
> StringSerializer keySerializer = new StringSerializer();
> keySerializer.configure(serializerConfig, true);
> StringSerializer valueSerializer = new StringSerializer();
> valueSerializer.configure(serializerConfig, false);
> new KafkaProducer<>(config, keySerializer, valueSerializer);
> {code}
> {code}
> [main] WARN  o.a.k.c.producer.ProducerConfig - The configuration 
> value.serializer = class 
> org.apache.kafka.common.serialization.StringSerializer was supplied but isn't 
> a known config. 
> [main] WARN  o.a.k.c.producer.ProducerConfig - The configuration 
> key.serializer = class org.apache.kafka.common.serialization.StringSerializer 
> was supplied but isn't a known config. 
> {code}
> However, I think this case is now fixed in master because someone has added:
> {code}
> config.ignore(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG);
> config.ignore(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG);
> {code}
> I think "serializer.encoding", "key.serializer.encoding" and 
> "value.serializer.encoding" configs should also be ignored because they are 
> used in StringSerializer (using a copy of the config: "config.originals()")



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2903) FileRecords.read doesn't handle size > sizeInBytes when start is not zero

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2903:
---
Fix Version/s: (was: 1.0.0)
   1.1.0

> FileRecords.read doesn't handle size > sizeInBytes when start is not zero
> -
>
> Key: KAFKA-2903
> URL: https://issues.apache.org/jira/browse/KAFKA-2903
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0, 0.11.0.0, 1.0.0
>Reporter: Pengwei
>Assignee: Jay Kreps
> Fix For: 1.1.0
>
>
> now the code is:
> def read(position: Int, size: Int): FileMessageSet = {
>   ...
>   new FileMessageSet(file,
>     channel,
>     start = this.start + position,
>     end = math.min(this.start + position + size, sizeInBytes()))
> }
> if this.start is not 0, the end is only the FileMessageSet's size, not the 
> actual end position.
> the end parameter should be:
>   end = math.min(this.start + position + size, this.start + sizeInBytes())
> *Update* Problem still remains in FileRecords.
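To make the off-by-`start` concrete (illustrative numbers, not from the report): with this.start = 1000, position = 0, size = 2000 and sizeInBytes() = 500, the current code computes end = min(1000 + 0 + 2000, 500) = 500, which is before the slice's own start, while the proposed end = min(1000 + 0 + 2000, 1000 + 500) = 1500 correctly clamps the read to the end of the slice.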



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2903) FileRecords.read doesn't handle size > sizeInBytes when start is not zero

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224767#comment-16224767
 ] 

ASF GitHub Bot commented on KAFKA-2903:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/4158

KAFKA-2903: FileRecords.read doesn't handle size > sizeInBytes when start 
is not zero



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-2903-file-records-read-slice-size-greater

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4158


commit b251038e2c569fda88376b85918a5003927b8152
Author: Ismael Juma 
Date:   2017-10-30T11:45:00Z

KAFKA-2903: FileRecords.read doesn't handle size > sizeInBytes when start 
is not zero




> FileRecords.read doesn't handle size > sizeInBytes when start is not zero
> -
>
> Key: KAFKA-2903
> URL: https://issues.apache.org/jira/browse/KAFKA-2903
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0, 0.11.0.0, 1.0.0
>Reporter: Pengwei
>Assignee: Jay Kreps
>
> now the code is:
> def read(position: Int, size: Int): FileMessageSet = {
>   ...
>   new FileMessageSet(file,
>     channel,
>     start = this.start + position,
>     end = math.min(this.start + position + size, sizeInBytes()))
> }
> if this.start is not 0, the end is only the FileMessageSet's size, not the 
> actual end position.
> the end parameter should be:
>   end = math.min(this.start + position + size, this.start + sizeInBytes())
> *Update* Problem still remains in FileRecords.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6042) Kafka Request Handler deadlocks and brings down the cluster.

2017-10-30 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224750#comment-16224750
 ] 

Ismael Juma commented on KAFKA-6042:


[~corlettb], it's unrelated to this issue, so it's best to discuss it on the mailing 
list. From the stacktrace, it can be seen that the message is being 
down-converted (i.e. the broker is using a newer message format than is supported by 
an older client). This is generally to be avoided if possible and can lead to 
memory issues like the one you are seeing. The upgrade notes discuss this in 
detail (search for message.format.version). Hope this helps.
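For reference, an illustrative server.properties excerpt (the version is a placeholder; pick the oldest client format actually in use) that pins the message format so the broker does not have to down-convert fetches for old clients:

{code}
# Keep the on-disk/returned message format at a version the oldest clients
# (0.8.2.x in the environment described above) can read, so fetches are not
# down-converted in the request handlers.
log.message.format.version=0.8.2
{code}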

> Kafka Request Handler deadlocks and brings down the cluster.
> 
>
> Key: KAFKA-6042
> URL: https://issues.apache.org/jira/browse/KAFKA-6042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 1.0.0
> Environment: kafka version: 0.11.0.1
> client versions: 0.8.2.1-0.10.2.1
> platform: aws (eu-west-1a)
> nodes: 36 x r4.xlarge
> disk storage: 2.5 tb per node (~73% usage per node)
> topics: 250
> number of partitions: 48k (approx)
> os: ubuntu 14.04
> jvm: Java(TM) SE Runtime Environment (build 1.8.0_131-b11), Java HotSpot(TM) 
> 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Ben Corlett
>Assignee: Rajini Sivaram
>Priority: Blocker
> Fix For: 1.0.0, 0.11.0.2
>
> Attachments: thread_dump.txt.gz
>
>
> We have been experiencing a deadlock that happens on a consistent server 
> within our cluster. This happens multiple times a week currently. It first 
> started happening when we upgraded to 0.11.0.0. Sadly 0.11.0.1 failed to 
> resolve the issue.
> Sequence of events:
> At a seemingly random time, broker 125 goes into a deadlock. As soon as it is 
> deadlocked, it will remove all the ISRs for any partition it is the leader 
> for.
> [2017-10-10 00:06:10,061] INFO Partition [XX,24] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,073] INFO Partition [XX,974] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,079] INFO Partition [XX,64] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,081] INFO Partition [XX,21] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,084] INFO Partition [XX,12] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,085] INFO Partition [XX,61] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,086] INFO Partition [XX,53] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,088] INFO Partition [XX,27] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,090] INFO Partition [XX,182] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> [2017-10-10 00:06:10,091] INFO Partition [XX,16] on broker 125: 
> Shrinking ISR from 117,125 to 125 (kafka.cluster.Partition)
> 
> The other nodes fail to connect to the node 125 
> [2017-10-10 00:08:42,318] WARN [ReplicaFetcherThread-0-125]: Error in fetch 
> to broker 125, request (type=FetchRequest, replicaId=101, maxWait=500, 
> minBytes=1, maxBytes=10485760, fetchData={XX-94=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-22=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-58=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-11=(offset=78932482, 
> logStartOffset=50881481, maxBytes=1048576), XX-55=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-19=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-91=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-5=(offset=903857106, 
> logStartOffset=0, maxBytes=1048576), XX-80=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-88=(offset=0, 
> logStartOffset=0, maxBytes=1048576), XX-34=(offset=308, 
> logStartOffset=308, maxBytes=1048576), XX-7=(offset=369990, 
> logStartOffset=369990, maxBytes=1048576), XX-0=(offset=57965795, 
> logStartOffset=0, maxBytes=1048576)}) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 125 was disconnected before the response 
> was read
> at 
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:93)
> at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
> at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:207)
> at 
> 

[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-10-30 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224722#comment-16224722
 ] 

Ismael Juma commented on KAFKA-5060:


[~rparmentier], can you include consumer logs as well?

> Offset not found while broker is rebuilding its index after an index 
> corruption
> ---
>
> Key: KAFKA-5060
> URL: https://issues.apache.org/jira/browse/KAFKA-5060
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.1.0
>Reporter: Romaric Parmentier
>Priority: Critical
>  Labels: reliability
>
> After rebooting our kafka servers to change a configuration, one of my 
> consumers running the old consumer failed to find a new leader for a period of 
> 15 minutes. The topic has a replication factor of 2.
> When the spare server was finally found and elected leader, the previously 
> consumed offset could not be found because the broker was rebuilding its 
> index. 
> So my consumer decided to follow the auto.offset.reset configuration, 
> which is pretty bad because the offset would exist 2 minutes later:
> 2017-04-12 14:59:08,568] WARN Found a corrupted index file due to requirement 
> failed: Corrupt index found, index file 
> (/var/lib/kafka/my_topic-6/130248110337.index) has non-zero size but 
> the last offset is 130248110337 which is no larger than the base offset 
> 130248110337.}. deleting 
> /var/lib/kafka/my_topic-6/130248110337.timeindex, 
> /var/lib/kafka/my_topic-6/130248110337.index and rebuilding index... 
> (kafka.log.Log)
> [2017-04-12 15:01:41,490] INFO Completed load of log my_topic-6 with 6146 log 
> segments and log end offset 130251895436 in 169696 ms (kafka.log.Log)
> Maybe it is handled by the new consumer or there is some configuration to 
> handle this case, but I didn't find anything.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6130) VerifiableConsumer with --max-messages doesn't exit

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224708#comment-16224708
 ] 

ASF GitHub Bot commented on KAFKA-6130:
---

GitHub user tombentley opened a pull request:

https://github.com/apache/kafka/pull/4157

KAFKA-6130: VerifiableConsumer with --max-messages

@hachikuji can you review this? Do you know the best place to write a test, 
since there doesn't seem to be a unit test for the `VerifiableConsumer`?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tombentley/kafka 
KAFKA-6130-verifiable-consumer-max-messages

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4157


commit 7ee4341babbf9616594169cf6c5cf2d093870d4e
Author: Tom Bentley 
Date:   2017-10-26T10:41:15Z

KAFKA-6130: VerifiableConsumer with --max-messages




> VerifiableConsumer with --max-messages doesn't exit
> ---
>
> Key: KAFKA-6130
> URL: https://issues.apache.org/jira/browse/KAFKA-6130
> Project: Kafka
>  Issue Type: Bug
>Reporter: Tom Bentley
>Assignee: Tom Bentley
>Priority: Minor
>
> If I run {{kafka-verifiable-consumer.sh --max-messages=N}} I expect the tool 
> to consume N messages and then exit. It will actually consume as many 
> messages as are in the topic and then block.
> The problem is that although the max-messages limit will cause the loop in 
> onRecordsReceived() to break, the loop in run() will just call 
> onRecordsReceived() again.
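An illustrative sketch of the shape of the fix (not the actual patch): let the outer loop observe the same termination condition as the inner one.

{code}
// Simplified stand-in for the tool's structure; method names follow the description above.
public class VerifiableConsumerSketch {
    private final long maxMessages; // -1 means "no limit"
    private long consumed = 0;

    public VerifiableConsumerSketch(long maxMessages) {
        this.maxMessages = maxMessages;
    }

    private boolean isFinished() {
        return maxMessages >= 0 && consumed >= maxMessages;
    }

    void onRecordsReceived(int batchSize) {
        for (int i = 0; i < batchSize && !isFinished(); i++) {
            consumed++; // process one record
        }
    }

    void run() {
        // Previously the outer loop kept polling forever; checking isFinished()
        // here is what lets --max-messages actually terminate the tool.
        while (!isFinished()) {
            onRecordsReceived(100); // stand-in for poll() plus record handling
        }
    }

    public static void main(String[] args) {
        new VerifiableConsumerSketch(250).run();
        System.out.println("done after 250 records");
    }
}
{code}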



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6042) Kafka Request Handler deadlocks and brings down the cluster.

2017-10-30 Thread Ben Corlett (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224617#comment-16224617
 ] 

Ben Corlett commented on KAFKA-6042:


We had another issue on broker 125, though not the same one this time. All other 
nodes in the cluster are on 0.11.0.1. Broker 125 is running a build from:

https://github.com/apache/kafka/commits/2a321941387c7739f2fbbbe592d017b703223ada

It ran out of heap space. We are currently running a heap of 5GB. This is the 
first time we've seen an out-of-heap issue with Kafka. I don't know if this is 
related to this issue. It might be that 1.0 requires more heap space or that 
running mixed versions uses more heap.

This issue affected the entire cluster, and message rates didn't go back to 
normal until broker 125 was restarted.

I can increase the heap size.

{code}
[2017-10-28 16:19:31,061] ERROR [KafkaApi-125] Error when handling request 
{replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,topics=[{topic=X,partitions=[{partition=40,fetch_offset=153707886,max_bytes=1048576}]}]}
 (kafka.server.KafkaApis)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at 
org.apache.kafka.common.record.AbstractRecords.downConvert(AbstractRecords.java:101)
at 
org.apache.kafka.common.record.FileRecords.downConvert(FileRecords.java:253)
at 
kafka.server.KafkaApis.$anonfun$handleFetchRequest$4(KafkaApis.scala:520)
at 
kafka.server.KafkaApis.$anonfun$handleFetchRequest$4$adapted(KafkaApis.scala:518)
at kafka.server.KafkaApis$$Lambda$837/843104331.apply(Unknown Source)
at scala.Option.map(Option.scala:146)
at 
kafka.server.KafkaApis.$anonfun$handleFetchRequest$3(KafkaApis.scala:518)
at 
kafka.server.KafkaApis.$anonfun$handleFetchRequest$3$adapted(KafkaApis.scala:508)
at kafka.server.KafkaApis$$Lambda$836/1538921035.apply(Unknown Source)
at scala.Option.flatMap(Option.scala:171)
at kafka.server.KafkaApis.convertedPartitionData$1(KafkaApis.scala:508)
at 
kafka.server.KafkaApis.$anonfun$handleFetchRequest$12(KafkaApis.scala:556)
at kafka.server.KafkaApis$$Lambda$833/1032345356.apply(Unknown Source)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
at scala.collection.IterableLike.foreach(IterableLike.scala:71)
at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.server.KafkaApis.createResponse$2(KafkaApis.scala:555)
at 
kafka.server.KafkaApis.$anonfun$handleFetchRequest$14(KafkaApis.scala:569)
at 
kafka.server.KafkaApis.$anonfun$handleFetchRequest$14$adapted(KafkaApis.scala:569)
at kafka.server.KafkaApis$$Lambda$844/44004770.apply(Unknown Source)
at 
kafka.server.KafkaApis.$anonfun$sendResponseMaybeThrottle$1(KafkaApis.scala:2034)
at kafka.server.KafkaApis$$Lambda$439/940799008.apply$mcVI$sp(Unknown 
Source)
at 
kafka.server.ClientRequestQuotaManager.maybeRecordAndThrottle(ClientRequestQuotaManager.scala:52)
at 
kafka.server.KafkaApis.sendResponseMaybeThrottle(KafkaApis.scala:2034)
at kafka.server.KafkaApis.fetchResponseCallback$1(KafkaApis.scala:569)
at 
kafka.server.KafkaApis.$anonfun$handleFetchRequest$15(KafkaApis.scala:588)
at kafka.server.KafkaApis$$Lambda$843/1757998472.apply$mcVI$sp(Unknown 
Source)
{code}

{code}
2017-10-28T16:19:31.207+0100: 439215.093: [GC pause (G1 Evacuation Pause) 
(young), 0.0025596 secs]
   [Parallel Time: 1.3 ms, GC Workers: 4]
  [GC Worker Start (ms): Min: 439215093.0, Avg: 439215093.0, Max: 
439215093.0, Diff: 0.0]
  [Ext Root Scanning (ms): Min: 0.6, Avg: 0.6, Max: 0.7, Diff: 0.1, Sum: 
2.5]
  [Update RS (ms): Min: 0.0, Avg: 0.4, Max: 0.5, Diff: 0.5, Sum: 1.6]
 [Processed Buffers: Min: 1, Avg: 1.2, Max: 2, Diff: 1, Sum: 5]
  [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
  [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 
0.0]
  [Object Copy (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.3]
  [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum: 0.4]
 [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
  [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
  [GC Worker Total (ms): Min: 1.2, Avg: 1.2, Max: 1.2, Diff: 0.0, Sum: 4.9]
  [GC Worker End (ms): Min: 439215094.2, Avg: 439215094.2, Max: 
439215094.2, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 1.0 ms]
  [Choose CSet: 0.0 ms]
  

[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-10-30 Thread Romaric Parmentier (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224570#comment-16224570
 ] 

Romaric Parmentier commented on KAFKA-5060:
---

Hi Ismael,

Yes we are reporting the same issue.

Unfortunately, when I was starting my consumer, the offset was simply not found 
and the rule defined by the option "auto.offset.reset" was applied. The best 
option to avoid any problem is to set this option to "none" once the 
consumer has already been started once.
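For illustration, a minimal Java-consumer sketch of that suggestion (broker, group and topic names are placeholders): with "none", a missing committed offset fails fast instead of silently resetting while the broker is still rebuilding its index.

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.NoOffsetForPartitionException;
import org.apache.kafka.common.serialization.StringDeserializer;

public class NoResetConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-group");                // placeholder
        // "none": throw if the committed offset cannot be found, instead of
        // jumping to earliest/latest.
        props.put("auto.offset.reset", "none");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my_topic")); // placeholder topic
            try {
                consumer.poll(1000);
            } catch (NoOffsetForPartitionException e) {
                // The application decides: retry later, alert, or reset explicitly.
                System.err.println("No committed offset found: " + e.getMessage());
            }
        }
    }
}
{code}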

> Offset not found while broker is rebuilding its index after an index 
> corruption
> ---
>
> Key: KAFKA-5060
> URL: https://issues.apache.org/jira/browse/KAFKA-5060
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.1.0
>Reporter: Romaric Parmentier
>Priority: Critical
>  Labels: reliability
>
> After rebooting our kafka servers to change a configuration, one of my 
> consumers running the old consumer failed to find a new leader for a period of 
> 15 minutes. The topic has a replication factor of 2.
> When the spare server was finally found and elected leader, the previously 
> consumed offset could not be found because the broker was rebuilding its 
> index. 
> So my consumer decided to follow the auto.offset.reset configuration, 
> which is pretty bad because the offset would exist 2 minutes later:
> 2017-04-12 14:59:08,568] WARN Found a corrupted index file due to requirement 
> failed: Corrupt index found, index file 
> (/var/lib/kafka/my_topic-6/130248110337.index) has non-zero size but 
> the last offset is 130248110337 which is no larger than the base offset 
> 130248110337.}. deleting 
> /var/lib/kafka/my_topic-6/130248110337.timeindex, 
> /var/lib/kafka/my_topic-6/130248110337.index and rebuilding index... 
> (kafka.log.Log)
> [2017-04-12 15:01:41,490] INFO Completed load of log my_topic-6 with 6146 log 
> segments and log end offset 130251895436 in 169696 ms (kafka.log.Log)
> Maybe it is handled by the new consumer or there is some configuration to 
> handle this case, but I didn't find anything.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-788) Periodic refresh of topic metadata on the producer doesn't include all topics

2017-10-30 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-788.
-
Resolution: Auto Closed

Closing inactive issue. The old producer is no longer supported.


> Periodic refresh of topic metadata on the producer doesn't include all topics
> -
>
> Key: KAFKA-788
> URL: https://issues.apache.org/jira/browse/KAFKA-788
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.8.0
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
>  Labels: kafka-0.8, p2
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> We added a patch to the producer to refresh the metadata for all topics 
> periodically. However, the producer only does this for the topics in the last 
> batch. But some topics sent by the producer could be low throughput and might 
> not be present in every batch. If we bounce the cluster or if brokers fail 
> and leaders change, the metadata for those low-throughput topics is not 
> refreshed by this periodic topic metadata request. The next produce request 
> for those topics has to fail, and then a separate metadata request needs to 
> be reissued to handle the produce request. This is especially a problem for 
> the migration tool. So even if the producer had a chance to refresh the 
> metadata when the leader changed, it throws LeaderNotAvailableExceptions much 
> later when it sends a request for that topic. 
> I propose we just fetch metadata for all topics sent by the producer in the 
> periodic refresh of topic metadata.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2903) FileRecords.read doesn't handle size > sizeInBytes when start is not zero

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2903:
---
Description: 
now the code is:
def read(position: Int, size: Int): FileMessageSet = {
  ...
  new FileMessageSet(file,
    channel,
    start = this.start + position,
    end = math.min(this.start + position + size, sizeInBytes()))
}

if this.start is not 0, the end is only the FileMessageSet's size, not the 
actual end position.
the end parameter should be:
  end = math.min(this.start + position + size, this.start + sizeInBytes())

*Update* Problem still remains in FileRecords.

  was:
now the code is:
def read(position: Int, size: Int): FileMessageSet = {
  ...
  new FileMessageSet(file,
    channel,
    start = this.start + position,
    end = math.min(this.start + position + size, sizeInBytes()))
}

if this.start is not 0, the end is only the FileMessageSet's size, not the 
actual end position.
the end parameter should be:
  end = math.min(this.start + position + size, this.start + sizeInBytes())




> FileRecords.read doesn't handle size > sizeInBytes when start is not zero
> -
>
> Key: KAFKA-2903
> URL: https://issues.apache.org/jira/browse/KAFKA-2903
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0, 0.11.0.0, 1.0.0
>Reporter: Pengwei
>Assignee: Jay Kreps
>
> now the code is:
> def read(position: Int, size: Int): FileMessageSet = {
>   ...
>   new FileMessageSet(file,
>     channel,
>     start = this.start + position,
>     end = math.min(this.start + position + size, sizeInBytes()))
> }
> if this.start is not 0, the end is only the FileMessageSet's size, not the 
> actual end position.
> the end parameter should be:
>   end = math.min(this.start + position + size, this.start + sizeInBytes())
> *Update* Problem still remains in FileRecords.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2903) FileRecords.read doesn't handle size > sizeInBytes when start is not zero

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2903:
---
Summary: FileRecords.read doesn't handle size > sizeInBytes when start is 
not zero  (was: FileMessageSet's read method maybe has problem when start is 
not zero)

> FileRecords.read doesn't handle size > sizeInBytes when start is not zero
> -
>
> Key: KAFKA-2903
> URL: https://issues.apache.org/jira/browse/KAFKA-2903
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0, 0.11.0.0, 1.0.0
>Reporter: Pengwei
>Assignee: Jay Kreps
>
> now the code is:
> def read(position: Int, size: Int): FileMessageSet = {
>   ...
>   new FileMessageSet(file,
>     channel,
>     start = this.start + position,
>     end = math.min(this.start + position + size, sizeInBytes()))
> }
> if this.start is not 0, the end is only the FileMessageSet's size, not the 
> actual end position.
> the end parameter should be:
>   end = math.min(this.start + position + size, this.start + sizeInBytes())



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-2903) FileMessageSet's read method maybe has problem when start is not zero

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2903:
---
Affects Version/s: (was: 0.8.2.1)
   (was: 0.9.0.0)
   1.0.0
   0.10.2.0
   0.11.0.0

> FileMessageSet's read method maybe has problem when start is not zero
> -
>
> Key: KAFKA-2903
> URL: https://issues.apache.org/jira/browse/KAFKA-2903
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0, 0.11.0.0, 1.0.0
>Reporter: Pengwei
>Assignee: Jay Kreps
>
> now the code is:
> def read(position: Int, size: Int): FileMessageSet = {
>   ...
>   new FileMessageSet(file,
>     channel,
>     start = this.start + position,
>     end = math.min(this.start + position + size, sizeInBytes()))
> }
> if this.start is not 0, the end is only the FileMessageSet's size, not the 
> actual end position.
> the end parameter should be:
>   end = math.min(this.start + position + size, this.start + sizeInBytes())



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3042:
---
Affects Version/s: (was: 0.8.2.1)
   0.10.0.0

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.10.0.0
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
>Assignee: Dong Lin
>  Labels: reliability
> Fix For: 1.1.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it's 
> a follower.
> So after several failed tries, it needs to find out who the leader is.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-3049) VerifiableProperties does not respect 'default' properties of underlying java.util.Properties instance

2017-10-30 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3049:
---
Affects Version/s: (was: 0.8.2.1)
   (was: 0.9.0.0)
   (was: 0.8.1.1)
   (was: 0.7)
   1.0.0
   0.10.0.0
   0.11.0.0

> VerifiableProperties does not respect 'default' properties of underlying 
> java.util.Properties instance
> --
>
> Key: KAFKA-3049
> URL: https://issues.apache.org/jira/browse/KAFKA-3049
> Project: Kafka
>  Issue Type: Bug
>  Components: config, core
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Jeffrey Olchovy
>Priority: Minor
>  Labels: easyfix
>
> When retrieving values from the underlying {{Properties}} instance with the 
> {{getString}}, {{get}}, etc. methods of a {{VerifiableProperties}} 
> instance, a call to the underlying {{Properties.containsKey}} method is made. 
> This method will not search the default properties values of the instance, 
> rendering any default properties defined on the {{Properties}} instance 
> useless.
> A practical example is shown below:
> {noformat}
> // suppose we have a base, default set of properties to supply to all 
> consumer groups
> val baseProps = new Properties
> baseProps.setProperty("zookeeper.connect", "localhost:2181/kafka")
> baseProps.setProperty("zookeeper.connection.timeout.ms", "2000")
> // additionally, we have discrete properties instances for each consumer group 
> that utilize these defaults
> val groupProps1 = new Properties(baseProps)
> groupProps1.setProperty("group.id", "test-1")
> val groupProps2 = new Properties(baseProps)
> groupProps2.setProperty("group.id", "test-2")
> // when attempting to create an instance of a high-level Consumer with the 
> above properties an exception will be thrown due to the aforementioned 
> problem description
> java.lang.IllegalArgumentException: requirement failed: Missing required 
> property 'zookeeper.connect'
> at scala.Predef$.require(Predef.scala:233)
> at 
> kafka.utils.VerifiableProperties.getString(VerifiableProperties.scala:177)
> at kafka.utils.ZKConfig.(ZkUtils.scala:879)
> at kafka.consumer.ConsumerConfig.(ConsumerConfig.scala:100)
> at kafka.consumer.ConsumerConfig.(ConsumerConfig.scala:104)
> // however, the groupProps instances will return the correct value for 
> "zookeeper.connect" when using `Properties.getProperty`
> assert(groupProps1.getProperty("zookeeper.connect", "localhost:2181/kafka"))
> assert(groupProps2.getProperty("zookeeper.connect", "localhost:2181/kafka"))
> {noformat}
> I believe it is worthwhile for Kafka to respect the default properties 
> feature of {{java.util.Properties}}, and further, that Kafka should 
> discourage the use of the methods on {{Properties}} that are inherited from 
> {{Hashtable}} (e.g. {{containsKey}}). One can argue that 
> {{VerifiableProperties}} is not 'correct' due to this behavior, but a user 
> can always workaround this by creating discrete instances of {{Properties}} 
> with a set of default properties manually added to each instance. However, 
> this is inconvenient and may only encourage the use of the discouraged 
> {{Hashtable}} methods like {{putAll}}.
> Two proposed solutions follow:
> 1. Do not delegate to the {{Properties.containsKey}} method during the 
> invocation of {{VerifiableProperties.containsKey}}. One can use a null check 
> in conjunction with {{getProperty}} in its place.
> 2. Treat the underlying {{Properties}} instance as immutable and assign the 
> result of {{Properties.stringPropertyNames()}} to a member of 
> {{VerifiableProperties}}. One can check this set of known, available property 
> names, which respects the optional default properties, when 
> {{VerifiableProperties.containsKey}} is invoked.
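The containsKey/getProperty discrepancy is easy to reproduce with the JDK alone; a minimal standalone demonstration (no Kafka classes):

{code}
import java.util.Properties;

public class DefaultsDemo {
    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("zookeeper.connect", "localhost:2181/kafka");

        Properties groupProps = new Properties(defaults);
        groupProps.setProperty("group.id", "test-1");

        // getProperty consults the defaults chain...
        System.out.println(groupProps.getProperty("zookeeper.connect")); // localhost:2181/kafka

        // ...but containsKey (inherited from Hashtable) only sees explicit entries,
        // which is exactly why VerifiableProperties misses defaulted keys.
        System.out.println(groupProps.containsKey("zookeeper.connect")); // false
        System.out.println(groupProps.stringPropertyNames());            // includes both keys
    }
}
{code}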



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-3917) Some __consumer_offsets replicas grow way too big

2017-10-30 Thread Maxim Vladimirskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224456#comment-16224456
 ] 

Maxim Vladimirskiy commented on KAFKA-3917:
---

We are running on v0.10.2.0 now and it is still being reproduced. [~ijuma] 
could you please specify a release it was fixed in?

> Some __consumer_offsets replicas grow way too big
> -
>
> Key: KAFKA-3917
> URL: https://issues.apache.org/jira/browse/KAFKA-3917
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2
> Environment: Runs with Docker 1.10.1 in a container on 
> Linux 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64
>Reporter: Maxim Vladimirskiy
>  Labels: reliability
>
> We noticed that some replicas of partitions of the __consumer_offsets topic 
> grow way too big. Looking inside the respective folders, it became apparent that 
> old segments had not been cleaned up. Please see below an example of disk usage 
> data for both affected and unaffected partitions:
> Not affected partitions:
> Partition: 0  Leader: 2   Replicas: 2,3,4 Isr: 2,4,3
> 2: 49M
> 3: 49M
> 4: 49M
> Affected partitions:
> Partition: 10 Leader: 2   Replicas: 2,0,1 Isr: 1,2,0
> 0: 86M
> 1: 22G <<< too big!
> 2: 86M
> Partition: 38 Leader: 0   Replicas: 0,4,1 Isr: 1,0,4
> 0: 43M
> 1: 26G <<<  too big!
> 4: 26G <<<  too big!
> As you can see sometimes only one replica is affected, sometimes both 
> replicas are affected.
> When I try to restart a broker that has affected replicas it fails to start 
> with an exception that looks like this:
> [2016-06-28 23:15:20,441] ERROR There was an error in one of the threads 
> during logs loading: java.lang.IllegalArgumentException: requirement failed: 
> Corrupt index found, index file 
> (/var/kafka/__consumer_offsets-38/.index) has non-zero 
> size but the last offset is -676703869 and the base offset is 0 
> (kafka.log.LogManager)
> [2016-06-28 23:15:20,442] FATAL [Kafka Server 1], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.IllegalArgumentException: requirement failed: Corrupt index found, 
> index file (/var/kafka/__consumer_offsets-38/.index) has 
> non-zero size but the last offset is -676703869 and the base offset is 0
> at scala.Predef$.require(Predef.scala:233)
> at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:183)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.log.Log.loadSegments(Log.scala:183)
> at kafka.log.Log.(Log.scala:67)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
> at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> After the content of the affected partition is deleted, the broker starts 
> successfully. 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-3917) Some __consumer_offsets replicas grow way too big

2017-10-30 Thread Maxim Vladimirskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224456#comment-16224456
 ] 

Maxim Vladimirskiy edited comment on KAFKA-3917 at 10/30/17 7:44 AM:
-

We are running on v0.10.2.0 now and it is still being reproduced. [~ijuma] 
could you please specify the release it was fixed in?


was (Author: horkhe):
We are running on v0.10.2.0 now and it is still being reproduced. [~ijuma] 
could you please specify a release it was fixed in?

> Some __consumer_offsets replicas grow way too big
> -
>
> Key: KAFKA-3917
> URL: https://issues.apache.org/jira/browse/KAFKA-3917
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2
> Environment: Runs with Docker 1.10.1 in a container on 
> Linux 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64
>Reporter: Maxim Vladimirskiy
>  Labels: reliability
>
> We noticed that some replicas of partitions of the __consumer_offsets topic 
> grow way too big. Looking inside the respective folders, it became apparent that 
> old segments had not been cleaned up. Please see below an example of disk usage 
> data for both affected and unaffected partitions:
> Not affected partitions:
> Partition: 0  Leader: 2   Replicas: 2,3,4 Isr: 2,4,3
> 2: 49M
> 3: 49M
> 4: 49M
> Affected partitions:
> Partition: 10 Leader: 2   Replicas: 2,0,1 Isr: 1,2,0
> 0: 86M
> 1: 22G <<< too big!
> 2: 86M
> Partition: 38 Leader: 0   Replicas: 0,4,1 Isr: 1,0,4
> 0: 43M
> 1: 26G <<<  too big!
> 4: 26G <<<  too big!
> As you can see sometimes only one replica is affected, sometimes both 
> replicas are affected.
> When I try to restart a broker that has affected replicas it fails to start 
> with an exception that looks like this:
> [2016-06-28 23:15:20,441] ERROR There was an error in one of the threads 
> during logs loading: java.lang.IllegalArgumentException: requirement failed: 
> Corrupt index found, index file 
> (/var/kafka/__consumer_offsets-38/.index) has non-zero 
> size but the last offset is -676703869 and the base offset is 0 
> (kafka.log.LogManager)
> [2016-06-28 23:15:20,442] FATAL [Kafka Server 1], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.IllegalArgumentException: requirement failed: Corrupt index found, 
> index file (/var/kafka/__consumer_offsets-38/.index) has 
> non-zero size but the last offset is -676703869 and the base offset is 0
> at scala.Predef$.require(Predef.scala:233)
> at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:183)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.log.Log.loadSegments(Log.scala:183)
> at kafka.log.Log.(Log.scala:67)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
> at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> After the content of the affected partition is deleted, the broker starts 
> successfully. 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)