[jira] [Commented] (KAFKA-5663) LogDirFailureTest system test fails

2017-08-14 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126753#comment-16126753
 ] 

Dong Lin commented on KAFKA-5663:
-

[~cmccabe] I just looked into the issue. It seems that the error is not related 
to the log directory failure handling. The test failed at line 119 of 
log_dir_failure.py, which is before the test tries to make the log directory 
unavailable.

The test failed because the consumer failed to start. According to 
ConsoleConsumer-0-139748592136016/worker7/console_consumer.log, the consumer 
failed to start due to "kafka.common.InvalidConfigException: Wrong value 
earliest of auto.offset.reset in ConsumerConfig; Valid values are smallest and 
largest". I looked into the Python code and the log to understand why 
"auto.offset.reset" is configured as "earliest"; however, the code suggests 
that this should not happen.
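
For reference, the two consumer implementations accept different value sets for 
this config, which matches the error message above quoting smallest/largest (a 
hedged note; which consumer the test actually launched still needs to be 
confirmed from the test config):

{code}
# Old (Scala/ZooKeeper-based) consumer accepts:
auto.offset.reset=smallest   # or largest
# New (Java) consumer accepts:
auto.offset.reset=earliest   # or latest
{code}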

This error should consistently cause the test to fail. I tried to verify this 
but https://jenkins.confluent.io/job/system-test-kafka-branch-builder is not 
working. I tried to test this locally but for some reason vagrant fails to 
work... I will try again tomorrow.

Can you tell me how to find the git hash in the log you provided? Also, 
does this test fail consistently on your side?

Thanks,



> LogDirFailureTest system test fails
> ---
>
> Key: KAFKA-5663
> URL: https://issues.apache.org/jira/browse/KAFKA-5663
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Dong Lin
> Fix For: 1.0.0
>
>
> The recently added JBOD system test failed last night.
> {noformat}
> Producer failed to produce messages for 20s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py",
>  line 166, in test_replication_with_disk_failure
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 75, in start_producer_and_consumer
> self.producer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Producer failed to produce messages for 20s.
> {noformat}
> Complete logs here:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-07-26--001.1501074756--apache--trunk--91c207c/LogDirFailureTest/test_replication_with_disk_failure/bounce_broker=False.security_protocol=PLAINTEXT.broker_type=follower/48.tgz





[jira] [Commented] (KAFKA-5606) Review consumer's RequestFuture usage pattern

2017-08-14 Thread james chien (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126750#comment-16126750
 ] 

james chien commented on KAFKA-5606:


[~ijuma] I opened a pull request 
here ([https://github.com/apache/kafka/pull/3667/]); please review it :D

> Review consumer's RequestFuture usage pattern
> -
>
> Key: KAFKA-5606
> URL: https://issues.apache.org/jira/browse/KAFKA-5606
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: james chien
> Fix For: 1.0.0
>
>
> KAFKA-5556 shows that we can perhaps tighten the usage pattern of the 
> consumer's RequestFuture to avoid similar bugs in the future.
> Jason suggested:
> {quote}
> Another way to see this bug is a failure to ensure completion of the future. 
> Had we done so, then we could have skipped the failed check. This is why it 
> worked prior to the patch which added the timeout. The pattern should really 
> be something like this:
> {code}
> if (future.isDone()) {
>   if (future.succeeded()) {
> // handle success
>   } else {
> // handle failure
>   }
> }
> {code}
> I guess one benefit of the enum approach is that it forces you to ensure 
> completion prior to checking any of the possible results. That said, I'm a 
> bit more inclined to remove the isRetriable method and leave it to the caller 
> to determine what is and is not retriable. Then the request future only has 
> two completion states.
> {quote}
> An alternative is replacing succeeded and failed with a status method 
> returning an enum with 3 states: SUCCEEDED, FAILED, RETRY (the enum approach 
> mentioned above). This would make sense if we often have to handle these 3 
> states differently.
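
A minimal sketch of that enum-based alternative (the type and method names 
below are illustrative, not Kafka's actual RequestFuture API):

{code}
public class StatusSketch {
    // The three completion states discussed above.
    enum CompletionStatus { SUCCEEDED, FAILED, RETRY }

    interface CompletableRequest {
        boolean isDone();
        CompletionStatus status(); // would throw if called before completion
    }

    static void handle(CompletableRequest request) {
        if (!request.isDone())
            return; // completion must be ensured before reading any result
        switch (request.status()) {
            case SUCCEEDED: /* handle success */ break;
            case FAILED:    /* handle failure; the caller decides retriability */ break;
            case RETRY:     /* re-enqueue the request */ break;
        }
    }
}
{code}

Switching on a single enum makes it harder to forget a state than separate 
succeeded()/failed() boolean checks.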





[jira] [Commented] (KAFKA-5606) Review consumer's RequestFuture usage pattern

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126746#comment-16126746
 ] 

ASF GitHub Bot commented on KAFKA-5606:
---

GitHub user jedichien opened a pull request:

https://github.com/apache/kafka/pull/3667

[KAFKA-5606] Review consumer's RequestFuture usage pattern

Replaces succeeded, failed, and retry with a status method returning an 
enum with the states 'SUCCEEDED', 'FAILED', 'RETRY', and 'NOT_RETRY'.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jedichien/kafka KAFKA-5606

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3667


commit da4bcbaa90ed2107128914f7d6ca92add6e616db
Author: jameschien 
Date:   2017-04-27T08:00:20Z

change connection refused message from level DEBUG into WARN.

commit 1adfd44424f2760def0a648c96129243088ef044
Author: jameschien 
Date:   2017-05-02T08:26:04Z

replace KeyValuePrinter and KStreamForeach with KStreamPeek

commit 407b48b83d3acd8e7803bd32e93aa28e48c446a9
Author: jameschien 
Date:   2017-05-02T10:45:23Z

replace KeyValuePrinter and KStreamForeach with KStreamPeek

commit 9c3caa19724606e2e8982ea1c63ce8a480945e73
Author: jameschien 
Date:   2017-05-02T10:51:01Z

revert

commit 80dc9358962c85bf7d369dea5fa5e41965071585
Author: JamesChien 
Date:   2017-05-03T01:20:56Z

remove unused variable

commit 0777963ffc452e69eb468e546c7e4881ecff4cde
Author: jameschien 
Date:   2017-05-05T09:01:46Z

replace 'KStreamPeek#KStreamPeekProcessor' with PrintForeachAction

commit 65eddb227ddc003e71e1e79a2d378014d652566c
Author: jameschien 
Date:   2017-05-05T09:03:59Z

revert and will open new issue to discuss whether to keep this or not

commit 08f37fb99e16c701f293d59d9d12a8f039e9613e
Author: jameschien 
Date:   2017-05-05T09:05:00Z

Merge branch 'trunk' of https://github.com/jedichien/kafka into trunk

replace 'KStreamPeek#KStreamPeekProcessor' with PrintForeachAction

commit 3371b13aec71d80e097212820b0beef625948f37
Author: jameschien 
Date:   2017-05-08T10:07:04Z

add final and remove unnecessary code

commit bfbcced3969701e525d6109372ff6cbedbd4
Author: jameschien 
Date:   2017-05-08T10:13:50Z

add final to local variable

commit 62972990a84c4a77c7ca8c3eba6028ddb3f73990
Author: jameschien 
Date:   2017-05-17T01:34:34Z

add KStreamPrint to print data message with PrintForeachAction

commit 57784600657c71ed4736704bf07af8060e79b74f
Author: jameschien 
Date:   2017-05-17T01:45:09Z

add KStreamPrint to print data message with PrintForeachAction

commit 83101ff29ffe52735883e8b71f61176388724229
Author: JamesChien 
Date:   2017-05-18T09:14:08Z

Augment KStream.print() to allow users pass in extra parameters in the 
printed string. I extend print() to KStream.print(KeyValueMapper), KStream.print(KeyValueMapper, String streamName), 
KStream.print(KeyValueMapper, Serde, Serde) and 
KStream.print(KeyValueMapper, Serde, Serde, String 
streamName).

commit c40db4fa6e0d0bc99e339f1f5749ee73e83a2ec8
Author: jedichien 
Date:   2017-05-23T02:17:51Z

fix conflict

commit e72a6c6fabd4c2139155a8927f7d2bc851914479
Author: JamesChien 
Date:   2017-05-23T02:28:21Z

Merge branch 'trunk' into KAFKA-4830

commit 3fe6a8ba513287755df7fc51a1d7b1e59aba0a23
Author: jedichien 
Date:   2017-05-23T02:29:53Z

remove whitespace

commit 28c8ccdb18757e7a832d3fe83e53b45e049bf6f9
Author: jedichien 
Date:   2017-05-23T02:38:19Z

I forget to remove this code when I solve confliction

commit 03a3f906d6eba9ded70af4fcd21638346822fc7b
Author: jedichien 
Date:   2017-05-23T02:41:14Z

I forget to remove this code when I solve confliction

commit b858b0ff52a4881a5f5146430122219f61bfefb3
Author: jedichien 
Date:   2017-06-23T04:21:17Z

[KIP-160] Augment KStream.print() and KStream.writeAsText() to allow users 
pass in extra parameters in the printed string.

commit 5245ed9e79e32138cda5f198b644a333e113715c
Author: jedichien 
Date:   2017-06-23T04:22:33Z

Merge remote-tracking branch 'origin/KAFKA-4830' into 

[jira] [Comment Edited] (KAFKA-5723) Refactor BrokerApiVersionsCommand to use the new AdminClient

2017-08-14 Thread Andrey Dyachkov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126258#comment-16126258
 ] 

Andrey Dyachkov edited comment on KAFKA-5723 at 8/14/17 7:18 PM:
-

[~ppatierno] do you mind if I take this class 
[https://github.com/apache/kafka/pull/3514/files#diff-10f8925e320906b2c621065e8df39811]
 ?


was (Author: andrey.dyach...@gmail.com):
[~ppatierno] do you mind if I take 
[https://github.com/apache/kafka/pull/3514/files#diff-10f8925e320906b2c621065e8df39811]

> Refactor BrokerApiVersionsCommand to use the new AdminClient
> 
>
> Key: KAFKA-5723
> URL: https://issues.apache.org/jira/browse/KAFKA-5723
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Viktor Somogyi
>Assignee: Viktor Somogyi
>
> Currently it uses the deprecated AdminClient and in order to remove usages of 
> that client, this class needs to be refactored.





[jira] [Comment Edited] (KAFKA-5723) Refactor BrokerApiVersionsCommand to use the new AdminClient

2017-08-14 Thread Andrey Dyachkov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126258#comment-16126258
 ] 

Andrey Dyachkov edited comment on KAFKA-5723 at 8/14/17 7:18 PM:
-

[~ppatierno] do you mind if I take [this 
class|https://github.com/apache/kafka/pull/3514/files#diff-10f8925e320906b2c621065e8df39811]
 for the task?


was (Author: andrey.dyach...@gmail.com):
[~ppatierno] do you mind if I take this class 
[https://github.com/apache/kafka/pull/3514/files#diff-10f8925e320906b2c621065e8df39811]
 ?

> Refactor BrokerApiVersionsCommand to use the new AdminClient
> 
>
> Key: KAFKA-5723
> URL: https://issues.apache.org/jira/browse/KAFKA-5723
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Viktor Somogyi
>Assignee: Viktor Somogyi
>
> Currently it uses the deprecated AdminClient and in order to remove usages of 
> that client, this class needs to be refactored.





[jira] [Commented] (KAFKA-5730) Consumer should ensure consistent ordering between sync and async offset commits

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126233#comment-16126233
 ] 

ASF GitHub Bot commented on KAFKA-5730:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3666

KAFKA-5730: Consumer should invoke async commit callback before sync commit 
returns



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5730

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3666


commit 913b11db4a1651010270e4615d91937b5b65cd4c
Author: Jason Gustafson 
Date:   2017-08-14T18:56:47Z

KAFKA-5730: Consumer should invoke async commit callback before sync commit 
returns




> Consumer should ensure consistent ordering between sync and async offset 
> commits
> 
>
> Key: KAFKA-5730
> URL: https://issues.apache.org/jira/browse/KAFKA-5730
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 1.0.0
>
>
> If {{commitSync}} is called with an async offset commit in-flight, it is 
> possible (even likely) for it to return prior to the invocation of the async 
> callback even though the OffsetCommit request for the latter will have 
> actually completed first. In this case, the async callback is not invoked 
> until the next call to {{poll}}, which can lead to very confusing behavior 
> since the client is led to think that the async commit happened second.
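
A minimal sketch of the ordering guarantee at issue, using the public consumer 
API (the broker address, group id, and topic below are assumed for 
illustration):

{code}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommitOrderingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("group.id", "ordering-demo");           // assumed group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test")); // assumed topic
            consumer.poll(100);

            AtomicBoolean asyncCallbackRan = new AtomicBoolean(false);
            consumer.commitAsync((offsets, exception) -> asyncCallbackRan.set(true));
            // The fix makes commitSync() wait until the in-flight async commit's
            // callback has run; previously the callback could be deferred to the
            // next poll(), making the commits appear to complete out of order.
            consumer.commitSync();
            System.out.println("async callback ran first: " + asyncCallbackRan.get());
        }
    }
}
{code}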





[jira] [Updated] (KAFKA-5567) With transformations that mutate the topic-partition, committing offsets should refer to the original topic-partition

2017-08-14 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-5567:

Priority: Blocker  (was: Major)

> With transformations that mutate the topic-partition, committing offsets 
> should refer to the original topic-partition
> ---
>
> Key: KAFKA-5567
> URL: https://issues.apache.org/jira/browse/KAFKA-5567
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.0, 0.10.2.1, 0.11.0.0
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Blocker
> Fix For: 0.11.0.1
>
>
> When a chain of transformations (SMTs) that mutate a record's 
> topic-partition is applied, Connect is unable to map the transformed 
> record back to its original topic-partition. This affects committing offsets. 
> Currently, the issue can be reproduced by using the 
> {{TimestampRouter}} transformation with a sink connector such as the 
> {{FileStreamSinkConnector}}.
>   In this ticket we'll address the issue for connectors that don't 
> manage/commit their offsets themselves. For the connectors that do such 
> management, broader API changes are required to supply the connectors with 
> the necessary information that will allow them to map a transformed record to 
> the original. 
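
To make the mapping problem concrete, here is a minimal sketch using the 
{{TimestampRouter}} SMT named above (the record contents and the standalone 
driver are illustrative; the Connect worker applies transformations 
internally):

{code}
import java.util.HashMap;
import org.apache.kafka.common.record.TimestampType;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.transforms.TimestampRouter;

public class SmtMappingSketch {
    public static void main(String[] args) {
        TimestampRouter<SinkRecord> router = new TimestampRouter<>();
        router.configure(new HashMap<String, Object>()); // defaults: ${topic}-${timestamp}

        SinkRecord original = new SinkRecord("events", 0, null, null, null, "v", 42L,
                System.currentTimeMillis(), TimestampType.CREATE_TIME);
        SinkRecord routed = router.apply(original);

        // routed.topic() is now e.g. "events-20170814"; committing offsets keyed
        // by the routed topic-partition would not match any source partition, so
        // the worker must remember the original topic-partition of each record.
        System.out.println(original.topic() + "/" + original.kafkaPartition()
                + " -> " + routed.topic() + "/" + routed.kafkaPartition());
    }
}
{code}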





[jira] [Updated] (KAFKA-5713) Improve '--group' option to understand strings with wildcards

2017-08-14 Thread Alla Tumarkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alla Tumarkin updated KAFKA-5713:
-
Description: 
Request
Implement additional functionality for the '--group' option so that it can 
take a string/wildcard combination, e.g.:
{code}
bin/kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add 
--allow-principal 
"User:CN=Unknown,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown" 
--consumer --topic test --group mygroup*
{code}
in order to allow different group names that start with mygroup, e.g.:
{code}
kafka-console-consumer --zookeeper localhost:2181 --topic test 
--consumer-property group.id=mygroup1
{code}

Background
The current functionality only permits specifying an exact group name, as in 
"--group mygroup", or any group, as in "--group *".

  was:
Request
Implement additional functionality for the '--group' option to be able to take 
string/wildcard combination, e.g.:
{code}
bin/kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add 
--allow-principal 
"User:CN=Unknown,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown" 
--consumer --topic test --group mygroup*
{code}
in order to allow different group names that start with mygroup, e.g.:
{code}
kafka-console-consumer --zookeeper localhost:2181 --topic test 
--consumer-property group.id=mygroup1
{code}

Background
Current functionality only permits to specify an exact group name, like 
"--group mygroup" or any group as in "group *"


> Improve '--group' option to understand strings with wildcards
> -
>
> Key: KAFKA-5713
> URL: https://issues.apache.org/jira/browse/KAFKA-5713
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.11.0.0
>Reporter: Alla Tumarkin
>
> Request
> Implement additional functionality for the '--group' option so that it can 
> take a string/wildcard combination, e.g.:
> {code}
> bin/kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add 
> --allow-principal 
> "User:CN=Unknown,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown" 
> --consumer --topic test --group mygroup*
> {code}
> in order to allow different group names that start with mygroup, e.g.:
> {code}
> kafka-console-consumer --zookeeper localhost:2181 --topic test 
> --consumer-property group.id=mygroup1
> {code}
> Background
> The current functionality only permits specifying an exact group name, as in 
> "--group mygroup", or any group, as in "--group *".





[jira] [Commented] (KAFKA-4643) Improve test coverage of StreamsKafkaClient

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126029#comment-16126029
 ] 

ASF GitHub Bot commented on KAFKA-4643:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3663


> Improve test coverage of StreamsKafkaClient
> ---
>
> Key: KAFKA-4643
> URL: https://issues.apache.org/jira/browse/KAFKA-4643
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
> Fix For: 1.0.0
>
>
> Exception paths not tested.
> {{getTopicMetadata}} not tested





[jira] [Resolved] (KAFKA-4643) Improve test coverage of StreamsKafkaClient

2017-08-14 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy resolved KAFKA-4643.
---
   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 3663
[https://github.com/apache/kafka/pull/3663]

> Improve test coverage of StreamsKafkaClient
> ---
>
> Key: KAFKA-4643
> URL: https://issues.apache.org/jira/browse/KAFKA-4643
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
> Fix For: 1.0.0
>
>
> Exception paths not tested.
> {{getTopicMetadata}} not tested





[jira] [Commented] (KAFKA-658) Implement "Exact Mirroring" functionality in mirror maker

2017-08-14 Thread Paul Wilkinson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125847#comment-16125847
 ] 

Paul Wilkinson commented on KAFKA-658:
--

Does the recent work on exactly-once semantics mean this JIRA is likely to make 
progress?

> Implement "Exact Mirroring" functionality in mirror maker
> -
>
> Key: KAFKA-658
> URL: https://issues.apache.org/jira/browse/KAFKA-658
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Jay Kreps
>  Labels: project
>
> There are two ways to implement "mirroring" (i.e. replicating a topic from 
> one cluster to another):
> 1. Do a simple read from the source and write to the destination with no 
> attempt to maintain the same partitioning or offsets in the destination 
> cluster. In this case the destination cluster may have a different number of 
> partitions, and you can even read from many clusters to create a merged 
> cluster. This flexibility is nice. The downside is that since the 
> partitioning and offsets are not the same a consumer of the source cluster 
> has no equivalent position in the destination cluster. This is the style of 
> mirroring we have implemented in the mirror-maker tool and use for datacenter 
> replication today.
> 2. The second style of replication would only allow creating an exact replica 
> of a source cluster (i.e. all partitions and offsets exactly the same). The 
> nice thing about this is that the offsets and partitions would match exactly. 
> The downside is that it is not possible to merge multiple source clusters 
> this way or have different partitioning. We do not currently support this in 
> mirror maker.
> It would be nice to implement the second style as an option in mirror maker 
> as having an exact replica would be a nice option to have in the case where 
> you are replicating a single cluster only.
> There are some nuances: in order to maintain the exact offsets, it is 
> important to guarantee that the producer never resends or loses a 
> message. As a result, it would be important to have only a single producer for 
> each destination partition, and to check the last produced message on startup 
> (using the getOffsets API) so that, in the case of a hard crash, messages that 
> are re-consumed are not re-emitted.
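
A minimal sketch of that startup check using today's consumer API, where 
{{endOffsets}} plays the role of the getOffsets API mentioned above (the 
cluster address, group, and partitions are assumed):

{code}
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ExactMirrorResumeSketch {
    public static void main(String[] args) {
        // In the "exact mirror" scheme, source and destination offsets match,
        // so the destination's log-end offset per partition is exactly where
        // the mirror should resume reading from the source after a hard crash.
        Properties props = new Properties();
        props.put("bootstrap.servers", "dest-cluster:9092"); // assumed address
        props.put("group.id", "exact-mirror");               // assumed group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        List<TopicPartition> partitions = Arrays.asList(
                new TopicPartition("mirrored-topic", 0),   // assumed topic
                new TopicPartition("mirrored-topic", 1));

        try (KafkaConsumer<byte[], byte[]> destConsumer = new KafkaConsumer<>(props)) {
            Map<TopicPartition, Long> destEnd = destConsumer.endOffsets(partitions);
            // A source-cluster consumer would then seek each partition to the
            // destination's end offset before mirroring resumes, e.g.:
            //   sourceConsumer.assign(partitions);
            //   partitions.forEach(tp -> sourceConsumer.seek(tp, destEnd.get(tp)));
            destEnd.forEach((tp, off) -> System.out.println(tp + " resumes at " + off));
        }
    }
}
{code}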





[jira] [Resolved] (KAFKA-3984) Broker doesn't retry reconnecting to an expired Zookeeper connection

2017-08-14 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-3984.

Resolution: Duplicate

Marking this as a duplicate. The fix will be done in KAFKA-5473.

> Broker doesn't retry reconnecting to an expired Zookeeper connection
> 
>
> Key: KAFKA-3984
> URL: https://issues.apache.org/jira/browse/KAFKA-3984
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.1.1
>Reporter: Braedon Vickers
>
> We've been having issues with the network connectivity of our Kafka cluster, 
> and this seems to be triggering an issue where the brokers stop trying to 
> reconnect to Zookeeper, leaving us with a broken cluster even when the 
> network has recovered.
> When network issues begin we see {{java.net.NoRouteToHostException}} 
> exceptions from {{org.apache.zookeeper.ClientCnxn}} as it attempts to 
> re-establish the connection. If the network issue resolves itself while we 
> are only getting these errors the broker seems to reconnect fine.
> However, a lot of the time we end up with a message like this:
> {code}[2016-07-22 00:21:44,181] FATAL Could not establish session with 
> zookeeper (kafka.server.KafkaHealthcheck)
> org.I0Itec.zkclient.exception.ZkException: Unable to connect to <zookeeper hosts>
>   at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:71)
>   at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:1279)
> ...
> Caused by: java.net.UnknownHostException: <zookeeper host>
>   at java.net.InetAddress.getAllByName(InetAddress.java:1126)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1192)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
>   at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
> ...
> {code}
> (apologies for the partial stack traces - I'm having to try and reconstruct 
> them from a less than ideal centralised logging setup.)
> If this happens, the broker stops trying to reconnect to Zookeeper, and we 
> have to restart it.
> It looks like while the {{org.apache.zookeeper.ZooKeeper}} client's state 
> isn't {{Expired}} it will keep retrying the connection, and will recover OK 
> when the network is back. However, once it changes to {{Expired}} (not 
> entirely sure how that happens - based on the session timeout perhaps?) 
> zkclient closes the existing client and attempts to create a new one. If the 
> network is still down, the client constructor throws a 
> {{java.net.UnknownHostException}}, zkclient calls 
> {{handleSessionEstablishmentError()}} on {{KafkaHealthcheck}}, and 
> {{KafkaHealthcheck.handleSessionEstablishmentError()}} logs a "Fatal" error 
> and does nothing else.
> It seems like some form of retry needs to happen here, or the broker is stuck 
> with no Zookeeper connection indefinitely. 
> {{KafkaHealthcheck.handleSessionEstablishmentError()}} used to 
> kill the JVM, but that was removed in 
> https://issues.apache.org/jira/browse/KAFKA-2405. Killing the JVM would be 
> better than doing nothing, as then your init system could restart it, 
> allowing it to recover once the network was back.
> Our cluster is running 0.9.0.1, so not sure if it affects 0.10.0.0 as well. 
> However, it seems likely, as there don't seem to be any code changes in 
> kafka or zkclient that would affect this behaviour.
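
A minimal sketch of the retry the reporter is asking for, shaped like 
zkclient's session-error callback (the reconnect helper and backoff value are 
hypothetical, purely illustrative):

{code}
// Hypothetical sketch: retry session establishment with backoff instead of
// logging a fatal error and giving up. Not Kafka's actual KafkaHealthcheck code.
public class RetryingSessionHandler {
    private static final long RETRY_BACKOFF_MS = 5000L; // assumed backoff

    public void handleSessionEstablishmentError(Throwable error) throws InterruptedException {
        while (true) {
            try {
                reconnectToZookeeper();
                return; // session re-established
            } catch (Exception e) {
                System.err.println("Failed to re-establish ZooKeeper session, retrying: " + e);
                Thread.sleep(RETRY_BACKOFF_MS);
            }
        }
    }

    private void reconnectToZookeeper() throws Exception {
        // Placeholder: a real implementation would close the old ZkClient and
        // construct a new one against the configured ZooKeeper hosts.
    }
}
{code}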





[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

2017-08-14 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125827#comment-16125827
 ] 

Jun Rao commented on KAFKA-5473:


[~braedon], yes, this is the same issue as described in KAFKA-3984.

> handle ZK session expiration properly when a new session can't be established
> -
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Prasanna Gautam
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state, since it's up but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.





[jira] [Commented] (KAFKA-5733) System tests get exception RocksDBException: db has more levels than options.num_levels

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125778#comment-16125778
 ] 

ASF GitHub Bot commented on KAFKA-5733:
---

GitHub user bbejeck opened a pull request:

https://github.com/apache/kafka/pull/3664

KAFKA-5733: ensure clean RocksDB directory before setting prepareForBulkload 
settings

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bbejeck/kafka 
KAFKA-5733_rocks_db_throws_more_than_num_levels_on_restore

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3664.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3664


commit 2cfc54c8e77b840acc0d176cea8eabda4d1c4b03
Author: Bill Bejeck 
Date:   2017-08-14T14:46:08Z

KAFKA-5733: ensure clean RocksDB directory before setting 
prepareForBulkload settings




> System tests get exception RocksDBException: db has more levels than 
> options.num_levels
> ---
>
> Key: KAFKA-5733
> URL: https://issues.apache.org/jira/browse/KAFKA-5733
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.1
>Reporter: Eno Thereska
> Fix For: 0.11.0.1
>
>
> New system tests as part of KAFKA-5725 with PR 
> https://github.com/apache/kafka/pull/3656 consistently fail like this:
> [2017-08-14 10:37:57,216] ERROR User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group SmokeTest failed on partition assignment 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
> cntStoreName at location /mnt/streams/SmokeTest/2_0/rocksdb/cntStoreName
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:206)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:176)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.toggleDbForBulkLoading(RocksDBStore.java:259)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.access$200(RocksDBStore.java:74)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.onRestoreStart(RocksDBStore.java:551)
>   at 
> org.apache.kafka.streams.processor.internals.CompositeRestoreListener.onRestoreStart(CompositeRestoreListener.java:54)
>   at 
> org.apache.kafka.streams.processor.internals.StateRestorer.restoreStarted(StateRestorer.java:61)
>   at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:126)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:94)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:177)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:368)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:317)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:736)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:707)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:677)
> Caused by: org.rocksdb.RocksDBException: db has more levels than 
> options.num_levels
>   at org.rocksdb.RocksDB.open(Native Method)
>   at org.rocksdb.RocksDB.open(RocksDB.java:231)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:199)





[jira] [Commented] (KAFKA-5733) System tests get exception RocksDBException: db has more levels than options.num_levels

2017-08-14 Thread Bill Bejeck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125708#comment-16125708
 ] 

Bill Bejeck commented on KAFKA-5733:


I've been able to reproduce this in a unit test.

Here is the issue. We have an existing RocksDB store (with the default 
num_levels setting of 7). When the broker goes down and reconnects, a rebalance 
is triggered, which causes a state restoration. When we go to restore the state 
store, we close RocksDB, turn on {{options.prepareForBulkLoad}}, which sets 
num_levels to 2 for optimized bulk loads, and then re-open the database.

Then we get the error because the pre-existing data is already laid out with 
num_levels of 7, thus throwing the exception.
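
A minimal sketch of that failing sequence against the RocksDB Java API (the 
path is illustrative; note that a brand-new store with one key won't actually 
fail, because the error only surfaces once data has been compacted into levels 
deeper than 2):

{code}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class BulkLoadReopenSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        String path = "/tmp/rocksdb-bulkload-sketch"; // illustrative path

        // 1. Open (and create) the store with default options: num_levels = 7.
        try (Options defaults = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(defaults, path)) {
            db.put("k".getBytes(), "v".getBytes());
        }

        // 2. Re-open the same directory for restoration: prepareForBulkLoad()
        //    lowers num_levels to 2, so a store whose data has already been
        //    compacted into deeper levels fails with
        //    "db has more levels than options.num_levels".
        try (Options bulk = new Options().setCreateIfMissing(true).prepareForBulkLoad();
             RocksDB db = RocksDB.open(bulk, path)) {
            System.out.println("re-opened for bulk load");
        }
    }
}
{code}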

> System tests get exception RocksDBException: db has more levels than 
> options.num_levels
> ---
>
> Key: KAFKA-5733
> URL: https://issues.apache.org/jira/browse/KAFKA-5733
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.1
>Reporter: Eno Thereska
> Fix For: 0.11.0.1
>
>
> New system tests as part of KAFKA-5725 with PR 
> https://github.com/apache/kafka/pull/3656 consistently fail like this:
> [2017-08-14 10:37:57,216] ERROR User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group SmokeTest failed on partition assignment 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
> cntStoreName at location /mnt/streams/SmokeTest/2_0/rocksdb/cntStoreName
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:206)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:176)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.toggleDbForBulkLoading(RocksDBStore.java:259)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.access$200(RocksDBStore.java:74)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.onRestoreStart(RocksDBStore.java:551)
>   at 
> org.apache.kafka.streams.processor.internals.CompositeRestoreListener.onRestoreStart(CompositeRestoreListener.java:54)
>   at 
> org.apache.kafka.streams.processor.internals.StateRestorer.restoreStarted(StateRestorer.java:61)
>   at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:126)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:94)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:177)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:368)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:317)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:736)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:707)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:677)
> Caused by: org.rocksdb.RocksDBException: db has more levels than 
> options.num_levels
>   at org.rocksdb.RocksDB.open(Native Method)
>   at org.rocksdb.RocksDB.open(RocksDB.java:231)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:199)





[jira] [Created] (KAFKA-5733) System tests get exception RocksDBException: db has more levels than options.num_levels

2017-08-14 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-5733:
---

 Summary: System tests get exception RocksDBException: db has more 
levels than options.num_levels
 Key: KAFKA-5733
 URL: https://issues.apache.org/jira/browse/KAFKA-5733
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.11.0.1
Reporter: Eno Thereska
 Fix For: 0.11.0.1


New system tests as part of KAFKA-5725 with PR 
https://github.com/apache/kafka/pull/3656 consistently fail like this:
[2017-08-14 10:37:57,216] ERROR User provided listener 
org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener for 
group SmokeTest failed on partition assignment 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
cntStoreName at location /mnt/streams/SmokeTest/2_0/rocksdb/cntStoreName
at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:206)
at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:176)
at 
org.apache.kafka.streams.state.internals.RocksDBStore.toggleDbForBulkLoading(RocksDBStore.java:259)
at 
org.apache.kafka.streams.state.internals.RocksDBStore.access$200(RocksDBStore.java:74)
at 
org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.onRestoreStart(RocksDBStore.java:551)
at 
org.apache.kafka.streams.processor.internals.CompositeRestoreListener.onRestoreStart(CompositeRestoreListener.java:54)
at 
org.apache.kafka.streams.processor.internals.StateRestorer.restoreStarted(StateRestorer.java:61)
at 
org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:126)
at 
org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:94)
at 
org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:177)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:368)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:317)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
at 
org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:736)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:707)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:677)
Caused by: org.rocksdb.RocksDBException: db has more levels than 
options.num_levels
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:231)
at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:199)






[jira] [Commented] (KAFKA-4643) Improve test coverage of StreamsKafkaClient

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125472#comment-16125472
 ] 

ASF GitHub Bot commented on KAFKA-4643:
---

Github user adyach closed the pull request at:

https://github.com/apache/kafka/pull/3659


> Improve test coverage of StreamsKafkaClient
> ---
>
> Key: KAFKA-4643
> URL: https://issues.apache.org/jira/browse/KAFKA-4643
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>
> Exception paths not tested.
> {{getTopicMetadata}} not tested





[jira] [Commented] (KAFKA-5673) Refactor KeyValueStore hierarchy so that MeteredKeyValueStore is the outermost store

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125430#comment-16125430
 ] 

ASF GitHub Bot commented on KAFKA-5673:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3592


> Refactor KeyValueStore hierarchy so that MeteredKeyValueStore is the 
> outermost store
> 
>
> Key: KAFKA-5673
> URL: https://issues.apache.org/jira/browse/KAFKA-5673
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 1.0.0
>
>
> MeteredKeyValueStore is currently not the outermost store. Further, it needs 
> to have the inner store typed as {{<Bytes, byte[]>}} to allow easy 
> pluggability of custom storage engines. 





[jira] [Commented] (KAFKA-5072) Kafka topics should allow custom metadata configs within some config namespace

2017-08-14 Thread Ondrej Tomcik (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125406#comment-16125406
 ] 

Ondrej Tomcik commented on KAFKA-5072:
--

Hello [~soumabrata], what is the status of this improvement? I need to attach 
custom metadata to a topic, and there is no solution in Kafka for this, or at 
least none that I am aware of.

Thanks

> Kafka topics should allow custom metadata configs within some config namespace
> --
>
> Key: KAFKA-5072
> URL: https://issues.apache.org/jira/browse/KAFKA-5072
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Affects Versions: 0.10.2.0
>Reporter: Soumabrata Chakraborty
>Assignee: Soumabrata Chakraborty
>Priority: Minor
>
> Kafka topics should allow custom metadata configs
> Such config properties may have some fixed namespace e.g. metadata* or custom*
> This is handy for governance.  For example, in large organizations sharing a 
> kafka cluster - it might be helpful to be able to configure properties like 
> metadata.contact.info, metadata.project, metadata.description on a topic.


