[jira] [Created] (KAFKA-6062) Reduce topic partition count in kafka version 0.10.0.0

2017-10-13 Thread Balu (JIRA)
Balu created KAFKA-6062:
---

 Summary: Reduce topic partition count in kafka version 0.10.0.0
 Key: KAFKA-6062
 URL: https://issues.apache.org/jira/browse/KAFKA-6062
 Project: Kafka
  Issue Type: Task
Reporter: Balu


Using  kafka,zookeeper,schema repository cluster. Current partition count is 10 
and have to make it 3. Can we do it?

Appreciate steps if dataloss is fine.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-10-13 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204367#comment-16204367
 ] 

Jun Rao commented on KAFKA-2729:


[~fravigotti], sorry to hear that. A couple of quick suggestions.

(1) Do you see any ZK session expiration in the log (e.g., INFO zookeeper state 
changed (Expired) (org.I0Itec.zkclient.ZkClient))? There are known bugs in 
Kafka in handling ZK session expiration. So, it would be useful to avoid it in 
the first place. Typical causes of ZK session expiration are long GC in the 
broker or network glitches. So you can either tune the broker or increase 
zookeeper.session.timeout.ms.

(2) Do you have lots of partitions (say a few thousands) per broker? If so, you 
want to check if the controlled shutdown succeeds when shutting down a broker. 
If not, restarting the broker too soon could also lead the cluster to a weird 
state. To address this issue, you can increase request.timeout.ms on the broker.

We are fixing the known issue in (1) and improving the performance with lots of 
partitions in (2) in KAFKA-5642 and we expect the fix to be included in the 
1.1.0 release in Feb.

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.10.1.0, 0.11.0.0
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the effected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing, I was hoping you 
> could shed some light on this issue. Possibly if it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6061) "ERROR Error while electing or becoming leader on broker 13 (kafka.server.ZookeeperLeaderElector) kafka.common.KafkaException: Can't parse json string: null" should prin

2017-10-13 Thread Koelli Mungee (JIRA)
Koelli Mungee created KAFKA-6061:


 Summary: "ERROR Error while electing or becoming leader on broker 
13 (kafka.server.ZookeeperLeaderElector)  kafka.common.KafkaException: Can't 
parse json string: null" should print out information on which zookeeper path 
contains the null element
 Key: KAFKA-6061
 URL: https://issues.apache.org/jira/browse/KAFKA-6061
 Project: Kafka
  Issue Type: Bug
  Components: zkclient
Affects Versions: 0.10.2.1
Reporter: Koelli Mungee


The controller enters a loop with the error as 


{code:java}
[2017-10-12 21:40:09,532] ERROR Error while electing or becoming leader on 
broker 13 (kafka.server.ZookeeperLeaderElector) 
kafka.common.KafkaException: Can't parse json string: null 
at kafka.utils.Json$.liftedTree1$1(Json.scala:40) 
at kafka.utils.Json$.parseFull(Json.scala:36) 
at 
kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:684)
 
at 
kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:680)
 
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) 
at kafka.utils.ZkUtils.getReplicaAssignmentForTopics(ZkUtils.scala:680) 
at 
kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:736)
 
at 
kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:334)
 
at 
kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:167)
 
at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84) 
{code}

A kafka-topics --describe can be issued to figure out which topic partition has 
the problem. However, this would be easier for the user if the actual zk path 
with the null or malformed entry would be printed out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4818) Implement transactional clients

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204222#comment-16204222
 ] 

ASF GitHub Bot commented on KAFKA-4818:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4038


> Implement transactional clients
> ---
>
> Key: KAFKA-4818
> URL: https://issues.apache.org/jira/browse/KAFKA-4818
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
>Assignee: Apurva Mehta
> Fix For: 0.11.0.0
>
>
> This covers the implementation of the producer and consumer to support 
> transactions, as described in KIP-98: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6060) Add workload generation capabilities to Trogdor

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204195#comment-16204195
 ] 

ASF GitHub Bot commented on KAFKA-6060:
---

GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/4073

KAFKA-6060: Add workload generation capabilities to Trogdor

Previously, Trogdor only handled "Faults."  Now, Trogdor can handle
"Tasks" which may be either faults, or workloads to execute in the
background.

The Agent and Coordinator have been refactored from a
mutexes-and-condition-variables paradigm into a message passing
paradigm.  No locks are necessary, because only one thread can access
the task state or worker state.  This makes them a lot easier to reason
about.

The MockTime class can now handle mocking deferred message passing
(adding a message to an ExecutorService with a delay).  I added a
MockTimeTest.

MiniTrogdorCluster now starts up Agent and Coordinator classes in
paralle in order to minimize junit test time.

RPC messages now inherit from a common Message.java class.  This class
handles implementing serialization, equals, hashCode, etc.

Remove FaultSet, since it is no longer necessary.

Previously, if CoordinatorClient or AgentClient hit a networking
problem, they would throw an exception.  They now retry several times
before giving up.  Additionally, the REST RPCs to the Coordinator and
Agent have been changed to be idempotent.  If a response is lost, and
the request is resent, no harm will be done.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-6060

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4073.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4073


commit 936427509fec91cf33c798b8a7208dd6c8dc8ba5
Author: Colin P. Mccabe 
Date:   2017-09-26T01:30:13Z

KAFKA-6060: Add workload generation capabilities to Trogdor

Previously, Trogdor only handled "Faults."  Now, Trogdor can handle
"Tasks" which may be either faults, or workloads to execute in the
background.

The Agent and Coordinator have been refactored from a
mutexes-and-condition-variables paradigm into a message passing
paradigm.  No locks are necessary, because only one thread can access
the task state or worker state.  This makes them a lot easier to reason
about.

The MockTime class can now handle mocking deferred message passing
(adding a message to an ExecutorService with a delay).  I added a
MockTimeTest.

MiniTrogdorCluster now starts up Agent and Coordinator classes in
paralle in order to minimize junit test time.

RPC messages now inherit from a common Message.java class.  This class
handles implementing serialization, equals, hashCode, etc.

Remove FaultSet, since it is no longer necessary.

Previously, if CoordinatorClient or AgentClient hit a networking
problem, they would throw an exception.  They now retry several times
before giving up.  Additionally, the REST RPCs to the Coordinator and
Agent have been changed to be idempotent.  If a response is lost, and
the request is resent, no harm will be done.




> Add workload generation capabilities to Trogdor
> ---
>
> Key: KAFKA-6060
> URL: https://issues.apache.org/jira/browse/KAFKA-6060
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Add workload generation capabilities to Trogdor



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6060) Add workload generation capabilities to Trogdor

2017-10-13 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-6060:
--

 Summary: Add workload generation capabilities to Trogdor
 Key: KAFKA-6060
 URL: https://issues.apache.org/jira/browse/KAFKA-6060
 Project: Kafka
  Issue Type: Sub-task
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


Add workload generation capabilities to Trogdor



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-6058) Add "describe consumer group" to KafkaAdminClient

2017-10-13 Thread Jorge Quilcate (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Quilcate reassigned KAFKA-6058:
-

Assignee: Jorge Quilcate

> Add "describe consumer group" to KafkaAdminClient
> -
>
> Key: KAFKA-6058
> URL: https://issues.apache.org/jira/browse/KAFKA-6058
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Matthias J. Sax
>Assignee: Jorge Quilcate
>  Labels: needs-kip
>
> {{KafkaAdminClient}} does not allow to get information about consumer groups. 
> This feature is supported by old {{kafka.admin.AdminClient}} though.
> We should add {{KafkaAdminClient#describeConsumerGroup()}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5663) LogDirFailureTest system test fails

2017-10-13 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204151#comment-16204151
 ] 

Jun Rao commented on KAFKA-5663:


Not sure. The test fails with the following error.

07:56:28 Broker 2 should not be leader of topic test_topic_1 and partition 0
07:56:28 Traceback (most recent call last):
07:56:28   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.1-py2.7.egg/ducktape/tests/runner_client.py",
 line 132, in run
07:56:28 data = self.run_test()
07:56:28   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.1-py2.7.egg/ducktape/tests/runner_client.py",
 line 185, in run_test
07:56:28 return self.test_context.function(self.test)
07:56:28   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.1-py2.7.egg/ducktape/mark/_mark.py",
 line 324, in wrapper
07:56:28 return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
07:56:28   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py",
 line 142, in test_replication_with_disk_failure
07:56:28 err_msg="Broker %d should not be leader of topic %s and partition 
0" % (broker_idx, self.topic1))
07:56:28   File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.1-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
07:56:28 raise TimeoutError(err_msg)

> LogDirFailureTest system test fails
> ---
>
> Key: KAFKA-5663
> URL: https://issues.apache.org/jira/browse/KAFKA-5663
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Dong Lin
> Fix For: 1.0.0
>
>
> The recently added JBOD system test failed last night.
> {noformat}
> Producer failed to produce messages for 20s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py",
>  line 166, in test_replication_with_disk_failure
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 75, in start_producer_and_consumer
> self.producer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Producer failed to produce messages for 20s.
> {noformat}
> Complete logs here:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-07-26--001.1501074756--apache--trunk--91c207c/LogDirFailureTest/test_replication_with_disk_failure/bounce_broker=False.security_protocol=PLAINTEXT.broker_type=follower/48.tgz



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6052) Windows: Consumers not polling when isolation.level=read_committed

2017-10-13 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204093#comment-16204093
 ] 

Apurva Mehta commented on KAFKA-6052:
-

Hi Ansel, 

the logs you shared don't correlate with each other. For instance, your 
`Producer_consumer.log` map transactionalId "TID1" to producerId 2000, as can 
be seen in these lines: 

{noformat}
2017-10-13 11:45:50 DEBUG Sender:347 - [TransactionalId TID1] Sending 
transactional request (type=InitProducerIdRequest, transactionalId=TID1, 
transactionTimeoutMs=5000) to node 192.168.56.1:9093 (id: 1 rack: null)
2017-10-13 11:45:50 TRACE NetworkClient:389 - Sending INIT_PRODUCER_ID 
{transactional_id=TID1,transaction_timeout_ms=5000} to node 1.
2017-10-13 11:45:50 TRACE NetworkClient:658 - Completed receive from node 1, 
for key 22, received 
{throttle_time_ms=0,error_code=0,producer_id=2000,producer_epoch=0}
2017-10-13 11:45:50 TRACE TransactionManager:645 - [TransactionalId TID1] 
Received transactional response InitProducerIdResponse(error=NONE, 
producerId=2000, producerEpoch=0, throttleTimeMs=0) for request 
(type=InitProducerIdRequest, transactionalId=TID1, transactionTimeoutMs=5000)
2017-10-13 11:45:50 INFO  TransactionManager:341 - [TransactionalId TID1] 
ProducerId set to 2000 with epoch 0
2017-10-13 11:45:50 DEBUG TransactionManager:512 - [TransactionalId TID1] 
Transition from state INITIALIZING to READY
2017-10-13 11:45:50 DEBUG TransactionManager:512 - [TransactionalId TID1] 
Transition from state READY to IN_TRANSACTION
{noformat}

However, the transaction log maps transactionalId TID1 to producerId 1000:

{noformat}
offset: 0 position: 0 CreateTime: 1507893603185 isvalid: true keysize: 8 
valuesize: 37 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: transactionalId=TID1 payload: 
producerId:1000,producerEpoch:0,state=Empty,partitions=Set(),txnLastUpdateTimestamp=1507893603185,txnTimeoutMs=5000
offset: 1 position: 113 CreateTime: 1507893603242 isvalid: true keysize: 8 
valuesize: 57 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: transactionalId=TID1 payload: 
producerId:1000,producerEpoch:0,state=Ongoing,partitions=Set(Topic_AVRO-0),txnLastUpdateTimestamp=1507893603240,txnTimeoutMs=5000
offset: 2 position: 247 CreateTime: 1507893603317 isvalid: true keysize: 8 
valuesize: 57 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: transactionalId=TID1 payload: 
producerId:1000,producerEpoch:0,state=PrepareCommit,partitions=Set(Topic_AVRO-0),txnLastUpdateTimestamp=1507893603315,txnTimeoutMs=5000
o
{noformat}

It is definitely curious that the transaction state in the transaction log is 
left at `PrepareCommit`. This would explain why read committed consumers don't 
return any data.

However, your client logs don't match the server log, so they can't be used for 
debugging. Also, in the client logs you sent `TID1` successfully commits 
transaction by getting the proper response for the broker. If the client 
received the successful commit response, it is highly unlikely for the state 
not to change on the broker, especially since other state changes happened on 
the broker.

Finally, it would be helpful to have each broker's server.log, and the 
producer, and the consumer log in separate files.

thanks,
Apurva



> Windows: Consumers not polling when isolation.level=read_committed 
> ---
>
> Key: KAFKA-6052
> URL: https://issues.apache.org/jira/browse/KAFKA-6052
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer 
>Affects Versions: 0.11.0.0
> Environment: Windows 10. All processes running in embedded mode.
>Reporter: Ansel Zandegran
> Attachments: Prducer_Consumer.log, kafka-logs.zip, logFile.log
>
>
> *The same code is running fine in Linux.* I am trying to send a transactional 
> record with exactly once schematics. These are my producer, consumer and 
> broker setups. 
> public void sendWithTTemp(String topic, EHEvent event) {
> Properties props = new Properties();
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>   "localhost:9092,localhost:9093,localhost:9094");
> //props.put("bootstrap.servers", 
> "34.240.248.190:9092,52.50.95.30:9092,52.50.95.30:9092");
> props.put(ProducerConfig.ACKS_CONFIG, "all");
> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
> props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
> props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
> props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
> props.put("transactional.id", "TID" + transactionId.incrementAndGet());
> props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "50

[jira] [Commented] (KAFKA-5663) LogDirFailureTest system test fails

2017-10-13 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204058#comment-16204058
 ] 

Ismael Juma commented on KAFKA-5663:


[~junrao] Is Onur's PR rebased against the latest trunk?

> LogDirFailureTest system test fails
> ---
>
> Key: KAFKA-5663
> URL: https://issues.apache.org/jira/browse/KAFKA-5663
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Dong Lin
> Fix For: 1.0.0
>
>
> The recently added JBOD system test failed last night.
> {noformat}
> Producer failed to produce messages for 20s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py",
>  line 166, in test_replication_with_disk_failure
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 75, in start_producer_and_consumer
> self.producer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Producer failed to produce messages for 20s.
> {noformat}
> Complete logs here:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-07-26--001.1501074756--apache--trunk--91c207c/LogDirFailureTest/test_replication_with_disk_failure/bounce_broker=False.security_protocol=PLAINTEXT.broker_type=follower/48.tgz



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5663) LogDirFailureTest system test fails

2017-10-13 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203991#comment-16203991
 ] 

Jun Rao commented on KAFKA-5663:


[~lindong], I was running the system test for the controller improvement PR 
from Onur and this is the only test that fails 
(http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-10-12--001.1507881382--onurkaraman--KAFKA-5642--9b4fc57/report.html).
 It seems that Onur's patch is orthogonal to this test. Could you take a look 
and see what the issue might be?

> LogDirFailureTest system test fails
> ---
>
> Key: KAFKA-5663
> URL: https://issues.apache.org/jira/browse/KAFKA-5663
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Dong Lin
> Fix For: 1.0.0
>
>
> The recently added JBOD system test failed last night.
> {noformat}
> Producer failed to produce messages for 20s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py",
>  line 166, in test_replication_with_disk_failure
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 75, in start_producer_and_consumer
> self.producer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Producer failed to produce messages for 20s.
> {noformat}
> Complete logs here:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-07-26--001.1501074756--apache--trunk--91c207c/LogDirFailureTest/test_replication_with_disk_failure/bounce_broker=False.security_protocol=PLAINTEXT.broker_type=follower/48.tgz



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-10-13 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203772#comment-16203772
 ] 

Ismael Juma commented on KAFKA-2729:


If you're seeing the issue this often, then there's most likely a configuration 
issue. If you file a separate issue with all the logs (including GC logs) and 
configs (broker and ZK), maybe someone can help.

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.10.1.0, 0.11.0.0
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the effected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing, I was hoping you 
> could shed some light on this issue. Possibly if it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-10-13 Thread Francesco vigotti (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203749#comment-16203749
 ] 

Francesco vigotti commented on KAFKA-2729:
--

After having lost 2 days on this I've reset whole cluster, stopped all kafka 
brokers, stopped zookeeper cluster, delete all directories,stopped all consumer 
and producer ,then restarted everything , recreated topics and now guess what? 
:)

one node reports... 
{code:java}

[2017-10-13 15:54:52,893] INFO Partition [__consumer_offsets,5] on broker 2: 
Expanding ISR for partition __consumer_offsets-5 from 10,13,2 to 10,13,2,5 
(kafka.cluster.Partition)
[2017-10-13 15:54:52,906] INFO Partition [__consumer_offsets,5] on broker 2: 
Cached zkVersion [13] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-10-13 15:54:52,908] INFO Partition [__consumer_offsets,25] on broker 2: 
Expanding ISR for partition __consumer_offsets-25 from 10,2,13 to 10,2,13,5 
(kafka.cluster.Partition)
[2017-10-13 15:54:52,915] INFO Partition [__consumer_offsets,25] on broker 2: 
Cached zkVersion [10] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-10-13 15:54:52,916] INFO Partition [__consumer_offsets,45] on broker 2: 
Expanding ISR for partition __consumer_offsets-45 from 10,13,2 to 10,13,2,5 
(kafka.cluster.Partition)
[2017-10-13 15:54:52,925] INFO Partition [__consumer_offsets,45] on broker 2: 
Cached zkVersion [15] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-10-13 15:54:52,926] INFO Partition [__consumer_offsets,5] on broker 2: 
Expanding ISR for partition __consumer_offsets-5 from 10,13,2 to 10,13,2,5 
(kafka.cluster.Partition)
[2017-10-13 15:54:52,936] INFO Partition [__consumer_offsets,5] on broker 2: 
Cached zkVersion [13] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-10-13 15:54:52,939] INFO Partition [__consumer_offsets,25] on broker 2: 
Expanding ISR for partition __consumer_offsets-25 from 10,2,13 to 10,2,13,5 
(kafka.cluster.Partition)
{code}

while others 


{code:java}
[2017-10-13 15:57:08,128] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[__consumer_offsets,40] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-10-13 15:57:09,129] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[__consumer_offsets,40] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-10-13 15:57:10,260] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[__consumer_offsets,40] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-10-13 15:57:11,262] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[__consumer_offsets,40] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-10-13 15:57:12,265] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[__consumer_offsets,40] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-10-13 15:57:13,289] ERROR [ReplicaFetcherThread-0-2], Error fo
{code}


cluster still being inconsistent, I've also added 2 more nodes hoping in an 
increasing of stability but nothing, I don't know if something is wrong because 
if kafka do some kind of pre-flight checks during startup it does log nothing.. 
the only logs are those which have no sense because the leader should be 
re-elected when there are ISR available.. and there are 
I've started looking for an alternative software to  use, I'm trying to use 
kafka is so frustrating :(


> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.10.1.0, 0.11.0.0
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> part

[jira] [Comment Edited] (KAFKA-5846) Use singleton NoOpConsumerRebalanceListener in subscribe() call where listener is not specified

2017-10-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191405#comment-16191405
 ] 

Ted Yu edited comment on KAFKA-5846 at 10/13/17 3:53 PM:
-

Patch looks good.


was (Author: yuzhih...@gmail.com):
lgtm

> Use singleton NoOpConsumerRebalanceListener in subscribe() call where 
> listener is not specified
> ---
>
> Key: KAFKA-5846
> URL: https://issues.apache.org/jira/browse/KAFKA-5846
> Project: Kafka
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Kamal Chandraprakash
>Priority: Minor
>
> Currently KafkaConsumer creates instance of NoOpConsumerRebalanceListener for 
> each subscribe() call where ConsumerRebalanceListener is not specified:
> {code}
> public void subscribe(Pattern pattern) {
> subscribe(pattern, new NoOpConsumerRebalanceListener());
> {code}
> We can create a singleton NoOpConsumerRebalanceListener to be used in such 
> scenarios.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4575) Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after resuming sink connector

2017-10-13 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-4575.
--
Resolution: Fixed

This appears to no longer be an issue, so I'm closing this as fixed.

> Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in 
> consuming messages after resuming sink connector
> --
>
> Key: KAFKA-4575
> URL: https://issues.apache.org/jira/browse/KAFKA-4575
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Shikhar Bhushan
>Assignee: Konstantine Karantasis
> Fix For: 0.10.2.0
>
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {noformat}
> [INFO  - 2016-12-29 08:56:23,050 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink:
>  Summary: Failed to consume messages after resuming source connector
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 267, in test_pause_and_resume_sink
> err_msg="Failed to consume messages after resuming source connector")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Failed to consume messages after resuming source connector
> {noformat}
> We recently fixed KAFKA-4527 and this is a new kind of failure in the same 
> test.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-6052) Windows: Consumers not polling when isolation.level=read_committed

2017-10-13 Thread Ansel Zandegran (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203299#comment-16203299
 ] 

Ansel Zandegran edited comment on KAFKA-6052 at 10/13/17 11:22 AM:
---

Yes, the entire setup is running in windows not just the clients. I have 
attached the new TRACE level log. The log includes producer and consumer logs 
along with 3 brokers and zookeeper host. I have also added the TRACE level log 
for just the producer and consumer. Data log files are also attached. Thanks. 


was (Author: zandegran):
Yes, the entire setup is running in windows not just the clients. I have 
attached the new TRACE level log. The log includes producer and consumer logs 
along with 3 brokers and zookeeper host. I have also added the TRACE level log 
for just the producer and consumer. Thanks. 

> Windows: Consumers not polling when isolation.level=read_committed 
> ---
>
> Key: KAFKA-6052
> URL: https://issues.apache.org/jira/browse/KAFKA-6052
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer 
>Affects Versions: 0.11.0.0
> Environment: Windows 10. All processes running in embedded mode.
>Reporter: Ansel Zandegran
> Attachments: Prducer_Consumer.log, kafka-logs.zip, logFile.log
>
>
> *The same code is running fine in Linux.* I am trying to send a transactional 
> record with exactly once schematics. These are my producer, consumer and 
> broker setups. 
> public void sendWithTTemp(String topic, EHEvent event) {
> Properties props = new Properties();
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>   "localhost:9092,localhost:9093,localhost:9094");
> //props.put("bootstrap.servers", 
> "34.240.248.190:9092,52.50.95.30:9092,52.50.95.30:9092");
> props.put(ProducerConfig.ACKS_CONFIG, "all");
> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
> props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
> props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
> props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
> props.put("transactional.id", "TID" + transactionId.incrementAndGet());
> props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "5000");
> Producer producer =
> new KafkaProducer<>(props,
> new StringSerializer(),
> new StringSerializer());
> Logger.log(this, "Initializing transaction...");
> producer.initTransactions();
> Logger.log(this, "Initializing done.");
> try {
>   Logger.log(this, "Begin transaction...");
>   producer.beginTransaction();
>   Logger.log(this, "Begin transaction done.");
>   Logger.log(this, "Sending events...");
>   producer.send(new ProducerRecord<>(topic,
>  event.getKey().toString(),
>  event.getValue().toString()));
>   Logger.log(this, "Sending events done.");
>   Logger.log(this, "Committing...");
>   producer.commitTransaction();
>   Logger.log(this, "Committing done.");
> } catch (ProducerFencedException | OutOfOrderSequenceException
> | AuthorizationException e) {
>   producer.close();
>   e.printStackTrace();
> } catch (KafkaException e) {
>   producer.abortTransaction();
>   e.printStackTrace();
> }
> producer.close();
>   }
> *In Consumer*
> I have set isolation.level=read_committed
> *In 3 Brokers*
> I'm running with the following properties
>   Properties props = new Properties();
>   props.setProperty("broker.id", "" + i);
>   props.setProperty("listeners", "PLAINTEXT://:909" + (2 + i));
>   props.setProperty("log.dirs", Configuration.KAFKA_DATA_PATH + "\\B" + 
> i);
>   props.setProperty("num.partitions", "1");
>   props.setProperty("zookeeper.connect", "localhost:2181");
>   props.setProperty("zookeeper.connection.timeout.ms", "6000");
>   props.setProperty("min.insync.replicas", "2");
>   props.setProperty("offsets.topic.replication.factor", "2");
>   props.setProperty("offsets.topic.num.partitions", "1");
>   props.setProperty("transaction.state.log.num.partitions", "2");
>   props.setProperty("transaction.state.log.replication.factor", "2");
>   props.setProperty("transaction.state.log.min.isr", "2");
> I am not getting any records in the consumer. When I set 
> isolation.level=read_uncommitted, I get the records. I assume that the 
> records are not getting commited. What could be the problem? log attached



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6052) Windows: Consumers not polling when isolation.level=read_committed

2017-10-13 Thread Ansel Zandegran (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ansel Zandegran updated KAFKA-6052:
---
Attachment: kafka-logs.zip

> Windows: Consumers not polling when isolation.level=read_committed 
> ---
>
> Key: KAFKA-6052
> URL: https://issues.apache.org/jira/browse/KAFKA-6052
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer 
>Affects Versions: 0.11.0.0
> Environment: Windows 10. All processes running in embedded mode.
>Reporter: Ansel Zandegran
> Attachments: Prducer_Consumer.log, kafka-logs.zip, logFile.log
>
>
> *The same code is running fine in Linux.* I am trying to send a transactional 
> record with exactly once schematics. These are my producer, consumer and 
> broker setups. 
> public void sendWithTTemp(String topic, EHEvent event) {
> Properties props = new Properties();
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>   "localhost:9092,localhost:9093,localhost:9094");
> //props.put("bootstrap.servers", 
> "34.240.248.190:9092,52.50.95.30:9092,52.50.95.30:9092");
> props.put(ProducerConfig.ACKS_CONFIG, "all");
> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
> props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
> props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
> props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
> props.put("transactional.id", "TID" + transactionId.incrementAndGet());
> props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "5000");
> Producer producer =
> new KafkaProducer<>(props,
> new StringSerializer(),
> new StringSerializer());
> Logger.log(this, "Initializing transaction...");
> producer.initTransactions();
> Logger.log(this, "Initializing done.");
> try {
>   Logger.log(this, "Begin transaction...");
>   producer.beginTransaction();
>   Logger.log(this, "Begin transaction done.");
>   Logger.log(this, "Sending events...");
>   producer.send(new ProducerRecord<>(topic,
>  event.getKey().toString(),
>  event.getValue().toString()));
>   Logger.log(this, "Sending events done.");
>   Logger.log(this, "Committing...");
>   producer.commitTransaction();
>   Logger.log(this, "Committing done.");
> } catch (ProducerFencedException | OutOfOrderSequenceException
> | AuthorizationException e) {
>   producer.close();
>   e.printStackTrace();
> } catch (KafkaException e) {
>   producer.abortTransaction();
>   e.printStackTrace();
> }
> producer.close();
>   }
> *In Consumer*
> I have set isolation.level=read_committed
> *In 3 Brokers*
> I'm running with the following properties
>   Properties props = new Properties();
>   props.setProperty("broker.id", "" + i);
>   props.setProperty("listeners", "PLAINTEXT://:909" + (2 + i));
>   props.setProperty("log.dirs", Configuration.KAFKA_DATA_PATH + "\\B" + 
> i);
>   props.setProperty("num.partitions", "1");
>   props.setProperty("zookeeper.connect", "localhost:2181");
>   props.setProperty("zookeeper.connection.timeout.ms", "6000");
>   props.setProperty("min.insync.replicas", "2");
>   props.setProperty("offsets.topic.replication.factor", "2");
>   props.setProperty("offsets.topic.num.partitions", "1");
>   props.setProperty("transaction.state.log.num.partitions", "2");
>   props.setProperty("transaction.state.log.replication.factor", "2");
>   props.setProperty("transaction.state.log.min.isr", "2");
> I am not getting any records in the consumer. When I set 
> isolation.level=read_uncommitted, I get the records. I assume that the 
> records are not getting commited. What could be the problem? log attached



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-10-13 Thread Francesco vigotti (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203331#comment-16203331
 ] 

Francesco vigotti commented on KAFKA-2729:
--

At the beginning of my cluster screw up I've got tons of zkVersion issue that's 
why I've posted here , but because seems that the problems for you goes away 
when you restarted your brokers maybe my problem is different.. 
kafka version : 0.10.2.1

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.10.1.0, 0.11.0.0
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the effected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing, I was hoping you 
> could shed some light on this issue. Possibly if it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-6052) Windows: Consumers not polling when isolation.level=read_committed

2017-10-13 Thread Ansel Zandegran (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203299#comment-16203299
 ] 

Ansel Zandegran edited comment on KAFKA-6052 at 10/13/17 9:48 AM:
--

Yes, the entire setup is running in windows not just the clients. I have 
attached the new TRACE level log. The log includes producer and consumer logs 
along with 3 brokers and zookeeper host. I have also added the TRACE level log 
for just the producer and consumer. Thanks. 


was (Author: zandegran):
Yes, the entire setup is running in windows not just the clients. I have 
attached the new TRACE level log. The log includes producer and consumer logs 
along with 3 brokers and zookeeper host. Thanks. 

> Windows: Consumers not polling when isolation.level=read_committed 
> ---
>
> Key: KAFKA-6052
> URL: https://issues.apache.org/jira/browse/KAFKA-6052
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer 
>Affects Versions: 0.11.0.0
> Environment: Windows 10. All processes running in embedded mode.
>Reporter: Ansel Zandegran
> Attachments: Prducer_Consumer.log, logFile.log
>
>
> *The same code is running fine in Linux.* I am trying to send a transactional 
> record with exactly once schematics. These are my producer, consumer and 
> broker setups. 
> public void sendWithTTemp(String topic, EHEvent event) {
> Properties props = new Properties();
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>   "localhost:9092,localhost:9093,localhost:9094");
> //props.put("bootstrap.servers", 
> "34.240.248.190:9092,52.50.95.30:9092,52.50.95.30:9092");
> props.put(ProducerConfig.ACKS_CONFIG, "all");
> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
> props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
> props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
> props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
> props.put("transactional.id", "TID" + transactionId.incrementAndGet());
> props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "5000");
> Producer producer =
> new KafkaProducer<>(props,
> new StringSerializer(),
> new StringSerializer());
> Logger.log(this, "Initializing transaction...");
> producer.initTransactions();
> Logger.log(this, "Initializing done.");
> try {
>   Logger.log(this, "Begin transaction...");
>   producer.beginTransaction();
>   Logger.log(this, "Begin transaction done.");
>   Logger.log(this, "Sending events...");
>   producer.send(new ProducerRecord<>(topic,
>  event.getKey().toString(),
>  event.getValue().toString()));
>   Logger.log(this, "Sending events done.");
>   Logger.log(this, "Committing...");
>   producer.commitTransaction();
>   Logger.log(this, "Committing done.");
> } catch (ProducerFencedException | OutOfOrderSequenceException
> | AuthorizationException e) {
>   producer.close();
>   e.printStackTrace();
> } catch (KafkaException e) {
>   producer.abortTransaction();
>   e.printStackTrace();
> }
> producer.close();
>   }
> *In Consumer*
> I have set isolation.level=read_committed
> *In 3 Brokers*
> I'm running with the following properties
>   Properties props = new Properties();
>   props.setProperty("broker.id", "" + i);
>   props.setProperty("listeners", "PLAINTEXT://:909" + (2 + i));
>   props.setProperty("log.dirs", Configuration.KAFKA_DATA_PATH + "\\B" + 
> i);
>   props.setProperty("num.partitions", "1");
>   props.setProperty("zookeeper.connect", "localhost:2181");
>   props.setProperty("zookeeper.connection.timeout.ms", "6000");
>   props.setProperty("min.insync.replicas", "2");
>   props.setProperty("offsets.topic.replication.factor", "2");
>   props.setProperty("offsets.topic.num.partitions", "1");
>   props.setProperty("transaction.state.log.num.partitions", "2");
>   props.setProperty("transaction.state.log.replication.factor", "2");
>   props.setProperty("transaction.state.log.min.isr", "2");
> I am not getting any records in the consumer. When I set 
> isolation.level=read_uncommitted, I get the records. I assume that the 
> records are not getting commited. What could be the problem? log attached



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6052) Windows: Consumers not polling when isolation.level=read_committed

2017-10-13 Thread Ansel Zandegran (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ansel Zandegran updated KAFKA-6052:
---
Attachment: Prducer_Consumer.log

> Windows: Consumers not polling when isolation.level=read_committed 
> ---
>
> Key: KAFKA-6052
> URL: https://issues.apache.org/jira/browse/KAFKA-6052
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer 
>Affects Versions: 0.11.0.0
> Environment: Windows 10. All processes running in embedded mode.
>Reporter: Ansel Zandegran
> Attachments: Prducer_Consumer.log, logFile.log
>
>
> *The same code is running fine in Linux.* I am trying to send a transactional 
> record with exactly once schematics. These are my producer, consumer and 
> broker setups. 
> public void sendWithTTemp(String topic, EHEvent event) {
> Properties props = new Properties();
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>   "localhost:9092,localhost:9093,localhost:9094");
> //props.put("bootstrap.servers", 
> "34.240.248.190:9092,52.50.95.30:9092,52.50.95.30:9092");
> props.put(ProducerConfig.ACKS_CONFIG, "all");
> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
> props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
> props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
> props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
> props.put("transactional.id", "TID" + transactionId.incrementAndGet());
> props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "5000");
> Producer producer =
> new KafkaProducer<>(props,
> new StringSerializer(),
> new StringSerializer());
> Logger.log(this, "Initializing transaction...");
> producer.initTransactions();
> Logger.log(this, "Initializing done.");
> try {
>   Logger.log(this, "Begin transaction...");
>   producer.beginTransaction();
>   Logger.log(this, "Begin transaction done.");
>   Logger.log(this, "Sending events...");
>   producer.send(new ProducerRecord<>(topic,
>  event.getKey().toString(),
>  event.getValue().toString()));
>   Logger.log(this, "Sending events done.");
>   Logger.log(this, "Committing...");
>   producer.commitTransaction();
>   Logger.log(this, "Committing done.");
> } catch (ProducerFencedException | OutOfOrderSequenceException
> | AuthorizationException e) {
>   producer.close();
>   e.printStackTrace();
> } catch (KafkaException e) {
>   producer.abortTransaction();
>   e.printStackTrace();
> }
> producer.close();
>   }
> *In Consumer*
> I have set isolation.level=read_committed
> *In 3 Brokers*
> I'm running with the following properties
>   Properties props = new Properties();
>   props.setProperty("broker.id", "" + i);
>   props.setProperty("listeners", "PLAINTEXT://:909" + (2 + i));
>   props.setProperty("log.dirs", Configuration.KAFKA_DATA_PATH + "\\B" + 
> i);
>   props.setProperty("num.partitions", "1");
>   props.setProperty("zookeeper.connect", "localhost:2181");
>   props.setProperty("zookeeper.connection.timeout.ms", "6000");
>   props.setProperty("min.insync.replicas", "2");
>   props.setProperty("offsets.topic.replication.factor", "2");
>   props.setProperty("offsets.topic.num.partitions", "1");
>   props.setProperty("transaction.state.log.num.partitions", "2");
>   props.setProperty("transaction.state.log.replication.factor", "2");
>   props.setProperty("transaction.state.log.min.isr", "2");
> I am not getting any records in the consumer. When I set 
> isolation.level=read_uncommitted, I get the records. I assume that the 
> records are not getting commited. What could be the problem? log attached



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6052) Windows: Consumers not polling when isolation.level=read_committed

2017-10-13 Thread Ansel Zandegran (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203299#comment-16203299
 ] 

Ansel Zandegran commented on KAFKA-6052:


Yes, the entire setup is running in windows not just the clients. I have 
attached the new TRACE level log. The log includes producer and consumer logs 
along with 3 brokers and zookeeper host. Thanks. 

> Windows: Consumers not polling when isolation.level=read_committed 
> ---
>
> Key: KAFKA-6052
> URL: https://issues.apache.org/jira/browse/KAFKA-6052
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer 
>Affects Versions: 0.11.0.0
> Environment: Windows 10. All processes running in embedded mode.
>Reporter: Ansel Zandegran
> Attachments: logFile.log
>
>
> *The same code is running fine in Linux.* I am trying to send a transactional 
> record with exactly once schematics. These are my producer, consumer and 
> broker setups. 
> public void sendWithTTemp(String topic, EHEvent event) {
> Properties props = new Properties();
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>   "localhost:9092,localhost:9093,localhost:9094");
> //props.put("bootstrap.servers", 
> "34.240.248.190:9092,52.50.95.30:9092,52.50.95.30:9092");
> props.put(ProducerConfig.ACKS_CONFIG, "all");
> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
> props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
> props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
> props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
> props.put("transactional.id", "TID" + transactionId.incrementAndGet());
> props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "5000");
> Producer producer =
> new KafkaProducer<>(props,
> new StringSerializer(),
> new StringSerializer());
> Logger.log(this, "Initializing transaction...");
> producer.initTransactions();
> Logger.log(this, "Initializing done.");
> try {
>   Logger.log(this, "Begin transaction...");
>   producer.beginTransaction();
>   Logger.log(this, "Begin transaction done.");
>   Logger.log(this, "Sending events...");
>   producer.send(new ProducerRecord<>(topic,
>  event.getKey().toString(),
>  event.getValue().toString()));
>   Logger.log(this, "Sending events done.");
>   Logger.log(this, "Committing...");
>   producer.commitTransaction();
>   Logger.log(this, "Committing done.");
> } catch (ProducerFencedException | OutOfOrderSequenceException
> | AuthorizationException e) {
>   producer.close();
>   e.printStackTrace();
> } catch (KafkaException e) {
>   producer.abortTransaction();
>   e.printStackTrace();
> }
> producer.close();
>   }
> *In Consumer*
> I have set isolation.level=read_committed
> *In 3 Brokers*
> I'm running with the following properties
>   Properties props = new Properties();
>   props.setProperty("broker.id", "" + i);
>   props.setProperty("listeners", "PLAINTEXT://:909" + (2 + i));
>   props.setProperty("log.dirs", Configuration.KAFKA_DATA_PATH + "\\B" + 
> i);
>   props.setProperty("num.partitions", "1");
>   props.setProperty("zookeeper.connect", "localhost:2181");
>   props.setProperty("zookeeper.connection.timeout.ms", "6000");
>   props.setProperty("min.insync.replicas", "2");
>   props.setProperty("offsets.topic.replication.factor", "2");
>   props.setProperty("offsets.topic.num.partitions", "1");
>   props.setProperty("transaction.state.log.num.partitions", "2");
>   props.setProperty("transaction.state.log.replication.factor", "2");
>   props.setProperty("transaction.state.log.min.isr", "2");
> I am not getting any records in the consumer. When I set 
> isolation.level=read_uncommitted, I get the records. I assume that the 
> records are not getting commited. What could be the problem? log attached



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (KAFKA-6052) Windows: Consumers not polling when isolation.level=read_committed

2017-10-13 Thread Ansel Zandegran (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ansel Zandegran updated KAFKA-6052:
---
Comment: was deleted

(was: TRACE level log: 1 Zookeeper host, 3 brokers, 1 producer and 1 consumer)

> Windows: Consumers not polling when isolation.level=read_committed 
> ---
>
> Key: KAFKA-6052
> URL: https://issues.apache.org/jira/browse/KAFKA-6052
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer 
>Affects Versions: 0.11.0.0
> Environment: Windows 10. All processes running in embedded mode.
>Reporter: Ansel Zandegran
> Attachments: logFile.log
>
>
> *The same code is running fine in Linux.* I am trying to send a transactional 
> record with exactly once schematics. These are my producer, consumer and 
> broker setups. 
> public void sendWithTTemp(String topic, EHEvent event) {
> Properties props = new Properties();
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>   "localhost:9092,localhost:9093,localhost:9094");
> //props.put("bootstrap.servers", 
> "34.240.248.190:9092,52.50.95.30:9092,52.50.95.30:9092");
> props.put(ProducerConfig.ACKS_CONFIG, "all");
> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
> props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
> props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
> props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
> props.put("transactional.id", "TID" + transactionId.incrementAndGet());
> props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "5000");
> Producer producer =
> new KafkaProducer<>(props,
> new StringSerializer(),
> new StringSerializer());
> Logger.log(this, "Initializing transaction...");
> producer.initTransactions();
> Logger.log(this, "Initializing done.");
> try {
>   Logger.log(this, "Begin transaction...");
>   producer.beginTransaction();
>   Logger.log(this, "Begin transaction done.");
>   Logger.log(this, "Sending events...");
>   producer.send(new ProducerRecord<>(topic,
>  event.getKey().toString(),
>  event.getValue().toString()));
>   Logger.log(this, "Sending events done.");
>   Logger.log(this, "Committing...");
>   producer.commitTransaction();
>   Logger.log(this, "Committing done.");
> } catch (ProducerFencedException | OutOfOrderSequenceException
> | AuthorizationException e) {
>   producer.close();
>   e.printStackTrace();
> } catch (KafkaException e) {
>   producer.abortTransaction();
>   e.printStackTrace();
> }
> producer.close();
>   }
> *In Consumer*
> I have set isolation.level=read_committed
> *In 3 Brokers*
> I'm running with the following properties
>   Properties props = new Properties();
>   props.setProperty("broker.id", "" + i);
>   props.setProperty("listeners", "PLAINTEXT://:909" + (2 + i));
>   props.setProperty("log.dirs", Configuration.KAFKA_DATA_PATH + "\\B" + 
> i);
>   props.setProperty("num.partitions", "1");
>   props.setProperty("zookeeper.connect", "localhost:2181");
>   props.setProperty("zookeeper.connection.timeout.ms", "6000");
>   props.setProperty("min.insync.replicas", "2");
>   props.setProperty("offsets.topic.replication.factor", "2");
>   props.setProperty("offsets.topic.num.partitions", "1");
>   props.setProperty("transaction.state.log.num.partitions", "2");
>   props.setProperty("transaction.state.log.replication.factor", "2");
>   props.setProperty("transaction.state.log.min.isr", "2");
> I am not getting any records in the consumer. When I set 
> isolation.level=read_uncommitted, I get the records. I assume that the 
> records are not getting commited. What could be the problem? log attached



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6052) Windows: Consumers not polling when isolation.level=read_committed

2017-10-13 Thread Ansel Zandegran (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ansel Zandegran updated KAFKA-6052:
---
Attachment: (was: logFile.log)

> Windows: Consumers not polling when isolation.level=read_committed 
> ---
>
> Key: KAFKA-6052
> URL: https://issues.apache.org/jira/browse/KAFKA-6052
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer 
>Affects Versions: 0.11.0.0
> Environment: Windows 10. All processes running in embedded mode.
>Reporter: Ansel Zandegran
> Attachments: logFile.log
>
>
> *The same code is running fine in Linux.* I am trying to send a transactional 
> record with exactly once schematics. These are my producer, consumer and 
> broker setups. 
> public void sendWithTTemp(String topic, EHEvent event) {
> Properties props = new Properties();
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>   "localhost:9092,localhost:9093,localhost:9094");
> //props.put("bootstrap.servers", 
> "34.240.248.190:9092,52.50.95.30:9092,52.50.95.30:9092");
> props.put(ProducerConfig.ACKS_CONFIG, "all");
> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
> props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
> props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
> props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
> props.put("transactional.id", "TID" + transactionId.incrementAndGet());
> props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "5000");
> Producer producer =
> new KafkaProducer<>(props,
> new StringSerializer(),
> new StringSerializer());
> Logger.log(this, "Initializing transaction...");
> producer.initTransactions();
> Logger.log(this, "Initializing done.");
> try {
>   Logger.log(this, "Begin transaction...");
>   producer.beginTransaction();
>   Logger.log(this, "Begin transaction done.");
>   Logger.log(this, "Sending events...");
>   producer.send(new ProducerRecord<>(topic,
>  event.getKey().toString(),
>  event.getValue().toString()));
>   Logger.log(this, "Sending events done.");
>   Logger.log(this, "Committing...");
>   producer.commitTransaction();
>   Logger.log(this, "Committing done.");
> } catch (ProducerFencedException | OutOfOrderSequenceException
> | AuthorizationException e) {
>   producer.close();
>   e.printStackTrace();
> } catch (KafkaException e) {
>   producer.abortTransaction();
>   e.printStackTrace();
> }
> producer.close();
>   }
> *In Consumer*
> I have set isolation.level=read_committed
> *In 3 Brokers*
> I'm running with the following properties
>   Properties props = new Properties();
>   props.setProperty("broker.id", "" + i);
>   props.setProperty("listeners", "PLAINTEXT://:909" + (2 + i));
>   props.setProperty("log.dirs", Configuration.KAFKA_DATA_PATH + "\\B" + 
> i);
>   props.setProperty("num.partitions", "1");
>   props.setProperty("zookeeper.connect", "localhost:2181");
>   props.setProperty("zookeeper.connection.timeout.ms", "6000");
>   props.setProperty("min.insync.replicas", "2");
>   props.setProperty("offsets.topic.replication.factor", "2");
>   props.setProperty("offsets.topic.num.partitions", "1");
>   props.setProperty("transaction.state.log.num.partitions", "2");
>   props.setProperty("transaction.state.log.replication.factor", "2");
>   props.setProperty("transaction.state.log.min.isr", "2");
> I am not getting any records in the consumer. When I set 
> isolation.level=read_uncommitted, I get the records. I assume that the 
> records are not getting commited. What could be the problem? log attached



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6052) Windows: Consumers not polling when isolation.level=read_committed

2017-10-13 Thread Ansel Zandegran (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ansel Zandegran updated KAFKA-6052:
---
Attachment: logFile.log

TRACE level log: 1 Zookeeper host, 3 brokers, 1 producer and 1 consumer

> Windows: Consumers not polling when isolation.level=read_committed 
> ---
>
> Key: KAFKA-6052
> URL: https://issues.apache.org/jira/browse/KAFKA-6052
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer 
>Affects Versions: 0.11.0.0
> Environment: Windows 10. All processes running in embedded mode.
>Reporter: Ansel Zandegran
> Attachments: logFile.log
>
>
> *The same code is running fine in Linux.* I am trying to send a transactional 
> record with exactly once schematics. These are my producer, consumer and 
> broker setups. 
> public void sendWithTTemp(String topic, EHEvent event) {
> Properties props = new Properties();
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
>   "localhost:9092,localhost:9093,localhost:9094");
> //props.put("bootstrap.servers", 
> "34.240.248.190:9092,52.50.95.30:9092,52.50.95.30:9092");
> props.put(ProducerConfig.ACKS_CONFIG, "all");
> props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
> props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
> props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
> props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
> props.put("transactional.id", "TID" + transactionId.incrementAndGet());
> props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "5000");
> Producer producer =
> new KafkaProducer<>(props,
> new StringSerializer(),
> new StringSerializer());
> Logger.log(this, "Initializing transaction...");
> producer.initTransactions();
> Logger.log(this, "Initializing done.");
> try {
>   Logger.log(this, "Begin transaction...");
>   producer.beginTransaction();
>   Logger.log(this, "Begin transaction done.");
>   Logger.log(this, "Sending events...");
>   producer.send(new ProducerRecord<>(topic,
>  event.getKey().toString(),
>  event.getValue().toString()));
>   Logger.log(this, "Sending events done.");
>   Logger.log(this, "Committing...");
>   producer.commitTransaction();
>   Logger.log(this, "Committing done.");
> } catch (ProducerFencedException | OutOfOrderSequenceException
> | AuthorizationException e) {
>   producer.close();
>   e.printStackTrace();
> } catch (KafkaException e) {
>   producer.abortTransaction();
>   e.printStackTrace();
> }
> producer.close();
>   }
> *In Consumer*
> I have set isolation.level=read_committed
> *In 3 Brokers*
> I'm running with the following properties
>   Properties props = new Properties();
>   props.setProperty("broker.id", "" + i);
>   props.setProperty("listeners", "PLAINTEXT://:909" + (2 + i));
>   props.setProperty("log.dirs", Configuration.KAFKA_DATA_PATH + "\\B" + 
> i);
>   props.setProperty("num.partitions", "1");
>   props.setProperty("zookeeper.connect", "localhost:2181");
>   props.setProperty("zookeeper.connection.timeout.ms", "6000");
>   props.setProperty("min.insync.replicas", "2");
>   props.setProperty("offsets.topic.replication.factor", "2");
>   props.setProperty("offsets.topic.num.partitions", "1");
>   props.setProperty("transaction.state.log.num.partitions", "2");
>   props.setProperty("transaction.state.log.replication.factor", "2");
>   props.setProperty("transaction.state.log.min.isr", "2");
> I am not getting any records in the consumer. When I set 
> isolation.level=read_uncommitted, I get the records. I assume that the 
> records are not getting commited. What could be the problem? log attached



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-10-13 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203244#comment-16203244
 ] 

Ismael Juma commented on KAFKA-2729:


[~fravigotti], none of your log messages seems to be about the zkVersion issue, 
is it really the same issue as this one? If not, you should file a separate 
JIRA including the Kafka version.

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.10.1.0, 0.11.0.0
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the effected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing, I was hoping you 
> could shed some light on this issue. Possibly if it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-10-13 Thread Francesco vigotti (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203198#comment-16203198
 ] 

Francesco vigotti commented on KAFKA-2729:
--

I'm having the same issue and definitely losing trust in kafka, every 2 months 
there is something that force me to reset the whole cluster, I'm searching for 
a good alternative for a distributed-persisted-fast-queue for a while.. yet to 
find something that give me a good vibe.. 

anyway I'm facing this same issue with some small differences
- restarting all brokers ( together and rolling-restart ) didn't fix it..

all brokers in the cluster log such errors :
--- broker 5 

{code:java}

[2017-10-13 08:13:57,429] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[__consumer_offsets,17] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-10-13 08:13:57,429] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[__consumer_offsets,23] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-10-13 08:13:57,429] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[__consumer_offsets,47] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-10-13 08:13:57,429] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[__consumer_offsets,29] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)

{code}

--- broker3

)
{code:java}

[2017-10-13 08:13:58,547] INFO Partition [__consumer_offsets,20] on broker 3: 
Expanding ISR for partition __consumer_offsets-20 from 3,2 to 3,2,5 
(kafka.cluster.Partition)
[2017-10-13 08:13:58,551] INFO Partition [__consumer_offsets,44] on broker 3: 
Expanding ISR for partition __consumer_offsets-44 from 3,2 to 3,2,5 
(kafka.cluster.Partition)
[2017-10-13 08:13:58,554] INFO Partition [__consumer_offsets,5] on broker 3: 
Expanding ISR for partition __consumer_offsets-5 from 2,3 to 2,3,5 
(kafka.cluster.Partition)
[2017-10-13 08:13:58,557] INFO Partition [__consumer_offsets,26] on broker 3: 
Expanding ISR for partition __consumer_offsets-26 from 3,2 to 3,2,5 
(kafka.cluster.Partition)
[2017-10-13 08:13:58,563] INFO Partition [__consumer_offsets,29] on broker 3: 
Expanding ISR for partition __consumer_offsets-29 from 2,3 to 2,3,5 
(kafka.cluster.Partition)
[2017-10-13 08:13:58,566] INFO Partition [__consumer_offsets,32] on broker 3: 
Expanding ISR for partition __consumer_offsets-32 from 3,2 to 3,2,5 
(kafka.cluster.Partition)
[2017-10-13 08:13:58,570] INFO Partition [legacyJavaVarT,2] on broker 3: 
Expanding ISR for partition legacyJavaVarT-2 from 3 to 3,5 
(kafka.cluster.Partition)
[2017-10-13 08:13:58,573] INFO Partition [test4,3] on broker 3: Expanding ISR 
for partition test4-3 from 2,3 to 2,3,5 (kafka.cluster.Partition)
[2017-10-13 08:13:58,577] INFO Partition [test4,0] on broker 3: Expanding ISR 
for partition test4-0 from 3,2 to 3,2,5 (kafka.cluster.Partition)
[2017-10-13 08:13:58,582] INFO Partition [test3,5] on broker 3: Expanding ISR 
for partition test3-5 from 3 to 3,5 (kafka.cluster.Partition)

{code}


--- broker2 

{code:java}

[2017-10-13 08:13:36,289] INFO Partition [__consumer_offsets,11] on broker 2: 
Expanding ISR for partition __consumer_offsets-11 from 2,5 to 2,5,3 
(kafka.cluster.Partition)
[2017-10-13 08:13:36,293] INFO Partition [__consumer_offsets,41] on broker 2: 
Expanding ISR for partition __consumer_offsets-41 from 2,5 to 2,5,3 
(kafka.cluster.Partition)
[2017-10-13 08:13:36,296] INFO Partition [test3,2] on broker 2: Expanding ISR 
for partition test3-2 from 2 to 2,3 (kafka.cluster.Partition)
[2017-10-13 08:13:36,300] INFO Partition [__consumer_offsets,23] on broker 2: 
Expanding ISR for partition __consumer_offsets-23 from 2,5 to 2,5,3 
(kafka.cluster.Partition)
[2017-10-13 08:13:36,304] INFO Partition [__consumer_offsets,5] on broker 2: 
Expanding ISR for partition __consumer_offsets-5 from 2,5 to 2,5,3 
(kafka.cluster.Partition)
[2017-10-13 08:13:36,337] INFO Partition [__consumer_offsets,35] on broker 2: 
Expanding ISR for partition __consumer_offsets-35 from 2,5 to 2,5,3 
(kafka.cluster.Partition)
[2017-10-13 08:13:36,372] INFO Partition [test_mainlog,24] on broker 2: 
Expanding ISR for partition test_mainlog-24 from 2 to 2,3 
(kafka.cluster.Partition)
[2017-10-13 08:13:36,375] INFO Partition [test_mainlog,6] on broker 2: 
Expanding ISR for partition test_mainlog-6 from 2 to 2,3 
(kafka.cluster.Partition)
[2017-10-13 08:13:36,379] INFO Partition [test_mainlog,18] on broker 2: 
Expanding ISR for partition test_mainlog-18