[jira] [Updated] (KAFKA-772) System Test Transient Failure on testcase_0122
[ https://issues.apache.org/jira/browse/KAFKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-772:
Attachment: testcase_0125.tar.gz

System Test Transient Failure on testcase_0122
Key: KAFKA-772
URL: https://issues.apache.org/jira/browse/KAFKA-772
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8
Reporter: John Fung
Assignee: Sriram Subramanian
Labels: kafka-0.8, p1
Attachments: testcase_0122.tar.gz, testcase_0125.tar.gz

* This test case has been failing intermittently over the past few weeks. Note that the test case with Ack = 1 allows a small percentage of data loss. However, the failure here is a mismatch of the log segment checksums across the replicas.

* Test description:
3-broker cluster
Replication factor = 3
Number of topics = 2
Number of partitions = 3
Controlled failure (kill -15)
Ack = 1

* Test case output
_test_case_name : testcase_0122
_test_class_name : ReplicaBasicTest
arg : auto_create_topic : true
arg : bounce_broker : true
arg : broker_type : leader
arg : message_producing_free_time_sec : 15
arg : num_iteration : 3
arg : num_partition : 3
arg : replica_factor : 3
arg : sleep_seconds_between_producer_calls : 1
validation_status :
Leader Election Latency - iter 1 brokerid 3 : 377.00 ms
Leader Election Latency - iter 2 brokerid 1 : 374.00 ms
Leader Election Latency - iter 3 brokerid 2 : 384.00 ms
Leader Election Latency MAX : 384.00
Leader Election Latency MIN : 374.00
Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 1750
Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 1750
Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 1750
Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log : 1750
Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log : 1750
Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log : 1750
Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r1.log : 1500
Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r2.log : 1500
Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r3.log : 1500
Unique messages from consumer on [test_2] : 5000
Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r1.log : 1714
Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r2.log : 1714
Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r3.log : 1680
Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r1.log : 1708
Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r2.log : 1708
Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r3.log : 1708
Unique messages from consumer on [test_2] at simple_consumer_test_2-2_r1.log : 1469
Unique messages from consumer on [test_2] at simple_consumer_test_2-2_r2.log : 1469
Unique messages from consumer on [test_2] at simple_consumer_test_2-2_r3.log : 1469
Unique messages from producer on [test_2] : 4900
Validate for data matched on topic [test_1] across replicas : PASSED
Validate for data matched on topic [test_2] : FAILED
Validate for data matched on topic [test_2] across replicas : FAILED
Validate for merged log segment checksum in cluster [source] : FAILED
Validate leader election successful : PASSED

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
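The "across replicas" check boils down to comparing each partition's replicas with one another. Below is a minimal sketch of that comparison. It takes only the per-replica unique-message counts from the report above as input. The function name `mismatched_partitions` is hypothetical, and this is not the ReplicaBasicTest code. Equal counts also do not prove identical contents, which is why the suite additionally compares log segment checksums.

```python
from collections import defaultdict


def mismatched_partitions(counts):
    """counts maps 'topic-partition_rN' names to unique-message counts.

    Returns {topic-partition: {replica: count}} for every partition whose
    replicas do not all report the same count (hypothetical helper).
    """
    by_partition = defaultdict(dict)
    for name, count in counts.items():
        partition, replica = name.rsplit("_", 1)  # 'test_2-0_r3' -> ('test_2-0', 'r3')
        by_partition[partition][replica] = count
    return {p: reps for p, reps in by_partition.items() if len(set(reps.values())) > 1}


# Counts copied from the test case output above.
report = {
    "test_1-0_r1": 1750, "test_1-0_r2": 1750, "test_1-0_r3": 1750,
    "test_1-1_r1": 1750, "test_1-1_r2": 1750, "test_1-1_r3": 1750,
    "test_1-2_r1": 1500, "test_1-2_r2": 1500, "test_1-2_r3": 1500,
    "test_2-0_r1": 1714, "test_2-0_r2": 1714, "test_2-0_r3": 1680,
    "test_2-1_r1": 1708, "test_2-1_r2": 1708, "test_2-1_r3": 1708,
    "test_2-2_r1": 1469, "test_2-2_r2": 1469, "test_2-2_r3": 1469,
}
print(mismatched_partitions(report))  # {'test_2-0': {'r1': 1714, 'r2': 1714, 'r3': 1680}}
```

On these numbers only partition 0 of test_2 disagrees (replica 3 has 1680 messages versus 1714 on the others). This matches the FAILED validations for [test_2].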
[jira] [Updated] (KAFKA-772) System Test Transient Failure on testcase_0122
[ https://issues.apache.org/jira/browse/KAFKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-772:
Attachment: (was: testcase_0125.tar.gz)
[jira] [Updated] (KAFKA-772) System Test Transient Failure on testcase_0122
[ https://issues.apache.org/jira/browse/KAFKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-772:
Attachment: testcase_0125.tar.gz
[jira] [Commented] (KAFKA-759) Commit/FetchOffset APIs should not return versionId
[ https://issues.apache.org/jira/browse/KAFKA-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590973#comment-13590973 ]

Neha Narkhede commented on KAFKA-759:

+1, looks good

Commit/FetchOffset APIs should not return versionId
Key: KAFKA-759
URL: https://issues.apache.org/jira/browse/KAFKA-759
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8
Reporter: David Arthur
Assignee: David Arthur
Priority: Minor
Fix For: 0.8
Attachments: 0001-KAFKA-759-Remove-versionId-from-OffsetCommitResponse.patch
[jira] [Updated] (KAFKA-759) Commit/FetchOffset APIs should not return versionId
[ https://issues.apache.org/jira/browse/KAFKA-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-759:
Resolution: Fixed
Fix Version/s: 0.8.1 (was: 0.8)
Status: Resolved (was: Patch Available)

Checked into trunk.

Commit/FetchOffset APIs should not return versionId
Key: KAFKA-759
URL: https://issues.apache.org/jira/browse/KAFKA-759
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8
Reporter: David Arthur
Assignee: David Arthur
Priority: Minor
Fix For: 0.8.1
Attachments: 0001-KAFKA-759-Remove-versionId-from-OffsetCommitResponse.patch
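In the Kafka wire protocol, the request header carries ApiKey, ApiVersion, CorrelationId and ClientId. The response header carries only the CorrelationId, which lets a client match a response to its request. That fits the issue's point that the OffsetCommit/OffsetFetch responses should not return versionId.

Below is a minimal Python sketch of both headers. It assumes the big-endian layout described in the Kafka protocol guide and treats 8 as the OffsetCommit api key. The function names are hypothetical, and this is not the code from the attached patch.

```python
import struct


def encode_request_header(api_key, api_version, correlation_id, client_id):
    """ApiKey int16, ApiVersion int16, CorrelationId int32, then ClientId as an
    int16 length followed by UTF-8 bytes (all big-endian)."""
    cid = client_id.encode("utf-8")
    return struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid


def encode_response_header(correlation_id):
    """The response header is just the int32 correlation id: no version field."""
    return struct.pack(">i", correlation_id)


def decode_response_header(buf):
    """Split a response into (correlation_id, remaining body bytes)."""
    (correlation_id,) = struct.unpack_from(">i", buf, 0)
    return correlation_id, buf[4:]


# 8 is assumed to be the OffsetCommit api key, per the protocol guide.
request = encode_request_header(8, 0, 42, "consumer-1")
response = encode_response_header(42) + b"...offset commit body..."
print(len(request), decode_response_header(response)[0])  # 20 42
```

The client only needs the correlation id to pair a response with the request it sent. The version it asked for is already known on its side.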
[jira] [Updated] (KAFKA-777) Add system tests for important tools
[ https://issues.apache.org/jira/browse/KAFKA-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-777:
Labels: kafka-0.8 p2 replication-testing (was: kafka-0.8 p1 replication-testing)

Add system tests for important tools
Key: KAFKA-777
URL: https://issues.apache.org/jira/browse/KAFKA-777
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8
Reporter: Sriram Subramanian
Assignee: John Fung
Labels: kafka-0.8, p2, replication-testing
Fix For: 0.8

A few tools broke after the ZooKeeper format change. It would be great to catch such issues during system tests. Some of the tools are:
1. ShutdownBroker
2. PreferredReplicaAssignment
3. ConsumerOffsetChecker
There may be a few more that need tests; we should add them once they are identified.
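One way to catch broken tools is a smoke test that runs each tool and fails on a non-zero exit status. Below is a minimal sketch using a hypothetical `run_tool_smoke_tests` harness. The commands are Python stand-ins. In a real suite, each entry would instead invoke one of the tools listed above against a test cluster, for example through bin/kafka-run-class.sh.

```python
import subprocess
import sys


def run_tool_smoke_tests(commands, timeout=60):
    """Run each (name, argv) pair; return (name, status) for every failure.

    Hypothetical harness: a non-zero exit status or a timeout counts as broken.
    """
    failures = []
    for name, argv in commands:
        try:
            result = subprocess.run(argv, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            failures.append((name, "timeout"))
            continue
        if result.returncode != 0:
            failures.append((name, result.returncode))
    return failures


# Stand-ins for real tool invocations.
stand_ins = [
    ("ok-tool", [sys.executable, "-c", "pass"]),
    ("broken-tool", [sys.executable, "-c", "import sys; sys.exit(3)"]),
]
print(run_tool_smoke_tests(stand_ins))  # [('broken-tool', 3)]
```

An exit status is only a coarse signal. A fuller test would also check each tool's output or its effect on the cluster, since a tool can misbehave without exiting non-zero.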
[jira] [Updated] (KAFKA-513) Add state change log to Kafka brokers
[ https://issues.apache.org/jira/browse/KAFKA-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-513:
Attachment: kafka-513-v4.patch

Uploading patch v4. The comments on the merge tool are written above. Remaining comments:

A. Logging in Kafka
11. Tried to make the log format more consistent: used [%s,%d] for printing topicAndPartition, logged correlationId when the logging statement concerned a leaderAndIsrRequest, and printed controllerId and controllerEpoch wherever needed. I have tried to ensure that there are no mistakes in the order of the parameters to .format(), but it would help if you could scrutinize them as well.
12. Changed the state change log logging level to trace, except for certain errors.
13. As you mentioned, it's an excellent idea to state in ControllerChannelManager whether the leaderAndIsr request is a become-leader or a become-follower request. Added a change to log that, and added a change to do the same on the broker in ReplicaManager.{makeLeader,makeFollower}.
14. Included correlationId in LeaderAndIsrRequest.toString.
15. Changed PartitionStateChangeLogger to StateChangeLogger everywhere.
16. Fixed the mentioned typos.
17. Added the state change log entries that you suggested.
18. As discussed offline, kept the wrapper class around Utils.Logging. Providing a logIdent to this class saves us from specifying the broker id in every state change log entry, and keeps the state change logging consistent with the regular server logging.
19. The successful lifecycle of a state change request will look like the following (error/discard/abort messages can appear in this sequence in case of failures):
On the controller - Controller %d, epoch %d sending become-leader/follower LeaderAndIsr request with correlationId %d to broker %d for partition [%s,%d]
On a broker - [Replica Manager on Broker %d]: Handling LeaderAndIsr request correlationId %d received from controller %d epoch %d for partition [%s,%d]
On the same broker - [Replica Manager on Broker %d]: LeaderAndIsr request correlationId %d received from controller %d epoch %d starting the become-leader/follower transition for partition [%s,%d]
On the same broker - [Replica Manager on Broker %d]: Completed become-leader/follower transition for partition [%s,%d]
On the same broker - [Replica Manager on Broker %d]: Handled LeaderAndIsr request correlationId %d received from controller %d epoch %d for partition [%s,%d]
On the controller - Controller %d received response correlationId %d for a request sent to broker %d
I don't think KafkaApis is the right place for a "received leaderAndIsr request" log entry, since KafkaApis handles all types of requests. We should instead expect to see a "Handling LeaderAndIsr request" entry directly. On the broker, we don't know whether a received leaderAndIsr request is a become-leader or a become-follower request. That should be fine, since the second broker-side statement above logs that information together with the correlationId.

B. I included correlationId in the abstract class RequestOrResponse; probably all of its derived classes should include a correlationId.

C. config/log4j.properties now uses a separate controller.log and a separate state-change.log.

D. Our kafka-run-class.sh script strips the quotes from command-line arguments. Changed it so that quoted arguments are passed through as such. This is useful for passing values that contain whitespace, like 2013-03-01 16:03:43,093.
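A timeline tool built on these logs mainly needs to do two things. First, it merges the per-broker state-change.log files by timestamp. Second, it optionally restricts the result to a time window bounded by values such as 2013-03-01 16:03:43,093.

Below is a minimal sketch of that merge. It assumes each file is already in time order and each line starts with a log4j-style [yyyy-MM-dd HH:mm:ss,SSS] timestamp. The function names and sample lines are hypothetical, and this is not the merge tool from the patch.

```python
import heapq
from datetime import datetime

# Assumed log4j-style prefix: [yyyy-MM-dd HH:mm:ss,SSS]
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line):
    """Parse the bracketed timestamp at the start of a log line."""
    return datetime.strptime(line[1:line.index("]")], TS_FORMAT)


def merge_state_change_logs(logs, start=None, end=None):
    """Merge per-broker logs (each already in time order) into one timeline.

    Only entries with start <= timestamp <= end are kept; None means unbounded.
    """
    timeline = []
    for line in heapq.merge(*logs, key=parse_ts):
        ts = parse_ts(line)
        if (start is None or ts >= start) and (end is None or ts <= end):
            timeline.append(line)
    return timeline


# Hypothetical entries following the lifecycle formats listed in item 19.
controller_log = [
    "[2013-03-01 16:03:43,093] Controller 2, epoch 1 sending become-leader "
    "LeaderAndIsr request with correlationId 5 to broker 1 for partition [foo,0]",
    "[2013-03-01 16:03:43,210] Controller 2 received response correlationId 5 "
    "for a request sent to broker 1",
]
broker_log = [
    "[2013-03-01 16:03:43,120] [Replica Manager on Broker 1]: Handling LeaderAndIsr "
    "request correlationId 5 received from controller 2 epoch 1 for partition [foo,0]",
    "[2013-03-01 16:03:43,150] [Replica Manager on Broker 1]: Handled LeaderAndIsr "
    "request correlationId 5 received from controller 2 epoch 1 for partition [foo,0]",
]

for entry in merge_state_change_logs([controller_log, broker_log]):
    print(entry)
```

`heapq.merge` streams the inputs lazily, so a merge over large per-broker files never needs them fully sorted in memory at once. The shared correlationId then lets a reader follow one request across the controller and broker entries.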
Add state change log to Kafka brokers
Key: KAFKA-513
URL: https://issues.apache.org/jira/browse/KAFKA-513
Project: Kafka
Issue Type: Sub-task
Affects Versions: 0.8
Reporter: Neha Narkhede
Assignee: Swapnil Ghike
Priority: Blocker
Labels: p1, replication, tools
Fix For: 0.8
Attachments: kafka-513-v1.patch, kafka-513-v2.patch, kafka-513-v3.patch, kafka-513-v4.patch
Original Estimate: 96h
Remaining Estimate: 96h

Once KAFKA-499 is checked in, every controller-to-broker communication can be modelled as a state change for one or more partitions. Every state change request will carry the controller epoch. If there is a problem with the state of some partitions, it would be good to have a tool that can create a timeline of requested and completed state changes. This will require each broker to output a state change log with entries like:
[2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1
[2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1
On the controller, this will look like:
[2012-09-10 10:06:17,198] controller 2, epoch 1, initiated
[jira] [Created] (KAFKA-779) Standardize Zk data structures for Re-assign partitions and Preferred replication election
Swapnil Ghike created KAFKA-779:

Summary: Standardize Zk data structures for Re-assign partitions and Preferred replication election
Key: KAFKA-779
URL: https://issues.apache.org/jira/browse/KAFKA-779
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8
Reporter: Swapnil Ghike
Assignee: Swapnil Ghike
Priority: Blocker
Fix For: 0.8

Follow the schema at https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
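For illustration, the sketch below writes and reads versioned JSON payloads for the two admin operations. It assumes the ZooKeeper paths /admin/reassign_partitions and /admin/preferred_replica_election and a payload of the form {"version":1,"partitions":[...]}. Both are assumptions here, and the wiki page linked above is the authoritative schema. The function names are hypothetical.

```python
import json

# Assumed payloads for /admin/reassign_partitions and
# /admin/preferred_replica_election; verify against the wiki schema.
SCHEMA_VERSION = 1


def reassignment_json(assignments):
    """assignments maps (topic, partition) to the new list of replica broker ids."""
    return json.dumps({
        "version": SCHEMA_VERSION,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": replicas}
            for (topic, partition), replicas in sorted(assignments.items())
        ],
    })


def preferred_election_json(partitions):
    """partitions is an iterable of (topic, partition) pairs."""
    return json.dumps({
        "version": SCHEMA_VERSION,
        "partitions": [
            {"topic": topic, "partition": partition}
            for topic, partition in sorted(partitions)
        ],
    })


def parse_partitions(data):
    """Return the partition list, rejecting payloads with an unknown version."""
    doc = json.loads(data)
    if doc.get("version") != SCHEMA_VERSION:
        raise ValueError("unsupported schema version: %r" % doc.get("version"))
    return doc["partitions"]


print(reassignment_json({("foo", 0): [1, 2, 3]}))
print(preferred_election_json([("foo", 0)]))
```

Carrying an explicit version field lets readers reject, or adapt to, a future layout instead of silently misparsing it.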
Flume-NG Kafka Channel
I started hacking on a Kafka Channel for Flume-NG. It uses a Java consumer and a Java producer, based on the Java example code, that work with the PUT and TAKE methods of the Channel interface.

Any thoughts? Kafka makes a really powerful addition to the existing channel options.

-Jonathan

--
Jonathan Creasy | Sr. Ops Engineer
e: j...@box.com | t: 314.580.8909
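The mapping described above backs PUT with a producer and TAKE with a consumer. Below is a minimal Python sketch of that mapping under stated assumptions:

* InMemoryTopic and KafkaChannel are hypothetical names.
* The in-memory list stands in for a real topic and for the Java producer/consumer.
* Flume's transactional put/take semantics are deliberately left out.

```python
class InMemoryTopic:
    """Hypothetical stand-in for one Kafka topic partition: an append-only log."""

    def __init__(self):
        self._log = []

    def append(self, event):
        self._log.append(event)

    def read(self, offset):
        """Return the event at offset, or None if nothing is there yet."""
        return self._log[offset] if offset < len(self._log) else None


class KafkaChannel:
    """Sketch of a channel whose put() produces to a topic and whose take()
    consumes from it. Flume's transaction handling is omitted."""

    def __init__(self, topic):
        self._topic = topic
        self._offset = 0  # consumer position; events are not deleted on take

    def put(self, event):
        self._topic.append(event)

    def take(self):
        event = self._topic.read(self._offset)
        if event is not None:
            self._offset += 1
        return event


topic = InMemoryTopic()
channel = KafkaChannel(topic)
channel.put("event-1")
channel.put("event-2")
print(channel.take(), channel.take(), channel.take())  # event-1 event-2 None
```

Here take() only advances the consumer offset, so events remain in the topic after being taken. A restarted or second channel reading the same topic could therefore replay them, which is likely part of what makes Kafka attractive as a durable channel.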