[jira] [Updated] (KAFKA-772) System Test Transient Failure on testcase_0122

2013-03-01 Thread John Fung (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Fung updated KAFKA-772:


Attachment: testcase_0125.tar.gz

 System Test Transient Failure on testcase_0122
 --

 Key: KAFKA-772
 URL: https://issues.apache.org/jira/browse/KAFKA-772
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: John Fung
Assignee: Sriram Subramanian
  Labels: kafka-0.8, p1
 Attachments: testcase_0122.tar.gz, testcase_0125.tar.gz


 * This test case has been failing randomly over the past few weeks. Note that 
 the test case allows a small percentage of data loss with Ack = 1, but the 
 failure here is a mismatch of log segment checksums across the replicas.
 * Test description:
 3 brokers cluster
 Replication factor = 3
 No. topic = 2
 No. partitions = 3
 Controlled failure (kill -15)
 Ack = 1
 * Test case output
 _test_case_name  :  testcase_0122
 _test_class_name  :  ReplicaBasicTest
 arg : auto_create_topic  :  true
 arg : bounce_broker  :  true
 arg : broker_type  :  leader
 arg : message_producing_free_time_sec  :  15
 arg : num_iteration  :  3
 arg : num_partition  :  3
 arg : replica_factor  :  3
 arg : sleep_seconds_between_producer_calls  :  1
 validation_status  : 
  Leader Election Latency - iter 1 brokerid 3  :  377.00 ms
  Leader Election Latency - iter 2 brokerid 1  :  374.00 ms
  Leader Election Latency - iter 3 brokerid 2  :  384.00 ms
  Leader Election Latency MAX  :  384.00
  Leader Election Latency MIN  :  374.00
  Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log  :  1750
  Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log  :  1750
  Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log  :  1750
  Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log  :  1750
  Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log  :  1750
  Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log  :  1750
  Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r1.log  :  1500
  Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r2.log  :  1500
  Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r3.log  :  1500
  Unique messages from consumer on [test_2]  :  5000
  Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r1.log  :  1714
  Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r2.log  :  1714
  Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r3.log  :  1680
  Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r1.log  :  1708
  Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r2.log  :  1708
  Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r3.log  :  1708
  Unique messages from consumer on [test_2] at simple_consumer_test_2-2_r1.log  :  1469
  Unique messages from consumer on [test_2] at simple_consumer_test_2-2_r2.log  :  1469
  Unique messages from consumer on [test_2] at simple_consumer_test_2-2_r3.log  :  1469
  Unique messages from producer on [test_2]  :  4900
  Validate for data matched on topic [test_1] across replicas  :  PASSED
  Validate for data matched on topic [test_2]  :  FAILED
  Validate for data matched on topic [test_2] across replicas  :  FAILED
  Validate for merged log segment checksum in cluster [source]  :  FAILED
  Validate leader election successful  :  PASSED
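
 For anyone triaging the output above, here is a minimal sketch (not part of 
 the system test framework) that scans a validation report for per-replica 
 unique message counts and flags partitions whose replicas disagree. The 
 function name and the regular expression are assumptions based only on the 
 file naming visible in this report (simple_consumer_<topic>-<partition>_r<replica>.log). 
 Differing counts are only a coarse hint; the test itself compares log segment 
 checksums.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the report above, e.g.
#   simple_consumer_test_2-0_r3.log  :  1680
LINE_RE = re.compile(
    r"simple_consumer_(?P<topic>\w+?)-(?P<partition>\d+)_r(?P<replica>\d+)\.log"
    r"\s*:\s*(?P<count>\d+)"
)


def mismatched_partitions(report_text):
    """Return {(topic, partition): {replica: count}} for partitions whose
    replicas report different unique message counts."""
    counts = defaultdict(dict)
    for match in LINE_RE.finditer(report_text):
        key = (match.group("topic"), int(match.group("partition")))
        counts[key][int(match.group("replica"))] = int(match.group("count"))
    # Keep only the partitions where at least two replicas disagree.
    return {key: per_replica
            for key, per_replica in counts.items()
            if len(set(per_replica.values())) > 1}
```

 Run against the report above, this should single out topic test_2, 
 partition 0, where replica 3 reports 1680 messages against 1714 on the 
 other two.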

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (KAFKA-772) System Test Transient Failure on testcase_0122

2013-03-01 Thread John Fung (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Fung updated KAFKA-772:


Attachment: (was: testcase_0125.tar.gz)

 System Test Transient Failure on testcase_0122
 --

 Key: KAFKA-772
 URL: https://issues.apache.org/jira/browse/KAFKA-772
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: John Fung
Assignee: Sriram Subramanian
  Labels: kafka-0.8, p1
 Attachments: testcase_0122.tar.gz





[jira] [Updated] (KAFKA-772) System Test Transient Failure on testcase_0122

2013-03-01 Thread John Fung (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Fung updated KAFKA-772:


Attachment: testcase_0125.tar.gz

 System Test Transient Failure on testcase_0122
 --

 Key: KAFKA-772
 URL: https://issues.apache.org/jira/browse/KAFKA-772
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: John Fung
Assignee: Sriram Subramanian
  Labels: kafka-0.8, p1
 Attachments: testcase_0122.tar.gz, testcase_0125.tar.gz





[jira] [Commented] (KAFKA-759) Commit/FetchOffset APIs should not return versionId

2013-03-01 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590973#comment-13590973
 ] 

Neha Narkhede commented on KAFKA-759:
-

+1, looks good

 Commit/FetchOffset APIs should not return versionId
 ---

 Key: KAFKA-759
 URL: https://issues.apache.org/jira/browse/KAFKA-759
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
Reporter: David Arthur
Assignee: David Arthur
Priority: Minor
 Fix For: 0.8

 Attachments: 
 0001-KAFKA-759-Remove-versionId-from-OffsetCommitResponse.patch






[jira] [Updated] (KAFKA-759) Commit/FetchOffset APIs should not return versionId

2013-03-01 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-759:


   Resolution: Fixed
Fix Version/s: (was: 0.8)
   0.8.1
   Status: Resolved  (was: Patch Available)

Checked into trunk

 Commit/FetchOffset APIs should not return versionId
 ---

 Key: KAFKA-759
 URL: https://issues.apache.org/jira/browse/KAFKA-759
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
Reporter: David Arthur
Assignee: David Arthur
Priority: Minor
 Fix For: 0.8.1

 Attachments: 
 0001-KAFKA-759-Remove-versionId-from-OffsetCommitResponse.patch






[jira] [Updated] (KAFKA-777) Add system tests for important tools

2013-03-01 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-777:


Labels: kafka-0.8 p2 replication-testing  (was: kafka-0.8 p1 
replication-testing)

 Add system tests for important tools
 

 Key: KAFKA-777
 URL: https://issues.apache.org/jira/browse/KAFKA-777
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: Sriram Subramanian
Assignee: John Fung
  Labels: kafka-0.8, p2, replication-testing
 Fix For: 0.8


 A few tools were broken after the zk format change. It would be great to 
 catch these issues during system tests. Some of the tools are:
 1. ShutdownBroker
 2. PreferredReplicaAssignment
 3. ConsumerOffsetChecker
 There might be a few more for which we need tests. We need to add them once 
 identified.



[jira] [Updated] (KAFKA-513) Add state change log to Kafka brokers

2013-03-01 Thread Swapnil Ghike (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swapnil Ghike updated KAFKA-513:


Attachment: kafka-513-v4.patch

Uploading patch v4. The comments on the merge tool are written above. Remaining 
comments - 

A. Logging in kafka - 
11. Tried to make the log format more consistent: used [%s,%d] for printing 
topicAndPartition, logged correlationId when the logging statement 
concerned a leaderAndIsrRequest, and printed controllerId and controllerEpoch 
wherever needed. I have tried to ensure that there are no mistakes in the order 
of parameters to .format() etc., but it would be helpful if you could 
scrutinize them as well.
12. Changed state change log logging level to trace, except for certain errors.
13. As you mentioned, it's an excellent idea to mention in 
ControllerChannelManager whether the leaderAndIsr request is a become-leader or 
a become-follower request. Added a change to log that there, and a matching 
change on the broker in ReplicaManager.{makeLeader,makeFollower}.
14. Included correlationId in LeaderAndIsrRequest.toString
15. Changed PartitionStateChangeLogger to StateChangeLogger everywhere.
16. Fixed the mentioned typos.
17. Added the state change log entries that you suggested.
18. As discussed offline, kept the wrapper class around Utils.Logging. 
Providing a logIdent to this class will save us the trouble of specifying the 
broker id in every state change log entry, and it will keep the logging 
consistent with the regular server logging.
19. The successful lifecycle of a state change request will look like the 
following (error/discard/abort messages can be included in this sequence in 
case of failures) - 

On the controller - Controller %d, epoch %d sending become-leader/follower 
LeaderAndIsr request with correlationId %d to broker %d for partition [%s,%d]
On a broker - [Replica Manager on Broker %d]: Handling LeaderAndIsr request 
correlationId %d received from controller %d epoch %d for partition [%s,%d]
On the same broker - [Replica Manager on Broker %d]: LeaderAndIsr request 
correlationId %d received from controller %d epoch %d starting the 
become-leader/follower transition for partition [%s,%d]
On the same broker - [Replica Manager on Broker %d]: Completed 
become-leader/follower transition for partition [%s,%d]
On the same broker - [Replica Manager on Broker %d]: Handled LeaderAndIsr 
request correlationId %d received from controller %d epoch %d for partition 
[%s,%d]
On the controller - Controller %d received response correlationId %d for a 
request sent to broker %d
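
To make the sequence above concrete, here is a minimal sketch that renders 
three of those entries with made-up values. It is an illustration only, not 
code from the patch: the constant and function names are hypothetical, and 
the "become-leader/follower" text from the messages above is replaced by an 
extra %s placeholder so that either transition can be filled in. Python's % 
operator is used because these positional specifiers carry over unchanged.

```python
# Format strings adapted from the lifecycle listed above.
CONTROLLER_SEND = ("Controller %d, epoch %d sending %s LeaderAndIsr request "
                   "with correlationId %d to broker %d for partition [%s,%d]")
BROKER_HANDLING = ("[Replica Manager on Broker %d]: Handling LeaderAndIsr "
                   "request correlationId %d received from controller %d "
                   "epoch %d for partition [%s,%d]")
BROKER_COMPLETED = ("[Replica Manager on Broker %d]: Completed %s transition "
                    "for partition [%s,%d]")


def render_lifecycle(controller_id, epoch, correlation_id, broker_id,
                     topic, partition, transition="become-leader"):
    """Render sample state change log entries for one partition (sketch)."""
    return [
        CONTROLLER_SEND % (controller_id, epoch, transition, correlation_id,
                           broker_id, topic, partition),
        BROKER_HANDLING % (broker_id, correlation_id, controller_id, epoch,
                           topic, partition),
        BROKER_COMPLETED % (broker_id, transition, topic, partition),
    ]


if __name__ == "__main__":
    for entry in render_lifecycle(2, 1, 7, 1, "foo", 0):
        print(entry)
```

Swapping two arguments of different types, for example passing the topic 
where a %d is expected, raises a TypeError here, which is the kind of 
parameter-order mistake mentioned in item 11.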

I think KafkaApis is not the right place to log a received leaderAndIsr 
request, since KafkaApis handles all types of requests. We should rather 
expect to see a Handling LeaderAndIsr request log entry directly. On the 
broker, we don't know whether the received leaderAndIsr request is a 
become-leader or a become-follower request, but I guess that's ok, since we log 
that information in the second statement shown above, along with the 
correlationId.

B. I included correlationId in the abstract class RequestOrResponse; all of 
its derived classes should probably include a correlationId.

C. config/log4j.properties now uses a separate controller.log and a separate 
state-change.log.

D. Our kafka-run-class.sh script strips quotes from the command line 
arguments. Changed it so that quotes are passed through as-is. This is useful 
for passing values containing whitespace, like 2013-03-01 16:03:43,093.
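
As a rough illustration of why the quotes matter (this is not the shell 
change itself), the sketch below contrasts naive whitespace splitting with 
quote-aware splitting using Python's shlex module. The --from option name is 
hypothetical; only the timestamp value comes from the example above.

```python
import shlex

# Hypothetical command-line fragment; only the timestamp is from the text above.
command_line = '--from "2013-03-01 16:03:43,093"'

# Naive splitting breaks the timestamp into two separate arguments.
naive_args = command_line.split()

# Quote-aware splitting keeps the timestamp together as one argument.
quoted_args = shlex.split(command_line)

print(naive_args)
print(quoted_args)
```

With the quotes honored, the tool receives the timestamp as a single 
argument instead of a date and a time in two separate ones.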

 Add state change log to Kafka brokers
 -

 Key: KAFKA-513
 URL: https://issues.apache.org/jira/browse/KAFKA-513
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 0.8
Reporter: Neha Narkhede
Assignee: Swapnil Ghike
Priority: Blocker
  Labels: p1, replication, tools
 Fix For: 0.8

 Attachments: kafka-513-v1.patch, kafka-513-v2.patch, 
 kafka-513-v3.patch, kafka-513-v4.patch

   Original Estimate: 96h
  Remaining Estimate: 96h

 Once KAFKA-499 is checked in, every controller to broker communication can be 
 modelled as a state change for one or more partitions. Every state change 
 request will carry the controller epoch. If there is a problem with the state 
 of some partitions, it will be good to have a tool that can create a timeline 
 of requested and completed state changes. This will require each broker to 
 output a state change log that has entries like
 [2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() for 
 partition [foo, 0] from controller 2, epoch 1
 [2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() for 
 partition [foo, 0] from controller 2, epoch 1
 On controller, this will look like -
 [2012-09-10 10:06:17,198] controller 2, epoch 1, initiated 

[jira] [Created] (KAFKA-779) Standardize Zk data structures for Re-assign partitions and Preferred replication election

2013-03-01 Thread Swapnil Ghike (JIRA)
Swapnil Ghike created KAFKA-779:
---

 Summary: Standardize Zk data structures for Re-assign partitions 
and Preferred replication election
 Key: KAFKA-779
 URL: https://issues.apache.org/jira/browse/KAFKA-779
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: Swapnil Ghike
Assignee: Swapnil Ghike
Priority: Blocker
 Fix For: 0.8


Follow the schema at 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper



Flume-NG Kafka Channel

2013-03-01 Thread Jonathan Creasy
I started hacking on a Kafka Channel for Flume-NG. It has a Java Consumer
and a Java Producer, based on the Java example code, that work with the PUT
and TAKE methods on the Channel interface.

Any thoughts?

Kafka makes a really powerful addition to the existing channel options.

-Jonathan

-- 
**

*Jonathan Creasy* | Sr. Ops Engineer

e: j...@box.com | t: 314.580.8909