[ 
https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-573:
--------------------------

    Attachment: kafka-573.patch

Attaching a patch. There are two problems; the first is the more severe. We 
recently changed FileMessageSet to remove the mutable flag. As a result, 
every time a new FileMessageSet is created, the constructor sets the file 
channel's position to the end of the file. So while newly produced data is 
being appended to a file channel, the channel's position can be moved by a 
FileMessageSet created for a fetch request. Since the two are not properly 
synchronized, a message in the log is occasionally overwritten. The second 
issue is in ByteBufferMessageSet.writeTo. We try to reset the buffer 
position after writing the data in the buffer to the channel. However, since 
there is no guarantee that the whole buffer will be written to the channel in a 
single write, resetting the buffer position can cause incorrect bytes to be 
written to the channel.
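
To make the first problem concrete, here is a minimal single-threaded sketch 
in plain java.nio, not the Kafka code. It replays one possible interleaving, 
assuming the "move to end" step uses a file size that was read before a 
concurrent append completed. The class name, method names, and message 
strings are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PositionRaceSketch {

    // Replays, in one thread, an interleaving in which a "fetch" path
    // repositions a channel that an "append" path is still writing to.
    // Returns the final file contents.
    static String replay() {
        try {
            Path log = Files.createTempFile("segment", ".log");
            try (FileChannel channel = FileChannel.open(
                    log, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                append(channel, "msg1");
                long staleEnd = channel.size(); // fetch path reads the size: 4
                append(channel, "msg2");        // append path moves position to 8
                channel.position(staleEnd);     // fetch path "moves to end" with stale size
                append(channel, "msg3");        // lands on top of msg2
            }
            byte[] bytes = Files.readAllBytes(log);
            Files.delete(log);
            return new String(bytes, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the whole message at the channel's current position.
    static void append(FileChannel channel, String message) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(message.getBytes(StandardCharsets.US_ASCII));
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints "msg1msg3"
    }
}
```

In this replay the second message is lost even though every individual call 
succeeds, which matches the symptom of an occasionally overwritten message.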

The patch fixes both issues. The changes are: (1) Added a new flag to the 
constructor of FileMessageSet to control whether the channel position is set to 
the end of the file. (2) Changed ByteBufferMessageSet.writeTo so that we 
wait until the whole buffer is written to the channel before resetting the 
buffer position. (3) Added some more logging that I found useful while 
investigating the issues.
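
The idea behind change (2) can be sketched in plain java.nio as follows. This 
is an illustration, not the code in the patch: ThrottledChannel is a 
hypothetical channel that accepts only a few bytes per call, and the retrying 
caller in retryWithReset is an assumption made to show how resetting after a 
partial write can resend bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

class WriteFullySketch {

    // Hypothetical channel that accepts at most `limit` bytes per write call.
    static final class ThrottledChannel implements WritableByteChannel {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        private final int limit;
        ThrottledChannel(int limit) { this.limit = limit; }
        @Override public int write(ByteBuffer src) {
            int n = Math.min(limit, src.remaining());
            for (int i = 0; i < n; i++) sink.write(src.get());
            return n;
        }
        @Override public boolean isOpen() { return true; }
        @Override public void close() { }
        String contents() {
            return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    // Problematic shape: one write, then reset, even if the write was partial.
    static int writeOnceThenReset(ByteBuffer buffer, WritableByteChannel channel)
            throws IOException {
        buffer.mark();
        int written = channel.write(buffer);
        buffer.reset();
        return written;
    }

    // Safer shape: keep writing until the buffer is drained, then reset.
    static int writeFully(ByteBuffer buffer, WritableByteChannel channel)
            throws IOException {
        buffer.mark();
        int written = 0;
        while (buffer.hasRemaining()) {
            written += channel.write(buffer);
        }
        buffer.reset();
        return written;
    }

    // Assumed caller that retries until enough bytes are reported written.
    static String retryWithReset(String payload, int limit) {
        ThrottledChannel channel = new ThrottledChannel(limit);
        ByteBuffer buffer = ByteBuffer.wrap(payload.getBytes(StandardCharsets.US_ASCII));
        try {
            int total = 0;
            while (total < buffer.limit()) {
                total += writeOnceThenReset(buffer, channel);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return channel.contents();
    }

    static String fully(String payload, int limit) {
        ThrottledChannel channel = new ThrottledChannel(limit);
        ByteBuffer buffer = ByteBuffer.wrap(payload.getBytes(StandardCharsets.US_ASCII));
        try {
            writeFully(buffer, channel);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return channel.contents();
    }

    public static void main(String[] args) {
        System.out.println(retryWithReset("abcdef", 4)); // prints "abcdabcd"
        System.out.println(fully("abcdef", 4));          // prints "abcdef"
    }
}
```

With a channel that takes four bytes at a time, the reset-after-one-write 
variant resends the start of the buffer, while the loop drains the buffer 
before the position is restored.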

The system test passes now.
                
> System Test : Leader Failure Log Segment Checksum Mismatched When 
> request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, 
> kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, do the following:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B 
> system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
