[ 
https://issues.apache.org/jira/browse/KAFKA-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLin updated KAFKA-8722:
---------------------------
    Description: 
In our production environment, while consuming data from a Kafka topic in a 
running application, we encountered the following error:

org.apache.kafka.common.KafkaException: Record for partition 
rl_dqn_debug_example-49 at offset 2911287689 is invalid, cause: Record is 
corrupt (stored crc = 3580880396, computed crc = 1701403171)
 at 
org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:869)
 at 
org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:788)
 at 
org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:480)
 at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1188)
 at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1046)
 at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:88)
 at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
 at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
 at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
 at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
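
For context, the check that fails here compares the checksum stored with the 
record against one recomputed from the record's bytes. Below is a small, 
self-contained illustration of that mismatch; it is not Kafka source and uses a 
plain CRC32 over an example payload:

{code:scala}
// Illustration only: the stored checksum no longer matches the recomputed one,
// so the reader rejects the record as corrupt.
import java.util.zip.CRC32

object CrcMismatchExample {
  def crc32(bytes: Array[Byte]): Long = {
    val crc = new CRC32()
    crc.update(bytes)
    crc.getValue
  }

  def main(args: Array[String]): Unit = {
    val payload = "hello kafka".getBytes("UTF-8")
    val storedCrc = crc32(payload)           // checksum recorded when the record was produced

    payload(0) = (payload(0) ^ 0x01).toByte  // simulate corruption after the checksum was taken
    val computedCrc = crc32(payload)         // checksum recomputed by the reader

    // Mirrors "Record is corrupt (stored crc = ..., computed crc = ...)"
    if (storedCrc != computedCrc)
      println(s"Record is corrupt (stored crc = $storedCrc, computed crc = $computedCrc)")
  }
}
{code}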

We then used the kafka.tools.DumpLogSegments tool to parse the on-disk log 
segment and found that it did indeed contain dirty data (an example invocation 
of the tool is shown after the screenshot):

!image-2019-07-27-14-57-06-687.png!
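
For reference, this is roughly how the segment can be inspected with that tool; 
the segment file name below is only an example, not the actual file from our 
cluster:

{noformat}
bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log \
  --files /data/kafka-logs/rl_dqn_debug_example-49/00000000000000000000.log
{noformat}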

By reading the broker code, we found that in some cases Kafka writes records to 
disk without verifying them, so we fixed this. Specifically, when record.offset 
is not equal to the offset the broker expects, Kafka sets the variable 
inPlaceAssignment to false, and when inPlaceAssignment is false the record data 
is not verified (a simplified sketch follows the screenshots below):

!image-2019-07-27-14-50-58-300.png!

!image-2019-07-27-14-50-08-128.png!
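
For readers who cannot see the screenshots, here is a simplified, self-contained 
sketch of the behaviour described above. The names (inPlaceAssignment, the 
expected-offset comparison) mirror the 0.10.x broker's log validation, but this 
is an approximation for illustration, not the actual Kafka source:

{code:scala}
// Simplified model of the broker-side behaviour described above; this is an
// illustration, not Kafka's source.
import java.util.zip.CRC32

case class SimpleRecord(offset: Long, storedCrc: Long, payload: Array[Byte])

object ValidationSketch {
  def crc32(bytes: Array[Byte]): Long = {
    val c = new CRC32()
    c.update(bytes)
    c.getValue
  }

  def validateAndAssignOffsets(records: Seq[SimpleRecord], firstOffset: Long): Seq[SimpleRecord] = {
    // If any record's offset is not the one the broker expects, the batch cannot
    // be kept as-is and must be rebuilt: inPlaceAssignment becomes false.
    var expectedOffset = firstOffset
    var inPlaceAssignment = true
    for (r <- records) {
      if (r.offset != expectedOffset) inPlaceAssignment = false
      expectedOffset += 1
    }

    if (inPlaceAssignment) {
      // Records kept in place are CRC-checked before they are written ...
      records.foreach { r =>
        if (crc32(r.payload) != r.storedCrc)
          throw new IllegalStateException(s"Record at offset ${r.offset} is corrupt")
      }
      records
    } else {
      // ... but on the rebuild path the CRC check is skipped, so a corrupt record
      // is written to the log and the error only surfaces later, on the consumer.
      records.zipWithIndex.map { case (r, i) => r.copy(offset = firstOffset + i) }
    }
  }
}
{code}

The point is the second branch: when offsets have to be reassigned, the records 
are rebuilt without a CRC check, so a corrupt record can reach the log.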

Our fix is as follows (a sketch of the idea is given below the screenshot):

!image-2019-07-27-15-18-22-716.png!
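
In terms of the sketch above, the idea of the patch is to run the CRC check on 
both paths, so corrupt data is rejected when the producer writes it rather than 
when a consumer reads it. Again, this is an approximation of the attached patch, 
not its exact text; it reuses SimpleRecord and ValidationSketch.crc32 from the 
sketch above:

{code:scala}
// Fixed variant of the sketch: validate CRCs regardless of whether offsets are
// reassigned, so corrupt records are rejected at produce time.
def validateAndAssignOffsetsFixed(records: Seq[SimpleRecord], firstOffset: Long): Seq[SimpleRecord] = {
  records.foreach { r =>
    if (ValidationSketch.crc32(r.payload) != r.storedCrc)
      throw new IllegalStateException(s"Record at offset ${r.offset} is corrupt")
  }
  // Offset reassignment itself is unchanged; only the unconditional check is new.
  records.zipWithIndex.map { case (r, i) => r.copy(offset = firstOffset + i) }
}
{code}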

We ran a comparative test of this. By modifying the client-side producer code, 
we generated some dirty data. With the original Kafka version, the dirty data 
could be written to disk normally:

!image-2019-07-27-15-05-12-565.png!

When the client then consumes this data, an error is reported:

!image-2019-07-27-15-06-07-123.png!

After replacing the Kafka broker with the repaired version, the data is verified 
when the producer writes it, and this time the producer fails to write the 
dirty data:

!image-2019-07-27-15-10-21-709.png!

> In some cases, the crc check is skipped and dirty data is written.
> ------------------------------------------------------------------
>
>                 Key: KAFKA-8722
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8722
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>    Affects Versions: 0.10.2.2
>            Reporter: ChenLin
>            Priority: Major
>             Fix For: 0.10.2.2
>
>         Attachments: Crc_data_verification_repair.patch, 
> image-2019-07-27-14-50-08-128.png, image-2019-07-27-14-50-58-300.png, 
> image-2019-07-27-14-56-25-610.png, image-2019-07-27-14-57-06-687.png, 
> image-2019-07-27-15-00-14-673.png, image-2019-07-27-15-05-12-565.png, 
> image-2019-07-27-15-06-07-123.png, image-2019-07-27-15-10-21-709.png, 
> image-2019-07-27-15-18-22-716.png
>



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
