[https://issues.apache.org/jira/browse/KAFKA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Ari Uka updated KAFKA-6679:
---------------------------
    Description: 
I'm running into a really strange issue in production. I have 3 brokers, and 
at random times consumers start to fail with an error saying the CRC does not 
match. The brokers are all on 1.0.1; the issue first appeared on 0.10.2, and I 
upgraded in the hope that it would fix the problem.

On the Kafka side, I see related errors on all 3 brokers:

```
[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, fetcherId=0] Error for partition topic-a-0 to broker 1:org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. (kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing fetch operation on partition topic-b-0, offset 23848795 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing fetch operation on partition topic-b-0, offset 23848795 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error for partition topic-c-2 to broker 2:org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. (kafka.server.ReplicaFetcherThread)
```

 

To work around this, I have to use the kafka-consumer-groups.sh command-line 
tool and do a binary search until I find a non-corrupt message, then push the 
offsets forward to it. It's annoying because I can't simply move the offsets to 
a specific date: kafka-consumer-groups.sh starts to emit the same error, 
ErrInvalidMessage, CRC does not match.
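
The binary search described above can be sketched in Go with sort.Search from 
the standard library. This is a sketch under strong assumptions: readable is a 
hypothetical probe you would implement yourself (for example, fetch one message 
at that offset and check that it decodes), and the result is only correct if 
the corrupt offsets form a single contiguous run, so that every offset before 
the boundary fails and every offset from the boundary onward succeeds.

```go
package main

import "sort"

// firstReadableOffset returns the smallest offset in [lo, hi) for which
// readable reports true, or hi if none does. It assumes readable is false for
// every offset below some boundary and true for every offset at or above it.
func firstReadableOffset(lo, hi int64, readable func(offset int64) bool) int64 {
	if hi <= lo {
		return hi
	}
	// sort.Search finds the smallest index i in [0, n) with f(i) == true.
	i := sort.Search(int(hi-lo), func(i int) bool {
		return readable(lo + int64(i))
	})
	return lo + int64(i)
}
```

Each probe costs one fetch, and the search needs only about log2(hi-lo) of 
them. If the log contains several separate corrupt stretches, the result may 
land inside a later bad region, so probing a few offsets after the returned one 
before resetting the group is a cheap sanity check.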

I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]

  was:
I'm running into a really strange issue in production. I have 3 brokers, and 
at random times consumers start to fail with an error saying the CRC does not 
match. The brokers are all on 1.0.1; the issue first appeared on 0.10.2, and I 
upgraded in the hope that it would fix the problem.

On the Kafka side, I see related errors on all 3 brokers:

```
[2018-03-17 20:59:58,967] ERROR [ReplicaFetcher replicaId=3, leaderId=1, fetcherId=0] Error for partition topic-a-0 to broker 1:org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. (kafka.server.ReplicaFetcherThread)

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing fetch operation on partition topic-b-0, offset 23848795 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller than minimum record overhead (14).

[2018-03-17 20:59:59,411] ERROR [ReplicaManager broker=3] Error processing fetch operation on partition telemetry-b-0, offset 23848795 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller than minimum record overhead (14)

[2018-03-17 20:59:59,490] ERROR [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error for partition topic-c-2 to broker 2:org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, or is otherwise corrupt. (kafka.server.ReplicaFetcherThread)
```

 

To work around this, I have to use the kafka-consumer-groups.sh command-line 
tool and do a binary search until I find a non-corrupt message, then push the 
offsets forward to it. It's annoying because I can't simply move the offsets to 
a specific date: kafka-consumer-groups.sh starts to emit the same error, 
ErrInvalidMessage, CRC does not match.


I'm using the Go consumer [https://github.com/Shopify/sarama] and 
[https://github.com/bsm/sarama-cluster]


> Random corruption (CRC validation issues) 
> ------------------------------------------
>
>                 Key: KAFKA-6679
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6679
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, replication
>    Affects Versions: 0.10.2.0, 1.0.1
>         Environment: FreeBSD 11.0-RELEASE-p8
>            Reporter: Ari Uka
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
