Travis Bischel created KAFKA-15656:
--------------------------------------
Summary: Frequent INVALID_RECORD on Kafka 3.6
Key: KAFKA-15656
URL: https://issues.apache.org/jira/browse/KAFKA-15656
Project: Kafka
Issue Type: Bug
Components: producer
Affects Versions: 3.6.0
Reporter: Travis Bischel
Attachments: invalid_record.log
Using this docker-compose.yml:
{noformat}
version: "3.7"
services:
  kafka:
    image: bitnami/kafka:latest
    network_mode: host
    environment:
      KAFKA_ENABLE_KRAFT: yes
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@127.0.0.1:9093
      # Set this to "PLAINTEXT://127.0.0.1:9092" if you want to run this container on localhost via Docker
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CFG_NODE_ID: 1
      ALLOW_PLAINTEXT_LISTENER: yes
      KAFKA_KRAFT_CLUSTER_ID: XkpGZQ27R3eTl3OdTm2LYA # 16 byte base64-encoded UUID
{noformat}
and running the franz-go integration tests with KGO_TEST_RF=1, I consistently
receive INVALID_RECORD errors.
Looking at the container logs, I see these problematic log lines:
{noformat}
[2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-0 (kafka.server.ReplicaManager)
org.apache.kafka.common.InvalidRecordException: Invalid negative header key size -25
[2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-6 (kafka.server.ReplicaManager)
org.apache.kafka.common.InvalidRecordException: Reached end of input stream before skipping all bytes. Remaining bytes:94
[2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-1 (kafka.server.ReplicaManager)
org.apache.kafka.common.InvalidRecordException: Found invalid number of record headers -26
[2023-10-19 23:33:47,948] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-6 (kafka.server.ReplicaManager)
org.apache.kafka.common.InvalidRecordException: Found invalid number of record headers -27
[2023-10-19 23:33:47,950] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-22 (kafka.server.ReplicaManager)
org.apache.kafka.common.InvalidRecordException: Invalid negative header key size -25
[2023-10-19 23:33:47,947] ERROR [ReplicaManager broker=1] Error processing append operation on partition c63b6e30987317fad18815effb8d432b6df677d2ab56cf6da517bb93fa49b74b-25 (kafka.server.ReplicaManager)
org.apache.kafka.common.InvalidRecordException: Found invalid number of record headers -50
[2023-10-19 23:33:47,959] ERROR [ReplicaManager broker=1] Error processing append operation on partition c63b6e30987317fad18815effb8d432b6df677d2ab56cf6da517bb93fa49b74b-25 (kafka.server.ReplicaManager)
{noformat}
I patched franz-go to dump the request bytes written to the wire when this
error occurs; the attached file contains one such v9 produce request. I
deserialized it locally and do not see the corruption that Kafka reports.
It's possible there is a bug in the client, but these tests have never hit
this error on any release prior to Kafka 3.6. It _looks like_ there is either
corruption while the broker processes the incoming data, or some problematic
race condition in the broker - I'm not sure which.
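For context on the error shapes above: in record batch format v2, the per-record header count and each header's key length are zigzag-encoded varints, so a stream that is misread by even one byte readily decodes to small negative numbers like -25. A minimal Go sketch of that decoding (readVarint is a hypothetical illustration, not the broker's or franz-go's actual decoder):

```go
package main

import "fmt"

// readVarint decodes one zigzag-encoded varint, the encoding Kafka's record
// format v2 uses for fields such as the record header count and header key
// lengths. It returns the value and the number of bytes consumed
// (0 consumed means the input ended mid-varint).
func readVarint(b []byte) (int64, int) {
	var x uint64
	var shift uint
	for i, c := range b {
		x |= uint64(c&0x7f) << shift
		if c&0x80 == 0 {
			// Zigzag decode: even encodings map to >= 0, odd to < 0.
			return int64(x>>1) ^ -int64(x&1), i + 1
		}
		shift += 7
	}
	return 0, 0
}

func main() {
	// 25 zigzag-encodes to the single byte 0x32...
	v, _ := readVarint([]byte{0x32})
	fmt.Println(v) // 25

	// ...but if the decoder instead sees 0x31 (one byte off in the
	// stream), it yields -25, the same shape as the broker's
	// "Invalid negative header key size -25" error.
	v, _ = readVarint([]byte{0x31})
	fmt.Println(v) // -25
}
```

This doesn't identify where the misalignment happens, but it shows why a small framing error on either side surfaces as negative header counts and key sizes rather than a checksum failure.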
--
This message was sent by Atlassian Jira
(v8.20.10#820010)