[
https://issues.apache.org/jira/browse/CAMEL-19894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
akrivda updated CAMEL-19894:
----------------------------
Description:
*Reproducing:*
* Configure a Camel Kafka consumer with "breakOnFirstError" = "true"
* Set up a topic with exactly 2 partitions
* Produce a series of records to both partitions
* Ensure the offsets are committed (I've done that with manual commit; autocommit *may* have a second bug as well, see the P.S. below)
* Make a route to consume this topic (see the route sketch after this list). Ensure the first poll gets records from both partitions, and that the second-to-consume partition has some more records to fetch in the next poll
* Trigger an error when processing exactly the first record of the second-to-consume partition
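A minimal route sketch of this setup (the topic name, group id, and the "poison" payload that triggers the error are illustrative assumptions; the endpoint options are the standard camel-kafka URI options):
{code:java}
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.kafka.KafkaConstants;
import org.apache.camel.component.kafka.consumer.KafkaManualCommit;

public class BreakOnFirstErrorRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("kafka:test-topic"                  // illustrative topic name
                + "?groupId=test-group"          // illustrative group id
                + "&autoCommitEnable=false"      // manual commit, as in the steps above
                + "&allowManualCommit=true"
                + "&breakOnFirstError=true")
            .process(exchange -> {
                String body = exchange.getMessage().getBody(String.class);
                // Simulate a failure on one specific record (here: a "poison" payload
                // produced as the first record of the second-to-consume partition).
                if ("poison".equals(body)) {
                    throw new RuntimeException("simulated processing failure");
                }
            })
            .process(exchange -> {
                // Manually commit the offset of each successfully processed record.
                KafkaManualCommit manual = exchange.getMessage()
                        .getHeader(KafkaConstants.MANUAL_COMMIT, KafkaManualCommit.class);
                manual.commit();
            });
    }
}
{code}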
*Expected behavior:*
* Application consumes all records from the first partition, and none from the second.
*Actual behavior:*
* Application consumes all records from the first partition, but some records from the second partition are skipped (how many depends on the quantity consumed from the first partition in a single poll).
This bug was introduced in https://issues.apache.org/jira/browse/CAMEL-18350, which fixed a major issue with breakOnFirstError but left some edge cases. The root cause is that the lastResult variable is not reset between polls (or between iterations of the partition loop), so it can carry over a stale value from the previous iteration. It also has no chance to be correctly initialized if the exception happens on the first record of a partition. The forced sync commit is then made to the correct (new) partition, but with an invalid, stale offset.
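To make this concrete, here is a minimal, self-contained sketch of the suspected control flow (a simplified model, not the actual camel-kafka internals; the names ProcessingResult, process, and commitSync are illustrative):
{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaleLastResultSketch {

    // Illustrative stand-in for the per-record processing result.
    record ProcessingResult(String partition, long offset) {}

    // The bug being illustrated: never reset between polls or partition iterations.
    static ProcessingResult lastResult;

    public static void main(String[] args) {
        // One poll returning records from two partitions; the first record
        // of "p1" will fail during processing.
        Map<String, List<Long>> poll = new LinkedHashMap<>();
        poll.put("p0", List.of(0L, 1L, 2L));
        poll.put("p1", List.of(0L, 1L));

        for (Map.Entry<String, List<Long>> entry : poll.entrySet()) {
            String partition = entry.getKey();
            for (long offset : entry.getValue()) {
                try {
                    process(partition, offset);
                    lastResult = new ProcessingResult(partition, offset);
                } catch (RuntimeException e) {
                    // breakOnFirstError path: commit "the last good offset" and stop.
                    // Because the FIRST record of this partition failed, lastResult
                    // still holds p0's offset, so the commit targets p1 with a stale
                    // offset and records in p1 end up skipped.
                    commitSync(partition, lastResult);
                    return;
                }
            }
        }
    }

    static void process(String partition, long offset) {
        if (partition.equals("p1") && offset == 0L) {
            throw new RuntimeException("simulated failure on first record of p1");
        }
    }

    static void commitSync(String partition, ProcessingResult last) {
        System.out.printf("sync commit to %s using offset from %s@%d (stale!)%n",
                partition, last.partition(), last.offset());
    }
}
{code}
Running this prints a commit targeting p1 with an offset taken from p0, which matches the skipped-records symptom described above.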
I've adapted the test project from CAMEL-18350 (many thanks to [~klease78]) to demonstrate the issue and published it to GitHub. Check the failing test in the project: [https://github.com/Krivda/camel-bug-reproduction]
P.S. Also, there *might* be a second bug related to this issue which *may* occur with enableAutoCommit=true: when the bug occurs, the physical commit *might* not be made for already-processed partitions, which may result in double processing. But I haven't investigated this further.
P.P.S. Please note that the GitHub project contains a very detailed description of the behavior, pointing to the specific failing lines of code, which should be very helpful in the investigation.
> camel-kafka: enabling "breakOnFirstError" causes to skip records on exception
> -----------------------------------------------------------------------------
>
> Key: CAMEL-19894
> URL: https://issues.apache.org/jira/browse/CAMEL-19894
> Project: Camel
> Issue Type: Bug
> Components: camel-kafka
> Affects Versions: 3.21.0, 4.0.0
> Reporter: akrivda
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)