[ 
https://issues.apache.org/jira/browse/CAMEL-19894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

akrivda updated CAMEL-19894:
----------------------------
    Description: 
{*}Reproducing{*}:
 * Configure a Camel Kafka consumer with "breakOnFirstError" = "true"
 * Set up a topic with exactly 2 partitions
 * Produce a series of records to both partitions.
 * Ensure the offset is committed (I've done that with a manual commit; autocommit 
*MAY* have a second bug as well, see the description below)
 * Create a route to consume this topic. Ensure the first poll gets records from 
both partitions, and that the second-to-consume partition has more records 
to fetch in the next poll.
 * Trigger an error when processing exactly the first record of the 
second-to-consume partition
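The consumer configuration from the steps above can be sketched as an endpoint URI. This is a minimal illustration, not taken from the reproduction project; the broker address and topic name are placeholders, and the options shown (breakOnFirstError, autoCommitEnable, allowManualCommit) are standard camel-kafka endpoint options:

```java
// Hypothetical helper: builds the camel-kafka endpoint URI matching the
// reproduction setup (manual commit, breakOnFirstError enabled).
class BreakOnFirstErrorRepro {
    static String endpointUri(String topic) {
        return "kafka:" + topic
                + "?brokers=localhost:9092"    // placeholder broker
                + "&breakOnFirstError=true"    // stop and re-poll on the first failed exchange
                + "&autoCommitEnable=false"    // manual commit, as in the report
                + "&allowManualCommit=true";
    }

    public static void main(String[] args) {
        System.out.println(endpointUri("repro-topic"));
    }
}
```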

*Expected behavior:*
 * The application consumes all records from the first partition, and none 
from the second.

*Actual behavior:*
 * The application consumes all records from the first partition, but some 
records from the second partition are skipped (the number depends on how many 
were consumed from the first partition in a single poll).

 

This bug was introduced in https://issues.apache.org/jira/browse/CAMEL-18350, 
which fixed a major issue with breakOnFirstError but left some edge cases.

The root cause is that the lastResult variable is not reset between polls (or 
between iterations of the partitions loop), so it can carry a stale value from 
the previous iteration. It never gets correctly initialized if the exception 
happens on the first record of a partition, so the forced sync commit targets 
the right (new) partition but with an invalid, stale offset.
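The mechanism can be illustrated with a self-contained sketch (simplified; class and member names are hypothetical and this is not the actual camel-kafka code): a lastResult field that survives across partition iterations makes the error path commit the previous partition's offset against the new partition.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the buggy loop described above.
class StaleLastResultSketch {
    record Result(String partition, long offset) {}

    private Result lastResult; // BUG: never reset between polls/partitions
    final Map<String, Long> committed = new LinkedHashMap<>();

    // Process records grouped by partition; simulate a processing error on
    // the first record of `failPartition`.
    void poll(Map<String, List<Long>> recordsByPartition, String failPartition) {
        for (var entry : recordsByPartition.entrySet()) {
            // A correct implementation would reset lastResult here, before
            // the first record of each partition is processed.
            for (long offset : entry.getValue()) {
                if (entry.getKey().equals(failPartition)) {
                    // breakOnFirstError: the forced sync commit targets the
                    // right (new) partition, but uses the stale offset left
                    // over from the previously processed partition.
                    committed.put(entry.getKey(),
                            lastResult == null ? 0L : lastResult.offset());
                    return;
                }
                lastResult = new Result(entry.getKey(), offset + 1); // next offset to commit
            }
            committed.put(entry.getKey(), lastResult.offset());
        }
    }
}
```

With offsets 0–4 in partition p0 and 0–2 in p1, a failure on the first record of p1 commits offset 5 to p1, silently skipping p1's first records — matching the observed behavior.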

I've adapted the test project for CAMEL-18350 (many thanks to 
[~klease78]) to demonstrate the issue and published it to GitHub. See the 
failing test in the project: [https://github.com/Krivda/camel-bug-reproduction]

P.S. There *might* also be a second bug related to this issue, which *may* 
occur with enableAutoCommit=true: when the bug occurs, the physical commit 
*might* not be made for already-processed partitions, which may result in 
double processing. I haven't investigated this further.

P.P.S. Please note that the GitHub project contains a very detailed 
description of the behavior, pointing to the specific failing lines of code, 
which should be very helpful in the investigation.

 


> camel-kafka: enabling "breakOnFirstError" causes to skip records on exception
> -----------------------------------------------------------------------------
>
>                 Key: CAMEL-19894
>                 URL: https://issues.apache.org/jira/browse/CAMEL-19894
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-kafka
>    Affects Versions: 3.21.0, 4.0.0
>            Reporter: akrivda
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
