[ https://issues.apache.org/jira/browse/KAFKA-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ritika Reddy updated KAFKA-18019:
---------------------------------
Description:
Since we bump the epoch on abort, we no longer need to call InitProducerId to fence
requests. InitProducerId will only be called when the producer starts up, to
fence a previous instance.
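The fencing mechanism described above can be pictured with a toy coordinator. This is a sketch under stated assumptions, not Kafka's coordinator code: `ToyCoordinator`, the record `PidEpoch` and all methods are hypothetical names, epochs are plain counters with no overflow handling, and only startup (InitProducerId) and abort bump the epoch, as the ticket describes. A request carrying an older epoch is rejected, so bumping the epoch on abort already fences late requests from the aborted transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of epoch-based fencing; hypothetical names, not Kafka's coordinator.
final class ToyCoordinator {
    // Producer ID plus epoch, as carried on every transactional request.
    record PidEpoch(long producerId, int epoch) {}

    // Current producer ID and epoch per transactional ID.
    private final Map<String, PidEpoch> current = new HashMap<>();
    private long nextProducerId = 1000L;

    // Producer startup (InitProducerId): bumping the epoch fences any
    // previous instance that shares the same transactional ID.
    PidEpoch initProducerId(String transactionalId) {
        PidEpoch prev = current.get(transactionalId);
        PidEpoch next = (prev == null)
                ? new PidEpoch(nextProducerId++, 0)
                : new PidEpoch(prev.producerId(), prev.epoch() + 1);
        current.put(transactionalId, next);
        return next;
    }

    // Abort also bumps the epoch and hands the new epoch back to the caller, so
    // late requests from the aborted transaction (old epoch) are rejected
    // without another InitProducerId round trip.
    PidEpoch abort(String transactionalId, PidEpoch caller) {
        if (!accepts(transactionalId, caller)) {
            throw new IllegalStateException("fenced: stale producer epoch");
        }
        PidEpoch bumped = new PidEpoch(caller.producerId(), caller.epoch() + 1);
        current.put(transactionalId, bumped);
        return bumped;
    }

    // A request is accepted only if it carries the current producer ID and epoch.
    boolean accepts(String transactionalId, PidEpoch caller) {
        return caller.equals(current.get(transactionalId));
    }
}
```

In this model, once an abort moves the epoch from 0 to 1, any request still carrying epoch 0 is rejected; InitProducerId is only needed when a new instance starts and must fence its predecessor.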
With this change, we also inspected the other calls to InitProducerId, including
the call made after receiving an InvalidPidMappingException. This exception was
changed to abortable as part of [KIP-360: Improve reliability of
idempotent/transactional
producer|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820].
However, this change means that we can violate exactly-once semantics (EOS)
guarantees. For example, consider an application that copies data from one
partition to another:
* Application instance A processes up to offset 4
* Application instance B comes up and fences application instance A
* Application instance B processes up to offset 5
* Application instances A and B are idle for transactional.id.expiration.ms, and
the transactional ID expires on the server
* Application instance A attempts to process offset 5 (since, in its view, that
is next); if we recover from the invalid PID mapping, we can end up duplicating
this processing
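The scenario above can be replayed in a small simulation. It is a hypothetical sketch, not Kafka client or broker code: `PidMappingScenario`, its `Policy` and `Outcome` enums, the record `Pid` and all methods are illustrative names; each `commit` stands for one whole transaction copying a single offset; and expiry of the transactional ID is modeled as simply dropping its mapping. Once the mapping is gone, the coordinator can no longer tell that instance A was fenced, so it can only answer INVALID_PRODUCER_ID_MAPPING. If A treats that as abortable and re-initializes, it obtains a fresh producer ID and commits offset 5 a second time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the KAFKA-18019 scenario; not Kafka client or broker code.
final class PidMappingScenario {
    enum Policy { ABORTABLE, FATAL }

    enum Outcome { COMMITTED, FENCED, INVALID_PRODUCER_ID_MAPPING }

    record Pid(long id, int epoch) {}

    // Coordinator state: transactional ID -> current producer ID and epoch.
    private final Map<String, Pid> mappings = new HashMap<>();
    private long nextProducerId = 1000L;
    // Input offsets copied to the output partition, in commit order.
    private final List<Integer> output = new ArrayList<>();

    // Instance startup: reuse the producer ID with a bumped epoch (fencing the
    // old instance), or allocate a fresh ID if no mapping exists.
    Pid initProducerId(String txnId) {
        Pid current = mappings.get(txnId);
        Pid next = (current == null)
                ? new Pid(nextProducerId++, 0)
                : new Pid(current.id(), current.epoch() + 1);
        mappings.put(txnId, next);
        return next;
    }

    // Idle longer than transactional.id.expiration.ms: the mapping is dropped.
    void expire(String txnId) {
        mappings.remove(txnId);
    }

    // One whole transaction that copies a single input offset.
    Outcome commit(String txnId, Pid producer, int offset) {
        Pid current = mappings.get(txnId);
        if (current == null) {
            return Outcome.INVALID_PRODUCER_ID_MAPPING; // nothing left to fence against
        }
        if (!current.equals(producer)) {
            return Outcome.FENCED;
        }
        output.add(offset);
        return Outcome.COMMITTED;
    }

    static List<Integer> run(Policy policy) {
        PidMappingScenario s = new PidMappingScenario();
        String txnId = "copy-app";

        Pid a = s.initProducerId(txnId);
        for (int offset = 0; offset <= 4; offset++) {
            s.commit(txnId, a, offset);      // A processes up to offset 4
        }
        Pid b = s.initProducerId(txnId);     // B comes up and fences A
        s.commit(txnId, b, 5);               // B processes offset 5
        s.expire(txnId);                     // both idle; the transactional ID expires

        // A believes offset 5 is next.
        Outcome outcome = s.commit(txnId, a, 5);
        if (outcome == Outcome.INVALID_PRODUCER_ID_MAPPING && policy == Policy.ABORTABLE) {
            // Abortable handling: abort (nothing is open here), re-initialize, retry.
            Pid fresh = s.initProducerId(txnId);
            s.commit(txnId, fresh, 5);       // offset 5 is written a second time
        }
        // Under Policy.FATAL, instance A stops here and the application must restart it.
        return List.copyOf(s.output);
    }
}
```

Under these assumptions, `run(Policy.ABORTABLE)` returns `[0, 1, 2, 3, 4, 5, 5]` while `run(Policy.FATAL)` returns `[0, 1, 2, 3, 4, 5]`: only the fatal treatment keeps the fenced instance from writing offset 5 again.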
Thus, INVALID_PRODUCER_ID_MAPPING should be fatal to the producer.
This is consistent with [KIP-1050: Consistent error handling for
Transactions|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions],
which places errors that are fatal to the producer in the "application
recoverable" category. This category tells the client that the producer must be
restarted and that the application itself must perform recovery. Since KIP-1050
has been approved, this change follows that decision.
> Convert INVALID_PRODUCER_ID_MAPPING from abortable error to fatal error
> -----------------------------------------------------------------------
>
> Key: KAFKA-18019
> URL: https://issues.apache.org/jira/browse/KAFKA-18019
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Ritika Reddy
> Assignee: Ritika Reddy
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)