[
https://issues.apache.org/jira/browse/KAFKA-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Kim resolved KAFKA-18019.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
> Convert INVALID_PRODUCER_ID_MAPPING from abortable error to fatal error
> -----------------------------------------------------------------------
>
> Key: KAFKA-18019
> URL: https://issues.apache.org/jira/browse/KAFKA-18019
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Ritika Reddy
> Assignee: Ritika Reddy
> Priority: Major
> Fix For: 4.0.0
>
>
> Since we bump epoch on abort, we no longer need to call InitProducerId to
> fence requests. InitProducerId will only be called when the producer starts
> up to fence a previous instance.
> With this change, some other calls to InitProducerId were inspected including
> the call after receiving an InvalidPidMappingException. This exception was
> changed to abortable as part of [KIP-360: Improve reliability of
> idempotent/transactional
> producer|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820].
> However, treating this error as abortable means that we can violate EOS
> guarantees. As an example, consider an application that is copying data
> from one partition to another:
> * Application instance A processes to offset 4
> * Application instance B comes up and fences application instance A
> * Application instance B processes to offset 5
> * Application instances A and B are idle for transactional.id.expiration.ms,
> and the transactional id expires on the server
> * Application instance A attempts to process offset 5 (since in its view,
> that is next) -- if we recover from invalid pid mapping, we can duplicate
> this processing
> Thus, INVALID_PRODUCER_ID_MAPPING should be fatal to the producer.
> This is consistent with [KIP-1050: Consistent error handling for
> Transactions|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions]
> where errors that are fatal to the producer are in the "application
> recoverable" category. This is a grouping that indicates to the client that
> the producer needs to restart and recovery on the application side is
> necessary. KIP-1050 is approved so we are consistent with that decision.
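The duplicate-processing scenario in the description can be replayed with a small toy model. This is only a sketch under stated assumptions: the Coordinator class, its check() method, the "copy-job" transactional id, and the epoch numbering are all hypothetical stand-ins invented for illustration, not the Kafka client or broker API. It shows that if instance A recovers from INVALID_PRODUCER_ID_MAPPING by re-initializing, offset 5 is copied twice, whereas treating the error as fatal leaves the output intact.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PidMappingSketch {

    /** Hypothetical stand-in for the transaction coordinator: transactional id -> current epoch. */
    static final class Coordinator {
        private final Map<String, Integer> epochs = new HashMap<>();

        /** Models InitProducerId: creates a mapping at epoch 1, or bumps the epoch to fence older instances. */
        int initProducerId(String txnId) {
            return epochs.merge(txnId, 1, Integer::sum);
        }

        /** Models transactional id expiration: the server forgets the mapping. */
        void expire(String txnId) {
            epochs.remove(txnId);
        }

        /** Returns "OK", "FENCED", or "INVALID_PRODUCER_ID_MAPPING" for a request from the given epoch. */
        String check(String txnId, int epoch) {
            Integer current = epochs.get(txnId);
            if (current == null) {
                return "INVALID_PRODUCER_ID_MAPPING";
            }
            return current == epoch ? "OK" : "FENCED";
        }
    }

    /** Replays the timeline and returns the offsets written to the output partition. */
    static List<Integer> run(boolean recoverFromInvalidMapping) {
        Coordinator coordinator = new Coordinator();
        String txnId = "copy-job"; // hypothetical transactional id
        List<Integer> copied = new ArrayList<>();

        // Instance A starts up and processes to offset 4.
        int epochA = coordinator.initProducerId(txnId);
        for (int offset = 0; offset <= 4; offset++) {
            if (coordinator.check(txnId, epochA).equals("OK")) {
                copied.add(offset);
            }
        }

        // Instance B comes up, fences A, and processes offset 5.
        int epochB = coordinator.initProducerId(txnId);
        if (coordinator.check(txnId, epochB).equals("OK")) {
            copied.add(5);
        }

        // Both instances idle past the expiration window; the mapping is removed.
        coordinator.expire(txnId);

        // Instance A wakes up and, in its stale view, offset 5 is next.
        String result = coordinator.check(txnId, epochA);
        if (result.equals("INVALID_PRODUCER_ID_MAPPING") && recoverFromInvalidMapping) {
            // Abortable handling: re-initialize and retry. Nothing fences A anymore.
            epochA = coordinator.initProducerId(txnId);
            result = coordinator.check(txnId, epochA);
        }
        if (result.equals("OK")) {
            copied.add(5); // duplicate of B's work
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println("abortable: " + run(true));  // abortable: [0, 1, 2, 3, 4, 5, 5]
        System.out.println("fatal:     " + run(false)); // fatal:     [0, 1, 2, 3, 4, 5]
    }
}
```

In this model, A would have received FENCED had the mapping still existed; expiration is what erases the evidence that B ever fenced A, which is why recovery is unsafe at that point.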
--
This message was sent by Atlassian Jira
(v8.20.10#820010)