Hey Alieh, I see there are two options now. So folks will be discussing the approaches and deciding the best way forward before we vote? I do think there could be a problem with the approach on commit if we get stuck on an earlier error and have more records (potentially on new partitions) to commit as the current PR is implemented.
I guess this takes us back to the question of whether the error should be cleared on send. (And I guess at the back of my mind, I'm wondering if there is a way we can validate the "posion pill" records application side before we even try to send them) Justine On Wed, Jun 26, 2024 at 4:38 PM Alieh Saeedi <asae...@confluent.io.invalid> wrote: > Hi Justine, > > I did not update the KIP with `TxnSendOption` since I thought it'd be > better discussed here beforehand. > right now, there are 2 PRs: > - the PR that implements the current version of the KIP: > https://github.com/apache/kafka/pull/16332 > - the POC PR that clarifies the `TxnSendOption`: > https://github.com/apache/kafka/pull/16465 > > Bests, > Alieh > > On Thu, Jun 27, 2024 at 12:42 AM Justine Olshan > <jols...@confluent.io.invalid> wrote: > > > Hey Alieh, > > > > I think I am a little confused. Are the 3 points above addressed by the > KIP > > or did something change? The PR seems to not include this change and > still > > has the CommitOption as well. > > > > Thanks, > > Justine > > > > On Wed, Jun 26, 2024 at 2:15 PM Alieh Saeedi > <asae...@confluent.io.invalid > > > > > wrote: > > > > > Hi all, > > > > > > > > > Looking at the PR <https://github.com/apache/kafka/pull/16332> > > > corresponding to the KIP, there are some points worthy of mention: > > > > > > > > > 1) clearing the error ends up dirty/messy code in `TransactionManager`. > > > > > > 2) By clearing the error, we are actually doing an illegal transition > > from > > > `ABORTABLE_ERROR` to `IN_TRANSACTION` which is conceptually not > > acceptable. > > > This can be the root cause of some issues, with perhaps further future > > > changes by others. > > > > > > 3) If the poison pill record `r1` causes a transition to the error > state > > > and then the next record `r2` requires adding a partition to the > > > transaction, the action fails due to being in the error state. In this > > > case, clearing errors during `commitTnx(CLEAR_SEND_ERROR)` is too late. > > > However, this case can NOT be the main concern as soon as KIP-890 is > > fully > > > implemented. > > > > > > > > > My suggestion is to solve the problem where it arises. If the > transition > > to > > > the error state does not happen during `send()`, we do not need to > clear > > > the error later. Therefore, instead of `CommitOption`, we can define a > > > `TxnSendOption` and prevent the `send()` method from going to the error > > > state in case 1) we're in a transaction and 2) the user asked for > > > IGONRE_SEND_ERRORS. For more clarity, you can take a look at the POC PR > > > <https://github.com/apache/kafka/pull/16465>. > > > > > > Cheers, > > > Alieh > > > > > >