[ 
https://issues.apache.org/jira/browse/KAFKA-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Calvin Liu updated KAFKA-17877:
-------------------------------
    Description: 
{code:java}
java.lang.IllegalStateException: WriteTxnMarkerResponse for 
lkc-devcv9jg9n_transaction-bench-transaction-id-72UwIuNVQkOxl4y_OEBAlA does not 
contain expected error map for producer id 8308
{code}
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerRequestCompletionHandler.scala#L100]

------

This is a bug on the data partition side. The leader may return the response 
early, before all the producer IDs are included in it.

Consider the following case:
 # We have 2 markers to append, one for producer-0 and one for producer-1.
 # Producer-0 is processed first, and its marker is appended to 
__consumer_offsets.
 # The __consumer_offsets append finishes very quickly because the group 
coordinator is no longer the leader, so the coordinator immediately returns 
NOT_LEADER_OR_FOLLOWER. Its callback calls {{maybeComplete()}} for the first 
time and, because there is only one partition to append to, goes on to call 
{{maybeSendResponseCallback()}} and decrement {{numAppends}}.
 # The handler then calls the replica manager append with nothing to append. 
That callback calls {{maybeComplete()}} a second time and decrements 
{{numAppends}} again.

Remember that, because we have only 2 markers, the initial value of 
{{numAppends}} is also 2. So step 4 brings the counter to zero and the request 
finishes before producer-1 is even processed. As a result, producer-1 is 
missing from the WriteTxnMarkers response.
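
The countdown in the steps above can be modelled with a small, self-contained sketch. It is not the broker code: the class {{MarkerCountdownSketch}}, the method {{simulate}} and the {{callbacksPerMarker}} parameter are hypothetical names introduced only for illustration. The one thing taken from the description is that {{numAppends}} starts at the number of markers and that the response goes out when it reaches zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the countdown; this is not the actual broker code.
class MarkerCountdownSketch {

    /**
     * Returns the producer IDs present in the response at the moment it is sent,
     * i.e. when the shared counter first reaches zero.
     * callbacksPerMarker is an assumed knob: 1 models one completion per marker,
     * 2 models the double completion described in steps 3 and 4.
     */
    static List<Long> simulate(List<Long> producerIds, int callbacksPerMarker) {
        // One expected completion per marker, as in the description.
        AtomicInteger numAppends = new AtomicInteger(producerIds.size());
        List<Long> processed = new ArrayList<>();
        List<Long> response = null;

        for (long producerId : producerIds) {
            processed.add(producerId);
            for (int i = 0; i < callbacksPerMarker; i++) {
                // Each completion decrements the counter; the response is sent
                // by whichever completion brings it to zero.
                if (numAppends.decrementAndGet() == 0 && response == null) {
                    response = new ArrayList<>(processed);
                }
            }
        }
        return response == null ? new ArrayList<>() : response;
    }

    public static void main(String[] args) {
        // Two markers, two completions each: the response is sent after producer-0.
        System.out.println(simulate(List.of(0L, 1L), 2)); // prints [0]
        // Two markers, one completion each: both producers are in the response.
        System.out.println(simulate(List.of(0L, 1L), 1)); // prints [0, 1]
    }
}
```

With two completions counted against a single marker, the counter is exhausted while only producer-0 has been processed, which matches the missing producer-1 entry reported by the exception.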
----
As a result, the txn coordinator will not update the txn state correctly, even 
though the markers may have been written to the data partitions. There is an impact 
on the clients. the 


> IllegalStateException: missing producer id from the WriteTxnMarkersResponse
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-17877
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17877
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Calvin Liu
>            Assignee: Calvin Liu
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
