Trevan Richins created KAFKA-17229:
--------------------------------------

             Summary: Multiple punctuators that together exceed the transaction 
timeout cause ProducerFencedException
                 Key: KAFKA-17229
                 URL: https://issues.apache.org/jira/browse/KAFKA-17229
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 3.8.0
            Reporter: Trevan Richins
         Attachments: always-forward-failure.log, topic-input-failure.log

If a single StreamThread runs multiple punctuator tasks and their combined run 
time exceeds the transaction timeout setting, ProducerFencedExceptions will 
occur.

For example, in my test case, I have an input topic with 10 partitions, a 
processor with a punctuator that just sleeps for 5 seconds (the transaction 
timeout is 10s, so each punctuator on its own finishes within the timeout), and 
an output topic.  The 
punctuators run every 30 seconds (wall clock).  Once the app is running and is 
inside one of the punctuators, I put one record in the input topic.  The 
punctuators will all finish and the record will be seen and read but it won't 
commit because the punctuators run again (since it has been 30s since they last 
started).  After the punctuators finish this second time, it will try to commit 
the transaction that it started 50 seconds ago and will trigger the 
ProducerFencedException.
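
The arithmetic behind the first test case can be sketched as follows.  This is 
only a back-of-the-envelope model, not Kafka Streams code.  The class and 
method names are made up for illustration, and it assumes one punctuator per 
partition, all 10 running back to back on the same StreamThread.

```java
// Hypothetical timing model of the first test case; not Kafka Streams code.
final class PunctuatorTimeline {

    // Wall-clock time one StreamThread spends in a punctuation pass,
    // assuming the punctuators run sequentially on that thread.
    static long totalPunctuationMs(int punctuatorCount, long perPunctuatorMs) {
        return punctuatorCount * perPunctuatorMs;
    }

    // True when a transaction opened just before a punctuation pass would
    // still be open after the transaction timeout has elapsed.
    static boolean exceedsTransactionTimeout(int punctuatorCount,
                                             long perPunctuatorMs,
                                             long transactionTimeoutMs) {
        return totalPunctuationMs(punctuatorCount, perPunctuatorMs) > transactionTimeoutMs;
    }

    public static void main(String[] args) {
        long sleepMs = 5_000;     // each punctuator sleeps for 5 seconds
        long timeoutMs = 10_000;  // the transaction timeout is 10 seconds

        // One partition: a single punctuator stays under the timeout.
        System.out.println(exceedsTransactionTimeout(1, sleepMs, timeoutMs));   // false
        // Ten partitions: 10 x 5s = 50s, well past the 10s timeout.
        System.out.println(exceedsTransactionTimeout(10, sleepMs, timeoutMs));  // true
    }
}
```

The 50s figure matches the age of the transaction at the time of the failed 
commit described above.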

Another test case, with the same scenario, is having the punctuators forward 
something.  This also causes a ProducerFencedException because the first 
punctuator starts a transaction but it doesn't commit the transaction till all 
of the punctuators are done and that is long after the transaction timeout.

The issue doesn't exist if there is only one partition as the single punctuator 
will finish within the transaction timeout.  It only occurs when there are 
multiple punctuators whose total run time exceeds the transaction timeout.

It feels like what is needed is for Kafka to check after each punctuator 
whether there is data that needs to be committed and, if there is, to commit 
at that point.
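
To make the suggestion concrete, here is a small model of how long a 
transaction stays open in the forwarding scenario, with and without a commit 
after each punctuator.  It is a sketch of the idea only.  It does not reflect 
how Kafka Streams is implemented, and the class and method names are 
hypothetical.

```java
// Hypothetical model of the proposed behaviour; not Kafka Streams code.
final class CommitAfterPunctuatorSketch {

    // Returns the longest time (ms) any transaction stays open during one
    // pass in which every punctuator forwards a record and then runs for
    // perPunctuatorMs.
    static long longestOpenTransactionMs(int punctuatorCount,
                                         long perPunctuatorMs,
                                         boolean commitAfterEach) {
        long now = 0;
        long txStart = -1;  // -1 means no transaction is open
        long longest = 0;
        for (int i = 0; i < punctuatorCount; i++) {
            if (txStart < 0) {
                txStart = now;           // forwarding a record opens a transaction
            }
            now += perPunctuatorMs;      // time spent inside the punctuator
            if (commitAfterEach) {
                longest = Math.max(longest, now - txStart);
                txStart = -1;            // commit closes the transaction
            }
        }
        if (txStart >= 0) {
            // Commit only after the whole pass has finished.
            longest = Math.max(longest, now - txStart);
        }
        return longest;
    }

    public static void main(String[] args) {
        // Commit at the end of the pass: the transaction is open for 50s.
        System.out.println(longestOpenTransactionMs(10, 5_000, false));  // 50000
        // Commit after each punctuator: no transaction is open longer than 5s.
        System.out.println(longestOpenTransactionMs(10, 5_000, true));   // 5000
    }
}
```

With a 10s transaction timeout, only the second variant keeps every 
transaction within the limit.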

 

I've attached a log of the first test case.  It is called 
"topic-input-failure.log".  It starts after the punctuators run the first time.  
It shows the record being received and the transaction starting.  Then it runs 
the punctuators again and they each sleep for 5 seconds.  Once they are done, 
it triggers a ProducerFencedException.

I've attached a log for the second test case.  It is called 
"always-forward-failure.log".  It starts when the punctuators run the first 
time.  It shows the punctuators forwarding a record and sleeping for 5 seconds.  
In this case, only 5 punctuators run as a group.  An 
InvalidProducerEpochException occurs after the 5th punctuator finishes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
