[ https://issues.apache.org/jira/browse/KAFKA-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Trevan Richins updated KAFKA-17229:
-----------------------------------
Description:

If a single StreamThread has multiple punctuator tasks and their total runtime exceeds the transaction timeout setting, ProducerFencedExceptions will occur.

For example, in my test case I have an input topic with 10 partitions, a processor with a punctuator that just sleeps for 5 seconds (the transaction timeout is 10s, so each punctuator finishes within the timeout), and an output topic. The punctuators run every 30 seconds (wall clock). Once the app is running and is inside one of the punctuators, I put one record in the input topic. The punctuators all finish, and the record is seen and read, but it isn't committed because the punctuators run again (since it has been 30s since they last started). After the punctuators finish this second time, the app tries to commit the transaction it started 50 seconds earlier, which triggers the ProducerFencedException.

Another test case, with the same scenario, has the punctuators forward something. This also causes a ProducerFencedException, because the first punctuator starts a transaction, but the transaction isn't committed until all of the punctuators are done, which is long after the transaction timeout.

The issue doesn't exist if there is only one partition, as the single punctuator finishes within the transaction timeout. It occurs only when there are multiple punctuators whose total runtime exceeds the transaction timeout.

It feels like what is needed is for Kafka to check after each punctuator whether there is data that needs to be committed. If there is, it should commit then.

I've attached a log of the first test case, called "topic-input-failure.log". It starts after the punctuators run the first time. It shows the record being received and the transaction starting. Then the punctuators run again, each sleeping for 5 seconds. Once they are done, a ProducerFencedException is triggered.

I've attached a log for the second test case, called "always-forward-failure.log". It starts when the punctuators run the first time. It shows the punctuators forwarding a record and sleeping for 5 seconds. In this case, only 5 punctuators run as a group. An InvalidProducerEpochException occurs after the 5th punctuator finishes.
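The timing arithmetic behind both failures can be sketched in plain Java. This is an illustration only, not Kafka code: the class and method names are hypothetical, and the constants mirror the reported scenario (10 punctuator tasks on one StreamThread, 5 seconds per punctuation, a 10-second transaction timeout).

```java
// Hypothetical model of the reported timing, not Kafka Streams API code.
public class PunctuatorTimeoutSketch {
    static final long TXN_TIMEOUT_MS = 10_000; // transaction timeout from the report
    static final long PUNCTUATE_MS   = 5_000;  // each punctuator sleeps 5 seconds
    static final int  NUM_TASKS      = 10;     // 10 partitions -> 10 punctuator tasks

    // Age of the open transaction once every punctuator on the StreamThread has
    // run, assuming the transaction was begun before the first punctuator fired.
    static long transactionAgeAfterAllPunctuators() {
        return NUM_TASKS * PUNCTUATE_MS;
    }

    // With a commit after each punctuator (the fix suggested above), the open
    // transaction never outlives a single punctuation.
    static long maxTransactionAgeWithPerPunctuatorCommit() {
        return PUNCTUATE_MS;
    }

    public static void main(String[] args) {
        long age = transactionAgeAfterAllPunctuators();
        System.out.println("age=" + age + "ms fenced=" + (age > TXN_TIMEOUT_MS));
        // prints: age=50000ms fenced=true

        long fixed = maxTransactionAgeWithPerPunctuatorCommit();
        System.out.println("fixedAge=" + fixed + "ms fenced=" + (fixed > TXN_TIMEOUT_MS));
        // prints: fixedAge=5000ms fenced=false
    }
}
```

The 50-second figure in the report is exactly this product: ten 5-second punctuations run back to back before the pending commit, so the transaction is five times older than the broker allows by the time the commit is attempted.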
> Multiple punctuators that together exceed the transaction timeout cause
> ProducerFencedException
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17229
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17229
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>            Reporter: Trevan Richins
>            Priority: Major
>        Attachments: always-forward-failure.log, topic-input-failure.log
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)