[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947188#comment-16947188 ] ASF GitHub Bot commented on KAFKA-7190:

hachikuji commented on pull request #7388: KAFKA-7190: Retain producer state until transactionalIdExpiration time passes
URL: https://github.com/apache/kafka/pull/7388

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Under low traffic conditions purging repartition topics cause WARN statements
> about UNKNOWN_PRODUCER_ID
>
> Key: KAFKA-7190
> URL: https://issues.apache.org/jira/browse/KAFKA-7190
> Project: Kafka
> Issue Type: Improvement
> Components: core, streams
> Affects Versions: 1.1.0, 1.1.1
> Reporter: Bill Bejeck
> Assignee: Guozhang Wang
> Priority: Major
>
> When a streams application has little traffic, it is possible that
> consumer purging deletes even the last message sent by a producer (i.e.,
> all the messages sent by this producer have been consumed and committed),
> and as a result, the broker deletes that producer's ID. The next time this
> producer tries to send, it gets the UNKNOWN_PRODUCER_ID error code; in this
> case, however, the error is retriable: the producer just gets a new
> producer ID, retries, and this time succeeds.
>
> Possible fixes could be on the broker side, i.e., delaying the deletion of
> producer IDs for a longer period, or on the streams side, i.e., developing
> a more conservative approach to deleting offsets from repartition topics.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
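The scenario in the issue description can be made concrete with a small in-memory model. This is a sketch under stated assumptions, not broker code: `ToyPartition`, `ToyProducer`, and the id allocator are hypothetical names; the broker is assumed to reject a write from an unknown producer id unless its sequence number is 0 (a simplification of the real validation); and a purge is assumed to drop the state of every producer whose last record falls below the new log start offset, as the description states.

```python
from dataclasses import dataclass, field
from itertools import count


class UnknownProducerIdError(Exception):
    """Stands in for the broker's UNKNOWN_PRODUCER_ID error code."""


@dataclass
class ToyPartition:
    log_start_offset: int = 0
    next_offset: int = 0
    # producer id -> (last sequence number, offset of that producer's last record)
    producers: dict[int, tuple[int, int]] = field(default_factory=dict)

    def append(self, pid: int, seq: int) -> int:
        if pid not in self.producers and seq != 0:
            # No state for this pid, so the sequence number cannot be validated.
            raise UnknownProducerIdError(pid)
        offset = self.next_offset
        self.producers[pid] = (seq, offset)
        self.next_offset += 1
        return offset

    def purge_before(self, offset: int) -> None:
        """Like a purge (delete-records) request: drop records below `offset`."""
        self.log_start_offset = max(self.log_start_offset, offset)
        # Assumption from the description: a producer whose last record is gone loses its state.
        self.producers = {
            pid: state for pid, state in self.producers.items()
            if state[1] >= self.log_start_offset
        }


_next_pid = count(1000)  # hypothetical producer-id allocator


@dataclass
class ToyProducer:
    pid: int = field(default_factory=lambda: next(_next_pid))
    seq: int = 0
    retries: int = 0

    def send(self, partition: ToyPartition) -> int:
        try:
            offset = partition.append(self.pid, self.seq)
        except UnknownProducerIdError:
            # Retriable: take a fresh producer id, restart sequences, resend once.
            self.pid, self.seq = next(_next_pid), 0
            self.retries += 1
            offset = partition.append(self.pid, self.seq)
        self.seq += 1
        return offset
```

Sending twice, purging up to `next_offset` (everything consumed and committed), and sending again reproduces the description: the third send hits `UnknownProducerIdError`, the producer switches to a new id, and the retry lands at offset 2.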
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937858#comment-16937858 ] ASF GitHub Bot commented on KAFKA-7190:

bob-barrett commented on pull request #7388: KAFKA-7190: Retain producer state until transactional id expires
URL: https://github.com/apache/kafka/pull/7388

As described in KIP-360, this patch changes producer state retention so that producer state remains cached even after it is removed from the log. Producer state is now removed only when the transactional id expiration time has passed. This is intended to reduce the incidence of UNKNOWN_PRODUCER_ID errors for producers when records are deleted or when a topic has a short retention time.

Tested with unit tests.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892046#comment-16892046 ] Bob Barrett commented on KAFKA-7190:

The periodic producer expiration check expires a producer if:
1) there is no ongoing transaction for the producer, and
2) the max timestamp of the last batch written by that producer is more than `transactional.id.expiration.ms` ago.

The default value for `transactional.id.expiration.ms` is 7 days. If the input topic only has messages older than 7 days, and the transformed records produced to it also carry timestamps older than 7 days, then the producer will be expired the first time the expiration check runs while it has no ongoing transaction.

KIP-360 will indirectly address this problem by allowing the producer to continue after receiving an UNKNOWN_PRODUCER_ID error, but a better solution would probably be to set the producer state timestamp based on the current time, not the batch timestamp, as [~rocketraman] suggested. [~hachikuji] what do you think?

In the meantime, another workaround would be to set `transactional.id.expiration.ms` to a larger value, which would allow the transformed records to keep the default Streams timestamp behavior.
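The two expiration rules discussed in this comment can be contrasted in a few lines. This is a minimal sketch over a simplified producer-state entry; `ProducerEntry`, `expired_by_batch_timestamp`, and `expired_by_last_use` are hypothetical names, and the real broker check differs in detail. The first function follows the rule as described (no ongoing transaction and a last-batch max timestamp more than `transactional.id.expiration.ms` ago); the second follows the suggestion of basing expiry on when the producer last wrote.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000
TRANSACTIONAL_ID_EXPIRATION_MS = 7 * DAY_MS  # the 7-day default quoted above


@dataclass
class ProducerEntry:
    last_batch_max_timestamp_ms: int  # record timestamp carried by the last batch
    last_write_wall_clock_ms: int     # when the broker actually appended that batch
    ongoing_txn: bool = False


def expired_by_batch_timestamp(e: ProducerEntry, now_ms: int,
                               expiration_ms: int = TRANSACTIONAL_ID_EXPIRATION_MS) -> bool:
    """Rule as described: compares against the batch's record timestamp."""
    return not e.ongoing_txn and now_ms - e.last_batch_max_timestamp_ms > expiration_ms


def expired_by_last_use(e: ProducerEntry, now_ms: int,
                        expiration_ms: int = TRANSACTIONAL_ID_EXPIRATION_MS) -> bool:
    """Suggested alternative: compares against the time of the last write."""
    return not e.ongoing_txn and now_ms - e.last_write_wall_clock_ms > expiration_ms
```

For a record stamped 8 days in the past but appended just now, the batch-timestamp rule expires the producer immediately while the last-use rule keeps it; raising `expiration_ms` (the workaround) also keeps it under the first rule.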
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891992#comment-16891992 ] Raman Gupta commented on KAFKA-7190:

> Did you override message.timestamp.difference.max.ms?

No.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891976#comment-16891976 ] Guozhang Wang commented on KAFKA-7190:

Hmm, interesting. In this case the topic is not a repartition topic, so no purge-records requests would be sent (note that in this case, even if the messages are not yet physically removed, the producer id would still be deleted). Did you override {{message.timestamp.difference.max.ms}}?

cc [~bob-barrett], who's working on KIP-360 now; maybe he can chime in with some insights.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891537#comment-16891537 ] Raman Gupta commented on KAFKA-7190:

[~mjsax] [~guozhang] I want to point out that the behavior I saw above was when writing to a topic with compaction enabled but infinite retention. In fact, the stream reads from and writes to the same topic, with messages that (as noted) have the same timestamp, so there would be no reason for the broker to retain the input message yet delete the output message. In other words, the produced messages were *not* being deleted, and yet the transaction ID was. This is another reason why the behavior was so surprising to me.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891455#comment-16891455 ] Guozhang Wang commented on KAFKA-7190:

[~rocketraman] just to clarify:

* In general, a producer id is only deleted from the broker if ALL records that this producer has ever produced to the topic-partition have been deleted due to the log retention policy.
* For Kafka Streams, as you observed, by default it does not change the timestamp when producing to a sink topic, which means that "processing an event as of 7 days ago generates a result as of 7 days ago as well". This is the reasonable default behavior.

So if the destination topic is configured with a 7-day retention policy only, the produced record would be deleted immediately, causing the scenario mentioned above, which should be resolved by KIP-360. But it is not wrong to delete the record immediately, since broker-side log retention is independent of Streams processing logic: say you process a record from topic A configured with 7-day retention and write the result to another topic B with only 1-day retention; then you would very likely see the results deleted immediately as well. This is purely Kafka's log retention definition and should not be violated by Streams.
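Both points in this comment can be checked with a toy retention model of the behavior it describes (before the KIP-360 changes). This is a sketch: `Record`, `apply_time_retention`, and `surviving_producer_ids` are hypothetical names, and real brokers delete whole log segments based on each segment's largest timestamp rather than individual records.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class Record:
    producer_id: int
    timestamp_ms: int  # Streams copies the input record's timestamp by default


def apply_time_retention(log: list[Record], now_ms: int, retention_ms: int) -> list[Record]:
    """Keep records whose timestamp is within retention_ms of now.

    Simplification: a real broker deletes whole segments, judged by the
    largest timestamp in each segment, rather than individual records.
    """
    return [r for r in log if now_ms - r.timestamp_ms <= retention_ms]


def surviving_producer_ids(log: list[Record]) -> set[int]:
    """A producer id stays known only while at least one of its records remains."""
    return {r.producer_id for r in log}
```

An output record that inherits an 8-day-old timestamp is already past a 7-day retention window, so once retention runs, its producer id disappears with it; likewise, a 2-day-old result written to a topic with 1-day retention is eligible for deletion right away.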
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891211#comment-16891211 ] Matthias J. Sax commented on KAFKA-7190:

Thanks for your comment, [~rocketraman]. KIP-360 will address this ticket and will also solve the case you describe.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888543#comment-16888543 ] Raman Gupta commented on KAFKA-7190:

I want to mention another case in which this happened. I'm not sure it has been discussed above, though the discussion was quite technical, so perhaps it has been. If this belongs in a separate issue, let me know. In any case, the situation is:

1. A topic with messages that are over 7 days old.
2. A stream that transforms messages on that topic and writes different messages back to the same topic (though I suspect that doesn't matter; it could be any topic).
3. Writes to the topic get `UnknownProducerIdException`.

The default for Streams is to write the transformed record with the same timestamp as the input record. The producer id deletion seems to be based on the timestamp of that transformed record, which is more than 7 days old, even though the record was actually written *right now*. However, it seems very wrong to delete a producer id that was just created, just because the producer with that id happened to produce a message with an old timestamp. Why not track when the producer id was last used, and garbage collect it based on that?

The workaround in this case is to use a transformer to set the produced record timestamp.
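The workaround amounts to choosing which timestamp the output record carries. The sketch below models that choice in plain Python; it is not the Kafka Streams Transformer API, and `StreamRecord`, `transform`, and `age_days` are hypothetical names. With no processing time given, the output inherits the input timestamp (the Streams default described above); otherwise it is re-stamped with the processing time.

```python
from dataclasses import dataclass
from typing import Callable, Optional

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class StreamRecord:
    key: str
    value: str
    timestamp_ms: int


def transform(record: StreamRecord, fn: Callable[[str], str],
              processing_time_ms: Optional[int] = None) -> StreamRecord:
    """Map the value with `fn`.

    processing_time_ms=None: inherit the input timestamp (Streams default).
    Otherwise: stamp the output with the processing time (the workaround).
    """
    out_ts = record.timestamp_ms if processing_time_ms is None else processing_time_ms
    return StreamRecord(record.key, fn(record.value), out_ts)


def age_days(record: StreamRecord, now_ms: int) -> float:
    """How old the record looks to any broker rule keyed on record timestamps."""
    return (now_ms - record.timestamp_ms) / DAY_MS
```

An 8-day-old input transformed with the default still looks 8 days old to timestamp-based expiration, while the re-stamped output looks brand new. The trade-off is that downstream consumers lose the original event time unless it is carried elsewhere, for example in the value.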
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808246#comment-16808246 ] ASF GitHub Bot commented on KAFKA-7190:

guozhangwang commented on pull request #6511: KAFKA-7190: KIP-443; Remove streams overrides on repartition topics
URL: https://github.com/apache/kafka/pull/6511
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803424#comment-16803424 ] ASF GitHub Bot commented on KAFKA-7190:

guozhangwang commented on pull request #6511: KAFKA-7190: KIP-443; Remove streams overrides on repartition topics
URL: https://github.com/apache/kafka/pull/6511

As described in KIP-443 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-443%3A+Return+to+default+segment.ms+and+segment.index.bytes+in+Streams+repartition+topics), we want to remove the aggressive overrides of segment.ms and segment.index.bytes for repartition topics. The remaining segment.bytes override should still be effective in bounding their footprint.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803180#comment-16803180 ] Guozhang Wang commented on KAFKA-7190:

Will file a KIP / PR for this discussion, but this ticket itself should be resolved only after KIP-360 is done.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803153#comment-16803153 ] Guozhang Wang commented on KAFKA-7190:

I agree. I think we should at least increase `segment.ms` for now and let segment rolling be bounded by `segment.bytes` only (with the 50MB default value, this should still be effective in bounding repartition topic sizes). As for `segment.index.bytes`, looking at the original PR, I think that override should be changed to `segment.bytes`.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773259#comment-16773259 ] Matthias J. Sax commented on KAFKA-7190:

[~guozhang] It seems that a workaround would be to increase the topic configs `segment.bytes`, `segment.index.bytes`, and `segment.ms` for the corresponding repartition topics.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589573#comment-16589573 ] lambdaliu commented on KAFKA-7190:

Hi [~guozhang], as far as I know, the follower also maintains a PID cache. The case that Jason described will happen during a partition reassignment that migrates the leader.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588383#comment-16588383 ] Matthias J. Sax commented on KAFKA-7190:

[~guozhang] From my understanding, if UNKNOWN_PRODUCER_ID happens, the KIP proposes that the producer will not automatically bump the epoch but will throw an exception, right? It's up to the user to abort the current transaction and retry. For the idempotent producer, it's the user's choice how to proceed: a resend might introduce duplicates in this case, and thus the producer will not automatically resend but will throw, to let the user decide what to do next.
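The "abort the current transaction and retry" handling described here can be sketched as a loop around a transactional unit of work. This assumes a hypothetical in-memory `FakeTransactionalProducer` whose begin/send/commit/abort methods only loosely echo a transactional producer; it is not the Kafka client API, and the retry limit of 3 is an arbitrary choice.

```python
class UnknownProducerIdError(Exception):
    """Stands in for the UNKNOWN_PRODUCER_ID error surfaced to the caller."""


class FakeTransactionalProducer:
    """Hypothetical in-memory producer: only committed records become visible."""

    def __init__(self, failures_before_success: int = 0):
        self._failures_left = failures_before_success
        self._pending: list[str] = []
        self.committed: list[str] = []
        self.aborts = 0

    def begin_transaction(self) -> None:
        self._pending = []

    def send(self, record: str) -> None:
        self._pending.append(record)
        if self._failures_left > 0:
            self._failures_left -= 1
            raise UnknownProducerIdError("broker no longer knows this producer id")

    def commit_transaction(self) -> None:
        self.committed.extend(self._pending)
        self._pending = []

    def abort_transaction(self) -> None:
        # Discard pending records, mirroring how aborted data stays invisible
        # to read_committed consumers.
        self._pending = []
        self.aborts += 1


def send_batch_transactionally(producer: FakeTransactionalProducer,
                               records: list[str], max_attempts: int = 3) -> int:
    """Abort and retry the whole transaction on UNKNOWN_PRODUCER_ID; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        producer.begin_transaction()
        try:
            for record in records:
                producer.send(record)
            producer.commit_transaction()
            return attempt
        except UnknownProducerIdError:
            producer.abort_transaction()
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Because the failed attempt is aborted as a whole, the retried batch is committed exactly once. That is the property a plain idempotent resend cannot guarantee here, which is why, as the comment notes, the producer throws and leaves the choice to the user.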
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588370#comment-16588370 ] Guozhang Wang commented on KAFKA-7190: -- [~lambdaliu] As Jason's proposal #5 mentioned, we do plan to keep the PID in the cache until it expires, instead of deleting it immediately when its last record is deleted. However, even then, a leader migration can still leave the PID caches inconsistent, because only the leader maintains a PID cache. Consider this sequence:
1. The producer's last record is deleted on the leader.
2. The deletion propagates to the follower, which deletes the record as well.
3. A leader migration happens: the follower becomes the new leader and builds its PID cache from the log, which no longer contains the producer's ID.
4. The producer sends to the new leader, which no longer recognizes it.
Hence, UNKNOWN_PRODUCER_ID can still be returned. The rationale is that since this case is quite rare, it is fine to use the safer approach of bumping the epoch (which is more costly than resetting the sequence number).
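The four-step sequence can be simulated with a toy model in which the old leader keeps its cached PID (per proposal #5) while the new leader rebuilds its cache from the log on failover, as this comment assumes. `Replica` and `simulate_failover` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    log: list = field(default_factory=list)       # (offset, producer_id) records still in the log
    pid_cache: set = field(default_factory=set)   # producer ids this replica remembers

    def append(self, offset, pid):
        self.log.append((offset, pid))
        self.pid_cache.add(pid)

    def delete_records_before(self, new_start_offset):
        # Per proposal #5, the cache is NOT pruned here; entries only leave on expiration.
        self.log = [(o, p) for (o, p) in self.log if o >= new_start_offset]

    def rebuild_cache_from_log(self):
        self.pid_cache = {p for (_, p) in self.log}

    def knows(self, pid):
        return pid in self.pid_cache


def simulate_failover(pid=42):
    leader, follower = Replica(), Replica()
    for replica in (leader, follower):
        replica.append(0, pid)                    # the producer's last record
    leader.delete_records_before(1)               # 1. deleted on the leader
    follower.delete_records_before(1)             # 2. deletion reaches the follower
    new_leader = follower                         # 3. failover: the new leader rebuilds from the log
    new_leader.rebuild_cache_from_log()
    return leader.knows(pid), new_leader.knows(pid)  # 4. who still recognizes the producer?


print(simulate_failover())  # (True, False): the new leader would answer UNKNOWN_PRODUCER_ID
```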
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586903#comment-16586903 ] lambdaliu commented on KAFKA-7190: -- Hi [~mjsax], changing the log level to DEBUG does not avoid the UNKNOWN_PRODUCER error itself, which costs one extra request/response round trip to reset the producer epoch and sequence.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586886#comment-16586886 ] lambdaliu commented on KAFKA-7190: -- Hi [~hachikuji], sorry for the late reply. I have thought about the solution you suggested last time and found that it is not easy to reset the sequence to 0 for each transaction, because the broker may receive a new transactional produce request before the last completed transaction's EndTxnMarker request. So we had better cache all PIDs in memory until they expire.
The idea of soft-deleting records beyond the LSO is easy to implement. However, deletion triggered by a retention-time or retention-size breach may still delete a segment that contains an active transaction. For this case, maybe we can use the snapshot file to save the active PIDs. With this change, we can always recover the PIDs that have active transactions from the log.
The KIP you posted looks great, and I am glad to work on it. Thanks.
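A minimal sketch of the snapshot idea, assuming a hypothetical snapshot that records whether each producer had an open transaction, and a remaining log summarized as `(producer_id, kind)` entries. `active_producers` is illustrative only and is not ProducerStateManager's real recovery logic.

```python
def active_producers(snapshot, remaining_log):
    """Hypothetical recovery of producers with open transactions.

    snapshot: {producer_id: has_open_transaction}, saved before old segments were deleted.
    remaining_log: [(producer_id, kind)] in offset order, kind in {"data", "commit", "abort"}.
    """
    # Seed from the snapshot so open transactions survive retention-based segment deletion.
    open_txns = {pid for pid, is_open in snapshot.items() if is_open}
    for pid, kind in remaining_log:
        if kind == "data":
            open_txns.add(pid)        # a transactional write opens (or continues) a transaction
        elif kind in ("commit", "abort"):
            open_txns.discard(pid)    # an end-transaction marker closes it
        else:
            raise ValueError(f"unknown record kind: {kind}")
    return open_txns


# Producer 7's first write lived in a deleted segment; only the snapshot remembers it.
print(sorted(active_producers({7: True, 8: False}, [(9, "data")])))  # [7, 9]
print(sorted(active_producers({}, [(9, "data")])))                   # [9]
```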
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586359#comment-16586359 ] Jason Gustafson commented on KAFKA-7190: Posted a KIP here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-360%3A+Improve+handling+of+unknown+producer. Please take a look.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584462#comment-16584462 ] Jason Gustafson commented on KAFKA-7190: Sorry, this will be a lengthy comment because this is a surprisingly complex issue. I've been giving this some more thought since this problem keeps recurring.
In fact, the way the producer works currently is close to what I suggested above. When the producer receives an UNKNOWN_PRODUCER error, it attempts to reset the sequence number if it believes it is safe to do so. This is actually problematic for the reasons I mentioned above. Whenever we reuse a sequence number, we are violating our uniqueness assumption, which means some guarantees go out the door (at least theoretically). In other words, I think the current workaround was just a bad idea.
The approach suggested here is actually safer for the common case. However, the main drawback is that we lose the consistency between the cached producer state and the state in the log. In the worst case, if we have to rebuild producer state using the log, then we will lose some of the producers, which puts us back in the position of handling UNKNOWN_PRODUCER errors in the clients. For example, this would happen upon partition reassignment, since the new replica will reload producer state using the log. This can cause surprising behavior in some cases. For example, a producer which is fenced by the cached state in one leader may become unfenced after changing to a reassigned leader which rebuilt its state using only the log. Alternatively, a valid sequence number on the leader may become invalid after a leader failover.
I think the basic flaw here is that we allow the monotonicity of producer writes to be violated in two cases. In the first case, we violate it when we reset the sequence number after receiving an UNKNOWN_PRODUCER error. In the second case, we violate it because our fencing cannot protect us when we don't have producer state.
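The first violation can be illustrated with a deliberately simplified model of broker-side sequence checking. `SequenceCheckSketch`, `after_state_loss`, and the status strings are hypothetical, and the model ignores details such as the broker caching several recent batches; it only shows why reusing sequence number 0 under the same epoch lets a stale retry through, while bumping the epoch fences it.

```python
class SequenceCheckSketch:
    """Hypothetical, simplified per-producer sequence validation on one partition."""

    def __init__(self):
        self.state = {}  # producer_id -> (epoch, last_sequence)

    def append(self, pid, epoch, seq):
        if pid not in self.state:
            # No state: only a batch starting at sequence 0 is accepted.
            if seq != 0:
                return "UNKNOWN_PRODUCER_ID"
            self.state[pid] = (epoch, 0)
            return "OK"
        current_epoch, last_seq = self.state[pid]
        if epoch < current_epoch:
            return "FENCED"
        if epoch > current_epoch:
            if seq != 0:
                return "OUT_OF_ORDER_SEQUENCE"
            self.state[pid] = (epoch, 0)
            return "OK"
        if seq <= last_seq:
            return "DUPLICATE"
        if seq != last_seq + 1:
            return "OUT_OF_ORDER_SEQUENCE"
        self.state[pid] = (epoch, seq)
        return "OK"

    def forget(self, pid):
        # Models DeleteRecords removing the producer's last record and, with it, its state.
        self.state.pop(pid, None)


def after_state_loss(new_epoch):
    broker = SequenceCheckSketch()
    broker.append(1, 0, 0)
    broker.append(1, 0, 1)                       # sequences 0 and 1 written under epoch 0
    broker.forget(1)
    new_write = broker.append(1, new_epoch, 0)   # the producer restarts at sequence 0
    stale_retry = broker.append(1, 0, 1)         # a late retry of the old sequence-1 batch
    return new_write, stale_retry


print(after_state_loss(new_epoch=0))  # ('OK', 'OK'): sequence reset, the duplicate is accepted
print(after_state_loss(new_epoch=1))  # ('OK', 'FENCED'): epoch bumped, the stale retry is rejected
```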
Understanding the problem at least suggests possible solutions. Here is what I'm thinking:
1. We need to add fencing for the first transactional write from a producer. Basically, I think we need a new inter-broker API, say CheckProducerFenced, which can verify whether an epoch is correct when there is no local state that can be relied upon.
2. When we encounter an UNKNOWN_PRODUCER error in the client, we need a safe way to bump the epoch in order to continue. We can update the InitProducerId API to include an optional epoch. When the transaction coordinator receives the request, it can verify that the epoch matches the current epoch before incrementing it. That way the producer will not mistakenly fence another producer.
3. If we receive UNKNOWN_PRODUCER and we are in a transaction, we should probably just abort. After aborting, we can bump the epoch and safely continue.
4. For the idempotent producer, if we get UNKNOWN_PRODUCER, it should be safe to bump the epoch locally because the producer id is guaranteed to be unique.
5. Once we have these fixes, the submitted patch becomes more attractive. We can keep producers in the cache longer than their state exists in the log. We may still get an unexpected UNKNOWN_PRODUCER error due to a possible inconsistency between the leader and follower (e.g. as a result of reassignment), but it should be rare, and we can always abort the transaction, bump the epoch, and continue. In any case, our guarantees will not be violated.
This is pretty high level, but I'm happy to flesh out the details in a KIP. However, I probably don't have the time to implement it. [~lambdaliu] If you think this is a good idea, perhaps you'd be willing to help out?
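Point 2 can be sketched as a coordinator that bumps the epoch only when the caller shows it holds the current epoch. `TxnCoordinatorSketch`, its return tuples, and the `FENCED` status are hypothetical and do not reflect the actual InitProducerId request schema.

```python
class TxnCoordinatorSketch:
    """Hypothetical coordinator implementing an epoch-checked InitProducerId."""

    def __init__(self, first_producer_id=1000):
        self.next_producer_id = first_producer_id
        self.entries = {}  # transactional_id -> (producer_id, epoch)

    def init_producer_id(self, transactional_id, expected_epoch=None):
        if transactional_id not in self.entries:
            pid = self.next_producer_id
            self.next_producer_id += 1
            self.entries[transactional_id] = (pid, 0)
            return "OK", pid, 0
        pid, epoch = self.entries[transactional_id]
        if expected_epoch is not None and expected_epoch != epoch:
            # The caller is not the current producer: refuse instead of fencing the real one.
            return "FENCED", pid, epoch
        # No expected epoch (a fresh instance) or a matching one: bump, fencing older epochs.
        self.entries[transactional_id] = (pid, epoch + 1)
        return "OK", pid, epoch + 1


coordinator = TxnCoordinatorSketch()
print(coordinator.init_producer_id("app-1"))                    # ('OK', 1000, 0)
print(coordinator.init_producer_id("app-1", expected_epoch=0))  # ('OK', 1000, 1): current producer recovers
print(coordinator.init_producer_id("app-1", expected_epoch=0))  # ('FENCED', 1000, 1): a zombie is refused
```

Keeping the epoch optional preserves the existing behavior for a newly started instance, which is expected to fence whatever came before it.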
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582931#comment-16582931 ] Matthias J. Sax commented on KAFKA-7190: I want to throw out another thought: given the current implementation and API (ignoring in-flight transactions for a moment), a truncated topic may result in lost PIDs and thus in WARN logs on the client side. However, because the producer knows how to handle this situation, I am wondering whether logging this at WARN level is the right decision, and whether DEBUG logging might be more appropriate. This would address the surface issue of this ticket. WDYT?
For truncation itself, we could say that truncating in-flight transactions is a user error or not allowed, and that it is the user's fault if the producer crashes (not sure if this might be an issue with regard to retention? I guess not in practice). Delaying a delete-records request for an in-flight transaction might be another good solution. What I am getting at is: might caching PIDs in memory actually not be required? Thoughts?
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572763#comment-16572763 ] Jason Gustafson commented on KAFKA-7190: This is a tough one. To guarantee transaction semantics, we need to retain producer state in the log. Without that state, our only options are to raise an error or weaken semantics. The problem with deleting beyond the LSO is that we may lose the producer state of an active transaction. As I understand it, the proposal here is to retain the state in memory even though we have lost it in the log, but in the worst case, we would still end up raising the UNKNOWN_PRODUCER error.
The log is ultimately the source of truth for producer state. Doesn't it seem odd that a call to DeleteRecords can effectively kill a producer with an active transaction? What I'm wondering is whether deletion can be "soft" in the case that the offset is higher than the LSO. We can advance the log start offset to the new offset, but retain the data in the log until the LSO has reached the new log start offset. Then we could guarantee that the producer state of an active transaction is never lost. This is useful because if a transactional produce request arrives and we have no producer state, then we know that it is either the start of a new transaction and safe to allow, or a stale write from a fenced producer.
The holy grail is being able to distinguish between these two cases. One option I was thinking about is letting each transaction start at sequence number 0. This would allow us to distinguish these two cases for all but the first record in a transaction. Leaving the one loose end is not satisfying, but technically it was already loose before. It is possible today for a producer to start a transaction and then become a zombie. If its transaction gets aborted by the coordinator and the state is lost due to a call to DeleteRecords, then the zombie can still wake up and write to the partition. I'm not too sure how we'll fix this, but the point is we have to fix it anyway.
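The "soft" deletion idea can be sketched by tracking the logical log start offset separately from the first offset physically retained. `PartitionLogSketch` and its field names are hypothetical; data below the new log start offset is only dropped once the LSO has reached it.

```python
from dataclasses import dataclass


@dataclass
class PartitionLogSketch:
    lso: int = 0                 # last stable offset
    log_start_offset: int = 0    # logical start exposed to clients
    physical_start: int = 0      # first offset still retained on disk

    def delete_records(self, offset):
        # Advance the logical log start offset immediately ...
        self.log_start_offset = max(self.log_start_offset, offset)
        self._maybe_purge()

    def advance_lso(self, lso):
        self.lso = max(self.lso, lso)
        self._maybe_purge()

    def _maybe_purge(self):
        # ... but physically drop the data only once the LSO has reached the new
        # log start offset, so an open transaction never loses its producer state.
        if self.lso >= self.log_start_offset:
            self.physical_start = self.log_start_offset


log = PartitionLogSketch(lso=50)
log.delete_records(80)
print(log.log_start_offset, log.physical_start)  # 80 0
log.advance_lso(100)
print(log.log_start_offset, log.physical_start)  # 80 80
```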
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1658#comment-1658 ] ASF GitHub Bot commented on KAFKA-7190: --- lambdaliu opened a new pull request #5448: KAFKA-7190: Retain producerIds when truncate log head to avoid UNKNOWN_PRODUCER_ID URL: https://github.com/apache/kafka/pull/5448 When a streams application has little traffic, then it is possible that consumer purging would delete even the last message sent by a producer (i.e., all the messages sent by this producer have been consumed and committed), and as a result, the broker would delete that producer's ID. The next time when this producer tries to send, it will get this UNKNOWN_PRODUCER_ID error code. This PR fixes the above problem by delaying the deletion of the producer ID until it expires. - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes)
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564694#comment-16564694 ] lambdaliu commented on KAFKA-7190: -- Hi [~hachikuji], what do you think about this solution to the problem? Looking forward to your opinion. Thanks!
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560409#comment-16560409 ] Guozhang Wang commented on KAFKA-7190: -- What you described looks reasonable to me. I'd like [~hachikuji] to also chime in here, since he originally implemented this logic and can shed some light as well.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558265#comment-16558265 ] lambdaliu commented on KAFKA-7190: -- Hello [~guozhang]. My team developed a cloud version of Kafka and I am familiar with the broker, so I think I can probably solve this issue. When we remove the head of the log, ProducerStateManager.truncateHead takes the following steps:
1. Remove producerIds whose last offset is smaller than the log start offset.
2. Remove each producerId's BatchMetadata whose last offset is smaller than the log start offset.
3. Remove ongoing transactions whose producerId was removed in step 1.
4. Remove unreplicated transactions whose last offset is smaller than the log start offset.
5. Update lastMapOffset to the log start offset if lastMapOffset is smaller than the log start offset.
6. Delete snapshot files older than the new log start offset.
As you suggested, we can delay the deletion of the producer ID until it expires. We can also delay steps 2 and 3 until that time. For the old snapshot files in step 6, we can rely on the periodically called function deleteSnapshotsAfterRecoveryPointCheckpoint to delete them. And when loading producer state from a snapshot file, we should not drop producerIds whose last offset is smaller than the log start offset. So we only need to do steps 4 and 5 when removing the head of the log. As for the additional PID expiration config, is there any reason to add it? If it is reasonable, I will add it.
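A minimal sketch of the reduced truncateHead (steps 4 and 5 only), with producer entries removed solely by expiration. `ProducerStateManagerSketch`, `ProducerEntry`, and their fields are hypothetical simplifications; in particular, the real expiration logic also has to account for producers with ongoing transactions.

```python
from dataclasses import dataclass, field


@dataclass
class ProducerEntry:
    last_offset: int
    last_timestamp_ms: int


@dataclass
class ProducerStateManagerSketch:
    producers: dict = field(default_factory=dict)                     # producer_id -> ProducerEntry
    unreplicated_txn_last_offsets: list = field(default_factory=list)
    last_map_offset: int = 0

    def truncate_head(self, log_start_offset):
        # Steps 1-3 are deferred to expiration; step 6 to the periodic snapshot cleanup.
        # Step 4: drop unreplicated transactions that end below the new log start offset.
        self.unreplicated_txn_last_offsets = [
            o for o in self.unreplicated_txn_last_offsets if o >= log_start_offset
        ]
        # Step 5: lastMapOffset must not lag behind the log start offset.
        self.last_map_offset = max(self.last_map_offset, log_start_offset)

    def remove_expired(self, now_ms, expiration_ms):
        # Simplified: ignores producers with ongoing transactions.
        self.producers = {
            pid: entry for pid, entry in self.producers.items()
            if now_ms - entry.last_timestamp_ms < expiration_ms
        }


state = ProducerStateManagerSketch(
    producers={7: ProducerEntry(last_offset=10, last_timestamp_ms=1_000)},
    unreplicated_txn_last_offsets=[5, 20],
    last_map_offset=12,
)
state.truncate_head(15)
print(7 in state.producers, state.unreplicated_txn_last_offsets, state.last_map_offset)  # True [20] 15
state.remove_expired(now_ms=61_000, expiration_ms=60_000)
print(7 in state.producers)  # False
```

Producer 7's last offset (10) is below the new log start offset (15), yet it stays cached until it expires, which is exactly what avoids the UNKNOWN_PRODUCER_ID on its next send.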
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556289#comment-16556289 ] Dong Lin commented on KAFKA-7190: - Thank you [~mjsax] [~guozhang] for the discussion.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556238#comment-16556238 ] Guozhang Wang commented on KAFKA-7190: -- [~lindong] That's a good point. I think maybe we should not forbid users from deleting beyond the LSO.
[jira] [Commented] (KAFKA-7190) Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556070#comment-16556070 ] Matthias J. Sax commented on KAFKA-7190: Good point. You are right, consumers in `read_uncommitted` mode can read the data. So maybe we don't need to change anything, and can explain to users that they are using the API incorrectly if they delete uncommitted data but want to consume in `read_committed` mode.
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556053#comment-16556053 ] Dong Lin commented on KAFKA-7190:
[~mjsax] I guess one question is: if an LSO is defined for a partition, can a consumer consume beyond that offset, e.g. in read-uncommitted mode? If so, then it seems we should allow users to delete messages beyond the LSO: since a consumer may already have consumed them, deleting messages beyond the LSO does not necessarily cause data loss. If not, then I agree we should prevent users from deleting beyond the LSO. What do you think?
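The argument above can be sketched as follows. Deleting every record below a target offset loses data only for consumer groups that have not yet read those records, and how far a group can get depends on its isolation level.

This is a minimal Python model built on assumed inputs:

- `records_lost_for`, the group names, and the `(isolation_level, position)` tuples are hypothetical.
- A position is taken to be the next offset the group will read.

```python
def records_lost_for(target_offset: int, lso: int, hw: int,
                     groups: dict[str, tuple[str, int]]) -> list[str]:
    """Return the groups that would lose unread records if every record
    below target_offset were deleted (simplified model)."""
    if lso > hw:
        raise ValueError("expected lso <= hw")
    if target_offset > hw:
        raise ValueError("deleting beyond the high watermark is never allowed")
    losers = []
    for name, (isolation_level, position) in groups.items():
        # A read_committed consumer cannot get past the LSO;
        # a read_uncommitted consumer cannot get past the HW.
        reachable = lso if isolation_level == "read_committed" else hw
        if min(position, reachable) < target_offset:
            losers.append(name)
    return sorted(losers)


# Only a read_uncommitted group already at offset 48: purging up to 45 goes
# beyond the LSO (40) but loses nothing.
print(records_lost_for(45, lso=40, hw=50, groups={"audit": ("read_uncommitted", 48)}))  # []
# A read_committed group is held at the LSO, so the same purge loses its data.
print(records_lost_for(45, lso=40, hw=50, groups={"audit": ("read_uncommitted", 48),
                                                  "billing": ("read_committed", 40)}))  # ['billing']
```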
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556042#comment-16556042 ] Matthias J. Sax commented on KAFKA-7190:
> The main reason we need to prevent client from deleting beyond HW is because otherwise follower may receive OutOfRangeException

Isn't `OutOfRangeException` only a symptom? It seems that the actual issue is potential data loss if uncommitted data gets deleted, i.e., the main reason not to delete data is to avoid data loss? I understand your second point that `OutOfRangeException` should never be triggered if the API is used correctly. However, I still think the change is justified as an additional "safety net"; of course, we should document that uncommitted data cannot be deleted via a purge data request. As an enhancement, we could also introduce a "force" option for purge data requests that allows deleting uncommitted data (not sure if we need this). In that case, we should document that it might result in `OutOfRangeException` for downstream consumers and should be used with care.

[~guozhang] [~lambdaliu] Regarding keeping the PID: why do we need a very short "delay time" before the PID can be purged? Why would 7 days be an issue? It might be better to avoid additional configs if we can.
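The "safety net" and optional "force" flag proposed above could look roughly like the following broker-side check. This is a hypothetical Python sketch, not Kafka's DeleteRecords handler:

- `DeleteOutcome` and `validate_delete_records` are illustrative names.
- The `force` parameter models the option proposed in this comment; it is not an existing DeleteRecords field.
- `target_offset` is an exclusive bound: every record below it would be deleted.

```python
from enum import Enum


class DeleteOutcome(Enum):
    ACCEPTED = "accepted"
    BEYOND_HIGH_WATERMARK = "beyond high watermark"
    BEYOND_LAST_STABLE_OFFSET = "beyond last stable offset"


def validate_delete_records(target_offset: int, lso: int, hw: int,
                            force: bool = False) -> DeleteOutcome:
    """Decide whether records below target_offset may be deleted (sketch)."""
    if target_offset > hw:
        # Existing rule: followers may not have these records yet.
        return DeleteOutcome.BEYOND_HIGH_WATERMARK
    if target_offset > lso and not force:
        # Proposed safety net: keep records of still-open transactions
        # unless the caller explicitly forces the purge.
        return DeleteOutcome.BEYOND_LAST_STABLE_OFFSET
    return DeleteOutcome.ACCEPTED


print(validate_delete_records(45, lso=40, hw=50))              # DeleteOutcome.BEYOND_LAST_STABLE_OFFSET
print(validate_delete_records(45, lso=40, hw=50, force=True))  # DeleteOutcome.ACCEPTED
print(validate_delete_records(51, lso=40, hw=50, force=True))  # DeleteOutcome.BEYOND_HIGH_WATERMARK
```

In this sketch, `force` relaxes only the LSO check. The HW check stays unconditional, matching the comment's point that deleting beyond the HW is what actually breaks followers.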
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556011#comment-16556011 ] Dong Lin commented on KAFKA-7190:
[~guozhang] Certainly, I am happy to make the modification once we agree on the solution. Here are my thoughts on whether we should allow clients to delete beyond the last stable offset.

The main reason we need to prevent clients from deleting beyond the HW is that otherwise followers may receive OutOfRangeException and the broker logic may break. Is there a similar concern in the broker if we delete beyond the LSO?

If the goal of not deleting beyond the LSO is to make sure that messages are exposed to consumers before being deleted, then I am not sure this change is justified. My understanding is that we currently rely on users to make sure that messages have been consumed by all consumer groups that need them before calling deleteRecords() to delete them. If users follow this rule, they will not delete beyond the LSO, regardless of any constraint in the broker. If users do not follow this rule, then even with the extra protection in the broker, they can still delete messages before the LSO that have not been consumed yet. Does this make sense?
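The rule described above (only delete records that every interested consumer group has already consumed) amounts to purging up to the minimum committed offset. The minimal Python sketch below rests on these assumptions:

- The caller already knows the committed offset (the next offset to read) of every group that needs the partition.
- `safe_purge_offset` and the group names are hypothetical.

With the Java Admin client, the resulting value would then be passed to `deleteRecords()` as `RecordsToDelete.beforeOffset(...)`.

```python
from typing import Dict, Optional


def safe_purge_offset(committed_offsets: Dict[str, int], high_watermark: int) -> Optional[int]:
    """Largest offset that can be passed to deleteRecords() without removing
    records that some interested consumer group has not consumed yet."""
    if not committed_offsets:
        # No group has declared interest: the caller has to decide explicitly.
        return None
    slowest = min(committed_offsets.values())
    # Committed offsets should never exceed the HW; clamp defensively anyway.
    return min(slowest, high_watermark)


print(safe_purge_offset({"streams-app": 120, "backup": 95}, high_watermark=130))  # 95
```

This also illustrates the comment's point. If every `read_committed` group is included, their committed offsets cannot pass the LSO, so the result never exceeds the LSO even without a broker-side check. If a group is left out of the map, no broker-side LSO check can protect its unread records below the LSO.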
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555990#comment-16555990 ] Guozhang Wang commented on KAFKA-7190:
I've discussed this with [~hachikuji], and here is our current proposal:

1. Currently, a PID is removed from the broker's cache once its last produced message is truncated (details of the transactional messaging design can be found in KIP-98). We will remove this logic and rely only on the broker-side config "transactional.id.expiration.ms".

2. Note that the config "transactional.id.expiration.ms" is used for multiple purposes. Its default value is 7 days, which would be too long if we do 1) above and hence rely ONLY on it to clean up PID caches. So we'd probably want to add a separate config for PID expiration only, with a much smaller default value, say 1 hour.

[~lambdaliu] I understand that the two changes above require some knowledge of the broker-side transactional messaging implementation; please let me know if you feel comfortable working on it.

There is another caveat related to this issue that we need to fix. Today the DeleteRecords request only avoids deleting beyond the high watermark. With transactional messaging, we should also make sure that deletion does not go beyond the LSO (last stable offset), so that no uncommitted data is deleted before it has been exposed to consumers. [~lindong], since you added the DeleteRecords request, would you be willing to make this modification as well?
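Point 1 of the proposal can be illustrated with a toy model of a partition's producer-ID cache. This is a simplified Python sketch, not the broker's real producer-state code:

- `ProducerStateModel`, its methods, and the `remove_on_truncation` switch are hypothetical.
- With `remove_on_truncation=True`, the model mimics the current behaviour: a PID disappears once its last record is purged.
- With `remove_on_truncation=False`, it keeps the PID until the PID has been idle for `expiration_ms`.
- The 1-hour value is only the example default floated in the comment.

```python
from dataclasses import dataclass, field


@dataclass
class ProducerEntry:
    last_offset: int     # offset of the producer's most recent record
    last_append_ms: int  # time of that append


@dataclass
class ProducerStateModel:
    remove_on_truncation: bool  # True: current behaviour, False: proposed behaviour
    expiration_ms: int          # idle time after which a PID is dropped
    producers: dict = field(default_factory=dict)

    def append(self, pid: int, offset: int, now_ms: int) -> None:
        self.producers[pid] = ProducerEntry(offset, now_ms)

    def advance_log_start(self, new_log_start: int) -> None:
        # Called when DeleteRecords (e.g. Streams purging a repartition topic)
        # moves the log start offset forward.
        if self.remove_on_truncation:
            self.producers = {pid: e for pid, e in self.producers.items()
                              if e.last_offset >= new_log_start}

    def expire(self, now_ms: int) -> None:
        # Periodic cleanup: drop PIDs that have been idle for expiration_ms or longer.
        self.producers = {pid: e for pid, e in self.producers.items()
                          if now_ms - e.last_append_ms < self.expiration_ms}

    def is_known(self, pid: int) -> bool:
        return pid in self.producers


ONE_HOUR_MS = 60 * 60 * 1000

for remove_on_truncation in (True, False):
    state = ProducerStateModel(remove_on_truncation, ONE_HOUR_MS)
    state.append(pid=7, offset=99, now_ms=0)
    state.advance_log_start(100)  # everything up to the committed offset is purged
    state.expire(now_ms=60_000)   # one minute later
    print(remove_on_truncation, state.is_known(7))
# prints "True False" (PID gone, so the next send gets UNKNOWN_PRODUCER_ID),
# then "False True" (PID kept until it has been idle for an hour)
```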
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555730#comment-16555730 ] lambdaliu commented on KAFKA-7190:
Hello [~mjsax] [~guozhang], sorry for the late reply. I also agree that we should fix this issue on the broker side. As [~guozhang] said, we can delay the deletion of the producer ID to resolve this issue. The open question is how long to delay the deletion. Is 60 seconds OK?
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553134#comment-16553134 ] Guozhang Wang commented on KAFKA-7190:
Hello [~lambdaliu], before you start working on this ticket, I want to make sure we have agreed on a solution. As I mentioned in the email thread:
{code}
We can probably improve this situation either on the broker side or on the streams client side: on the broker side, we can consider delaying the deletion of the producer ID for a while; on the streams client side, we can consider purging in a more conservative manner (but that is still a bit tricky: since multiple producers may be sending to the same internal topic, just leaving the last N messages unpurged may still not be safe).
{code}
Personally, I agree with [~mjsax] that a fix on the broker side may be cleaner than a fix on the streams client side.
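The caveat in the quoted email is that keeping only the last N messages of an internal topic does not protect every producer. A small Python sketch shows why. The log is modelled as a list of producer IDs in offset order, and `producers_losing_all_records` is a hypothetical helper, not part of Kafka Streams.

```python
def producers_losing_all_records(log: list[int], n_keep: int) -> list[int]:
    """PIDs left with no record after purging all but the last n_keep records.

    log[i] is the producer ID that wrote offset i (simplified model)."""
    # log[-0:] would be the whole list, so n_keep == 0 is handled explicitly.
    retained = set(log[-n_keep:]) if n_keep > 0 else set()
    return sorted(set(log) - retained)


# Producer 1 wrote once, then producer 2 wrote three times.
print(producers_losing_all_records([1, 2, 2, 2], n_keep=2))  # [1]
```

Whatever N is chosen, a busy producer can push a quiet producer's last record out of the retained tail. The broker's truncation-based cleanup can then still drop the quiet producer's PID.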
[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553095#comment-16553095 ] Matthias J. Sax commented on KAFKA-7190:
Thanks for picking this up, [~lambdaliu]. I would like to get [~hachikuji]'s input on this, as I wonder whether it might make sense to fix this issue on the broker side and keep the PID longer, even if all data from the topic was deleted. I'm not sure how hard this would be, but it might be cleaner than "messing around" in the client layer. If we agree to fix it in the client layer, [~lambdaliu], we might want to discuss your proposal for a fix, if you have one.