Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues
Hi Pieter,

FWIW, you may have encountered the following bug: https://issues.apache.org/jira/browse/KAFKA-12671

Thanks,

Alexandre

On Fri, 12 Jun 2020 at 00:43, D C wrote:
> Hey peeps,
>
> Anyone else encountered this and got to the bottom of it?
>
> I'm facing a similar issue: the LSO is stuck for some partitions in a topic
> and the consumers can't get data out of it (we're using
> isolation.level=read_committed).
>
> When this issue started happening we were on Kafka 2.3.1.
> I tried:
> - restarting the consumers
> - deleting the partition from the leader and letting it get in sync with
>   the new leader
> - a rolling restart of the brokers
> - shutting down the whole cluster and starting it again
> - deleting the txnindex files (after backing them up) and restarting
>   the brokers
> - taking down the follower brokers of a partition and resyncing that
>   partition on them from scratch
> - upgrading both Kafka broker and client to 2.5.0
>
> Now the following questions arise:
> - Where is the LSO actually stored? (Even if you get rid of the txnindex
>   files, the LSO stays the same.)
> - Is there any way that the LSO can be reset?
> - Is there any way to manually abort and clean up the state of a stuck
>   transaction? (I suspect that this is the reason why the LSO is stuck.)
> - Is there any way to manually trigger a consistency check on the log files
>   that would fix any existing issues with either the logs or the indexes in
>   the partition?
>
> Cheers,
> Dragos
Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues
Hey peeps,

Anyone else encountered this and got to the bottom of it?

I'm facing a similar issue: the LSO is stuck for some partitions in a topic and the consumers can't get data out of it (we're using isolation.level=read_committed).

When this issue started happening we were on Kafka 2.3.1.
I tried:
- restarting the consumers
- deleting the partition from the leader and letting it get in sync with the new leader
- a rolling restart of the brokers
- shutting down the whole cluster and starting it again
- deleting the txnindex files (after backing them up) and restarting the brokers
- taking down the follower brokers of a partition and resyncing that partition on them from scratch
- upgrading both Kafka broker and client to 2.5.0

Now the following questions arise:
- Where is the LSO actually stored? (Even if you get rid of the txnindex files, the LSO stays the same.)
- Is there any way that the LSO can be reset?
- Is there any way to manually abort and clean up the state of a stuck transaction? (I suspect that this is the reason why the LSO is stuck.)
- Is there any way to manually trigger a consistency check on the log files that would fix any existing issues with either the logs or the indexes in the partition?

Cheers,
Dragos

On 2019/11/20 13:26:54, Pieter Hameete wrote:
> Hello,
>
> after having some Broker issues (too many open files) we managed to recover
> our Brokers, but read_committed consumers are stuck for a specific topic
> partition. It seems like the LSO is stuck at a specific offset. The
> transactional producer for the topic partition is working without errors, so
> the latest offset is incrementing correctly and so is transactional producing.
>
> What could be wrong here? And how can we get this specific LSO to
> increment again?
>
> Thank you in advance for any advice.
>
> Best,
>
> Pieter
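One way to picture why deleting the txnindex files did not move the LSO: as far as I understand, the broker does not persist the LSO as a value of its own but derives it from the set of transactions that have data in the log and no commit or abort marker yet, and that set can be rebuilt by replaying the log itself. The Python sketch below is only a toy model of that idea, not broker code; `Batch`, `open_transactions`, and all offsets are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Batch(NamedTuple):
    offset: int
    producer_id: int
    kind: str  # "data" for transactional records, "commit" or "abort" for control markers


def open_transactions(log: Iterable[Batch]) -> Dict[int, int]:
    """Replay the log and return {producer_id: first offset} for transactions with no end marker."""
    ongoing: Dict[int, int] = {}
    for batch in log:
        if batch.kind == "data":
            # Remember only the first offset of each producer's current transaction.
            ongoing.setdefault(batch.producer_id, batch.offset)
        else:
            # A commit/abort marker ends this producer's current transaction.
            ongoing.pop(batch.producer_id, None)
    return ongoing


# Hypothetical log: producer 1 wrote data but never got an end marker,
# while producer 2 keeps committing transactions after it.
log = [
    Batch(100, producer_id=1, kind="data"),
    Batch(101, producer_id=2, kind="data"),
    Batch(102, producer_id=2, kind="commit"),
    Batch(103, producer_id=2, kind="data"),
    Batch(104, producer_id=2, kind="commit"),
]
pending = open_transactions(log)
first_unstable: Optional[int] = min(pending.values(), default=None)
print(pending, first_unstable)
```

In this model, the only thing that changes the result is a commit or abort marker for producer 1 appearing in the log; removing an index file that is derived from the log does not.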
Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues
Hello,

A final update on this. I found that there is an open transaction causing the LSO to be stuck at offset 10794778, similar to this Stack Overflow question: https://stackoverflow.com/questions/56643907/manually-close-old-kafka-transaction

Although we use the same pool of transactional IDs, this old transaction was not aborted after the brokers and client apps came back online.

Is there any way to abort this defective transaction? Or is the only option to migrate all the data from this topic to a new one using a read_uncommitted reader?

Best,

Pieter

From: Pieter Hameete
Sent: Wednesday, 20 November 2019 16:33
To: Ashutosh singh
CC: users@kafka.apache.org
Subject: Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

Hi Ashu, others,

I have tested with the latest kafkacat with librdkafka 1.2.2, which can also do transactional reading.

Reading the partition with offset reset from the beginning will read until offset 10794778 (this is the offset at which the LSO is stuck).

Reading the partition from any offset after 10794778 (so any specific offset greater than 10794778, or auto offset reset to latest) will not read anything at all.

Reading in uncommitted mode will read properly from any offset.

I think my only solution would be to somehow get the LSO on the broker side to increase again. There's nothing I can do on the consumer side to get this working again while keeping the read mode read_committed.

Best,

Pieter
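To illustrate why a single unfinished transaction can pin the LSO at 10794778 even though the producer keeps completing newer transactions, here is a minimal Python model. It assumes the rule, as I understand it, that the LSO is the first offset of the earliest still-open transaction, capped at the high watermark. `Txn`, `last_stable_offset`, the producer IDs, and every offset other than 10794778 are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Txn:
    producer_id: int
    first_offset: int        # offset of the first record written in the transaction
    completed: bool = False  # True once a commit or abort marker has been written


def last_stable_offset(high_watermark: int, txns: List[Txn]) -> int:
    """First offset of the earliest open transaction, capped at the high watermark."""
    open_starts = [t.first_offset for t in txns if not t.completed]
    return min(open_starts + [high_watermark])


txns = [
    Txn(producer_id=7, first_offset=10794778),                  # hanging: never completed
    Txn(producer_id=8, first_offset=10794900, completed=True),  # later work commits fine
    Txn(producer_id=8, first_offset=10795500, completed=True),
]
stuck = last_stable_offset(high_watermark=10796000, txns=txns)

txns[0].completed = True  # model an abort marker for the hanging transaction
released = last_stable_offset(high_watermark=10796000, txns=txns)
print(stuck, released)
```

Under this model, restarting consumers or moving leadership changes nothing, because the open transaction is part of the replicated log state; only completing (aborting) that transaction lets the LSO advance.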
Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues
Hi Ashu, others,

I have tested with the latest kafkacat with librdkafka 1.2.2, which can also do transactional reading.

Reading the partition with offset reset from the beginning will read until offset 10794778 (this is the offset at which the LSO is stuck).

Reading the partition from any offset after 10794778 (so any specific offset greater than 10794778, or auto offset reset to latest) will not read anything at all.

Reading in uncommitted mode will read properly from any offset.

I think my only solution would be to somehow get the LSO on the broker side to increase again. There's nothing I can do on the consumer side to get this working again while keeping the read mode read_committed.

Best,

Pieter

From: Ashutosh singh
Sent: Wednesday, 20 November 2019 15:15
To: Pieter Hameete
CC: users@kafka.apache.org
Subject: Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

Alright, got that.

What about resetting or changing the consumer offset? You can try to change it to some previous offset and restart the consumer. The consumer may have to do some duplicate processing, but it should work.
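The three kafkacat observations above fit a simple picture of the two isolation levels. Below is a minimal Python model of it, assuming a read_committed fetch is only served below the LSO while a read_uncommitted fetch is served up to the high watermark. The function name and the high watermark value are hypothetical, and filtering of aborted records is left out.

```python
def visible_offsets(start: int, lso: int, high_watermark: int, isolation_level: str) -> range:
    """Offsets a consumer positioned at `start` can be served in this simplified model."""
    limit = lso if isolation_level == "read_committed" else high_watermark
    # An empty range means the consumer polls without ever receiving a record.
    return range(start, max(start, limit))


LSO = 10794778   # stuck LSO reported in this thread
HW = 10800000    # hypothetical high watermark, still advancing with the producer

from_beginning = visible_offsets(0, LSO, HW, "read_committed")
after_lso = visible_offsets(LSO + 100, LSO, HW, "read_committed")
uncommitted = visible_offsets(LSO + 100, LSO, HW, "read_uncommitted")

print(len(from_beginning), len(after_lso), len(uncommitted))
```

This is also why rewinding the consumer offset does not help in this model: any read_committed position at or beyond the stuck LSO yields nothing, whatever the starting point.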
Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues
Alright, got that.

What about resetting or changing the consumer offset? You can try to change it to some previous offset and restart the consumer. The consumer may have to do some duplicate processing, but it should work.

On Wed, Nov 20, 2019 at 7:18 PM Pieter Hameete wrote:
> Hi Ashu,
>
> thanks for the tip. We have tried restarting the consumer, but that did
> not help. All read_committed consumers for this partition (we have
> multiple) have the same issue.
>
> The partition has already had different leaders, since we performed a
> rolling restart of the brokers. All brokers give the same stuck LSO, so I
> don't think deleting the partition will help? It would then restore the
> partition from another in-sync replica, but that also has the incorrect LSO?
>
> Best,
>
> Pieter
Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues
Hi Ashu,

thanks for the tip. We have tried restarting the consumer, but that did not help. All read_committed consumers for this partition (we have multiple) have the same issue.

The partition has already had different leaders, since we performed a rolling restart of the brokers. All brokers give the same stuck LSO, so I don't think deleting the partition will help? It would then restore the partition from another in-sync replica, but that also has the incorrect LSO?

Best,

Pieter

From: Ashutosh singh
Sent: Wednesday, 20 November 2019 14:43
To: users@kafka.apache.org
Subject: Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

Hello Pieter,

We had a similar issue.

Did you try restarting your consumer? If that doesn't fix it, then you can try deleting that particular topic partition from the broker and restarting the broker so that it will get in sync. Please make sure that you have an in-sync replica before deleting the partition.

Thanks,
Ashu
Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues
Hello Pieter,

We had a similar issue.

Did you try restarting your consumer? If that doesn't fix it, then you can try deleting that particular topic partition from the broker and restarting the broker so that it will get in sync. Please make sure that you have an in-sync replica before deleting the partition.

Thanks,
Ashu

On Wed, Nov 20, 2019 at 6:57 PM Pieter Hameete wrote:
> Hello,
>
> after having some Broker issues (too many open files) we managed to
> recover our Brokers, but read_committed consumers are stuck for a specific
> topic partition. It seems like the LSO is stuck at a specific offset. The
> transactional producer for the topic partition is working without errors, so
> the latest offset is incrementing correctly and so is transactional
> producing.
>
> What could be wrong here? And how can we get this specific LSO to
> increment again?
>
> Thank you in advance for any advice.
>
> Best,
>
> Pieter
Last Stable Offset (LSO) stuck for specific topic partition after Broker issues
Hello,

after having some Broker issues (too many open files) we managed to recover our Brokers, but read_committed consumers are stuck for a specific topic partition. It seems like the LSO is stuck at a specific offset. The transactional producer for the topic partition is working without errors, so the latest offset is incrementing correctly and so is transactional producing.

What could be wrong here? And how can we get this specific LSO to increment again?

Thank you in advance for any advice.

Best,

Pieter