Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2021-05-16 Thread Alexandre Dupriez
Hi Pieter,

FWIW, you may have encountered the following bug:
https://issues.apache.org/jira/browse/KAFKA-12671

Thanks,
Alexandre



Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2020-06-11 Thread D C
Hey peeps,

Anyone else encountered this and got to the bottom of it?

I'm facing a similar issue: the LSO is stuck for some partitions in a topic
and the consumers can't get data out of them (we're using
isolation.level=read_committed).

When this issue started happening we were on Kafka 2.3.1.
I tried:
- restarting the consumers
- deleting the partition from the leader and letting it get in sync with
the new leader
- a rolling restart of the brokers
- shutting down the whole cluster and starting it again
- deleting the txnindex files (after backing them up) and restarting
the brokers
- taking down the follower brokers of a partition and resyncing that
partition on them from scratch
- upgrading both the Kafka brokers and the clients to 2.5.0

Now the following questions arise:
Where is the LSO actually stored? (Even if you get rid of the txnindex files,
the LSO stays the same.)
Is there any way to reset the LSO?
Is there any way to manually abort and clean up the state of a stuck
transaction? (I suspect that this is the reason why the LSO is stuck.)
Is there any way to manually trigger a consistency check on the log files
that would fix any existing issues with either the logs or the indexes in
the partition?

Cheers,
Dragos



Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2019-11-21 Thread Pieter Hameete
Hello,

A final update on this: I found that there is an open transaction causing the 
LSO to be stuck at offset 10794778. This is similar to the following Stack 
Overflow question:

https://stackoverflow.com/questions/56643907/manually-close-old-kafka-transaction

Despite using the same pool of transactional IDs this old transaction was not 
aborted after the brokers and client apps came back online.

Is there any way to abort this defective transaction? Or is the only option to 
migrate all the data from this topic to a new one using a read_uncommitted 
reader?

Best,

Pieter
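
For readers following along: the behaviour described above is what you would 
expect if the LSO is the smaller of the high watermark and the first offset of 
the earliest transaction that is still open. Below is a minimal Python sketch of 
that rule on a toy in-memory log. It is not Kafka's broker code; the record 
layout and the names `Record` and `last_stable_offset` are made up for 
illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    offset: int
    producer_id: int
    kind: str  # "data", "commit" or "abort" (toy stand-ins for control markers)


def last_stable_offset(log, high_watermark):
    """Return min(high watermark, first offset of the earliest open transaction)."""
    open_txn_start = {}  # producer_id -> first offset of its open transaction
    for rec in log:
        if rec.offset >= high_watermark:
            break
        if rec.kind == "data":
            # Only the first data record of a transaction sets its start offset.
            open_txn_start.setdefault(rec.producer_id, rec.offset)
        else:
            # A commit or abort marker closes that producer's transaction.
            open_txn_start.pop(rec.producer_id, None)
    return min(open_txn_start.values(), default=high_watermark)


# Producer 1 wrote offset 0 and never got a marker; producer 2 keeps committing.
log = [
    Record(0, 1, "data"),
    Record(1, 2, "data"),
    Record(2, 2, "commit"),
    Record(3, 2, "data"),
    Record(4, 2, "commit"),
]
stuck = last_stable_offset(log, high_watermark=5)

# Once an abort marker for producer 1 is appended, the LSO catches up.
healed = last_stable_offset(log + [Record(5, 1, "abort")], high_watermark=6)
print(stuck, healed)
```

In this toy model the log keeps growing while the LSO stays at the start of 
producer 1's transaction, and it only moves once a marker for that transaction 
is written.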


Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2019-11-20 Thread Pieter Hameete
Hi Ashu, others,

I have tested with the latest kafkacat with librdkafka 1.2.2 which can also do 
transactional reading.

Reading the partition with the offset reset to the beginning reads up to offset 
10794778 (the offset at which the LSO is stuck).

Reading the partition from any offset after 10794778 (so any specific offset 
greater than 10794778, or auto offset reset to latest) will not read anything 
at all.

Reading in uncommitted mode will read properly from any offset.

I think my only solution would be to somehow get the LSO on the broker side to 
increase again. There's nothing I can do on the consumer side to get this 
working again while keeping read mode read_committed.

Best,

Pieter
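
These three observations are consistent with a read_committed fetch being capped 
at the LSO while a read_uncommitted fetch is capped at the high watermark. Here 
is a rough Python sketch of that visibility rule. It is a simplified model that 
ignores the filtering of aborted records; `visible_offsets` is a hypothetical 
helper and the high watermark value is invented. Only 10794778 comes from the 
test above.

```python
def visible_offsets(start, lso, high_watermark, isolation_level):
    """Return the half-open offset range a consumer can fetch from `start`."""
    if isolation_level == "read_committed":
        upper = lso  # nothing at or beyond the LSO is handed out
    elif isolation_level == "read_uncommitted":
        upper = high_watermark  # bounded only by the high watermark
    else:
        raise ValueError(f"unknown isolation level: {isolation_level}")
    # An empty range means the consumer waits without receiving any records.
    return range(start, max(start, upper))


LSO = 10794778  # stuck LSO reported in this thread
HIGH_WATERMARK = 10800000  # assumed value, for illustration only

from_beginning = visible_offsets(0, LSO, HIGH_WATERMARK, "read_committed")
past_lso = visible_offsets(LSO + 100, LSO, HIGH_WATERMARK, "read_committed")
uncommitted = visible_offsets(LSO + 100, LSO, HIGH_WATERMARK, "read_uncommitted")

print(len(from_beginning), len(past_lso), len(uncommitted))
```

Under these assumptions, rewinding a read_committed consumer only replays 
offsets below the LSO; it cannot get past it.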


Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2019-11-20 Thread Ashutosh singh
Alright, got that.
What about resetting or changing the consumer offset? You can try to
change it to some previous offset and restart the consumer. The consumer may
have to do duplicate processing, but it should work.


-- 
Thanx & Regard
Ashutosh Singh
08151945559


Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2019-11-20 Thread Pieter Hameete
Hi Ashu,

thanks for the tip. We have tried restarting the consumer, but that did not 
help. All read_committed consumers for this partition (we have multiple) have 
the same issue.

The partition already had different leaders when we performed a 
rolling restart of the brokers. All brokers report the same stuck LSO, so I don't 
think deleting the partition will help. Wouldn't it then be restored 
from another in-sync replica that also has the incorrect LSO?

Best,

Pieter



Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2019-11-20 Thread Ashutosh singh
Hello Pieter,

We had a similar issue.

Did you try restarting your consumer? If that doesn't fix it, you can
try deleting that particular topic partition from the broker and restarting
the broker so that it gets back in sync. Please make sure that you have
an in-sync replica before deleting the partition.

Thanks
Ashu



-- 
Thanx & Regard
Ashutosh Singh
08151945559


Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2019-11-20 Thread Pieter Hameete
Hello,

After some broker issues (too many open files) we managed to recover our 
brokers, but read_committed consumers are stuck on a specific topic partition. 
It seems like the LSO is stuck at a specific offset. The transactional producer 
for the topic partition is working without errors, so the latest offset is 
incrementing correctly and transactional producing still works.

What could be wrong here? And how can we get this specific LSO to increment 
again?

Thank you in advance for any advice.

Best,

Pieter
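
The symptom described here (the latest offset keeps advancing while the LSO does 
not) can be turned into a simple check. Below is a minimal Python sketch. How the 
offsets are sampled is left out, and `lso_looks_stuck` and its threshold are 
hypothetical. With the Java client, for example, `endOffsets()` returns the LSO 
for a read_committed consumer and the high watermark for a read_uncommitted one, 
so two such consumers could supply the samples.

```python
def lso_looks_stuck(samples, min_samples=3):
    """samples: chronological list of (lso, end_offset) pairs for one partition.

    Flags the partition when the end offset kept growing across the last
    `min_samples` observations while the LSO did not move at all.
    """
    if len(samples) < min_samples:
        return False
    recent = samples[-min_samples:]
    lsos = [lso for lso, _ in recent]
    ends = [end for _, end in recent]
    lso_frozen = len(set(lsos)) == 1
    end_growing = all(a < b for a, b in zip(ends, ends[1:]))
    return lso_frozen and end_growing


healthy = [(100, 100), (150, 152), (210, 210)]
stuck = [(10794778, 10795000), (10794778, 10796000), (10794778, 10797000)]
print(lso_looks_stuck(healthy), lso_looks_stuck(stuck))
```

A transaction that is merely slow looks the same over a short window, so a 
positive result is a hint to investigate rather than proof of a hanging 
transaction.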