Re: Highwater mark interpretation

2020-06-20 Thread D C
Hey Nag Y,

I’m not sure that reducing the replication factor while a broker is down
would release the messages for consumption (or at least not on all
partitions), for the simple reason that it might just remove the last
replica in the list, which might not match your unreachable broker.
Personally I would do a manual reassignment of partitions (Kafka Manager
lets you do that in an easy visual environment) and move the replicas off
the broken broker onto a working one. Once that’s done and the data is
copied to the new broker, the high watermark should go up, as all the
replicas will be in sync.
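For reference, the same reassignment can also be driven with the stock
`kafka-reassign-partitions.sh` tool: describe the new replica placement in a
JSON file, then ask the cluster to execute it. A minimal sketch (topic name,
partition, and broker ids are placeholders; versions before Kafka 2.4 take
`--zookeeper` instead of `--bootstrap-server`):

```shell
# Hypothetical reassignment: move partition 0 of "my-topic" so its replicas
# live on brokers 2 and 3, i.e. off the dead broker.
cat > reassign.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "my-topic", "partition": 0, "replicas": [2, 3] }
  ]
}
EOF

# Execute against the cluster (commented out here; needs a live broker):
# kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
#   --reassignment-json-file reassign.json --execute
# ...and poll progress with:
# kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
#   --reassignment-json-file reassign.json --verify
```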

Cheers,
D

On Sunday, June 21, 2020, Nag Y  wrote:

> Thanks D C. Thanks a lot. That is quite a detailed explanation.
> If I understand correctly (ignoring the case where producers create
> transactions): since the replica is down and never comes back, the high
> watermark CANNOT advance, and the consumer CANNOT read the messages which
> were sent after the replica went down, as the messages are NOT committed.
> Hope this is correct?

——
Indeed, this is correct.
——

>
> To address this situation, either we should make sure the replica is up or
> reduce the replication factor so that the messages will be committed and
> the consumer can start reading them ...
>
> Regards,
>  Nag
>
>
> On Sun, Jun 21, 2020 at 3:25 AM D C  wrote:
>
> > The short answer is: yes, a consumer can only consume messages up to the
> > High Watermark.
> >
> > The long answer is: not exactly, for the following reasons:
> >
> > At the partition level you have 3 major offsets that are important to the
> > health of the partition and its accessibility from the consumer's pov:
> > LEO (log end offset) - the highest offset in the latest segment
> > High Watermark - the latest offset that has been replicated to all the
> > followers
> > LSO (last stable offset) - important when you use producers that create
> > transactions - the highest offset that has been committed by a
> > transaction and that is allowed to be read with isolation level =
> > read_committed.
> >
> > The LEO can only be higher than or equal to the High Watermark (for
> > obvious reasons).
> > The High Watermark can only be higher than or equal to the LSO (the
> > messages up to this point may have been committed to all the followers
> > but the transaction isn't finished yet).
> > And coming to your question: in case the transaction hasn't finished, the
> > LSO may be lower than the High Watermark, so if your consumer is reading
> > the data with read_committed, it won't be able to go past the LSO.
> >
> > Cheers,
> > D
> >
> > On Sat, Jun 20, 2020 at 9:05 PM Nag Y  wrote:
> >
> > > As I understand it, the consumer can only read "committed" messages -
> > > which I believe, if we look at the internals of it, are nothing but
> > > messages up to the high watermark.
> > > *The high watermark is the offset of the last message that was
> > > successfully copied to all of the log’s replicas.*
> > >
> > > *Having said that, if one of the replicas is down, will the high
> > > watermark be advanced?*
> > >
> > > *If the replica never comes back, can we consider that this message
> > > can't be consumed by the consumer since it is never committed?*
> > >
> >
>


--


Re: Highwater mark interpretation

2020-06-20 Thread D C
The short answer is: yes, a consumer can only consume messages up to the
High Watermark.

The long answer is: not exactly, for the following reasons:

At the partition level you have 3 major offsets that are important to the
health of the partition and its accessibility from the consumer's pov:
LEO (log end offset) - the highest offset in the latest segment
High Watermark - the latest offset that has been replicated to all the
followers
LSO (last stable offset) - important when you use producers that create
transactions - the highest offset that has been committed by a transaction
and that is allowed to be read with isolation level = read_committed.

The LEO can only be higher than or equal to the High Watermark (for
obvious reasons).
The High Watermark can only be higher than or equal to the LSO (the
messages up to this point may have been committed to all the followers
but the transaction isn't finished yet).
And coming to your question: in case the transaction hasn't finished, the
LSO may be lower than the High Watermark, so if your consumer is reading
the data with read_committed, it won't be able to go past the LSO.
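You can see the isolation level in action from the command line: point the
stock console consumer at a topic with a read_committed consumer config and
it will stop at the LSO rather than the high watermark. A minimal sketch
(broker address and topic name are placeholders):

```shell
# Consumer config asking for committed reads only (the consumer then
# stops at the LSO instead of the high watermark).
cat > consumer.properties <<'EOF'
isolation.level=read_committed
EOF

# Commented out: needs a running broker to actually consume.
# kafka-console-consumer.sh --bootstrap-server localhost:9092 \
#   --topic my-topic --from-beginning \
#   --consumer.config consumer.properties
```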

Cheers,
D

On Sat, Jun 20, 2020 at 9:05 PM Nag Y  wrote:

> As I understand it, the consumer can only read "committed" messages -
> which I believe, if we look at the internals of it, are nothing but
> messages up to the high watermark.
> *The high watermark is the offset of the last message that was
> successfully copied to all of the log’s replicas.*
>
> *Having said that, if one of the replicas is down, will the high
> watermark be advanced?*
>
> *If the replica never comes back, can we consider that this message
> can't be consumed by the consumer since it is never committed?*
>


Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2020-06-11 Thread D C
Hey peeps,

Anyone else encountered this and got to the bottom of it?

I'm facing a similar issue: the LSO is stuck for some partitions in a topic
and the consumers can't get data out of them (we're consuming with
isolation.level = read_committed).

When this issue started happening we were on Kafka 2.3.1. I tried:
- restarting the consumers
- deleting the partition from the leader and letting it get back in sync
with the new leader
- a rolling restart of the brokers
- shutting down the whole cluster and starting it again
- deleting the txnindex files (after backing them up) and restarting the
brokers
- taking down the follower brokers of a partition and resyncing that
partition on them from scratch
- upgrading both the Kafka brokers and the client to 2.5.0

Now the following questions arise:
Where is the LSO actually stored? (Even if you get rid of the txnindex
files, the LSO stays the same.)
Is there any way the LSO can be reset?
Is there any way to manually abort and clean up the state of a stuck
transaction? (I suspect this is the reason the LSO is stuck.)
Is there any way to manually trigger a consistency check on the log files
that would fix any existing issues with either the logs or the indexes in
the partition?
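For what it's worth, one way to at least inspect the on-disk
aborted-transaction state is Kafka's `DumpLogSegments` tool, which can print
the contents of `.txnindex` files. A sketch (the log directory is a
placeholder, and the dump loop is commented out since it needs the Kafka CLI
and access to a broker's data dir):

```shell
# Assumption: point LOG_DIR at the partition's directory on the broker,
# e.g. /var/lib/kafka/data/<topic>-<partition>.
LOG_DIR="/var/lib/kafka/data/my-topic-0"

# Collect any transaction index segments (file is empty if none are found).
ls "$LOG_DIR"/*.txnindex 2>/dev/null > txnindex_files.txt || true

# For each index found, dump its aborted-transaction entries:
# while read -r f; do
#   kafka-run-class.sh kafka.tools.DumpLogSegments --files "$f"
# done < txnindex_files.txt
```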

Cheers,
Dragos

On 2019/11/20 13:26:54, Pieter Hameete  wrote: 
> Hello,
> 
> after having some broker issues (too many open files) we managed to recover 
> our brokers, but read_committed consumers are stuck for a specific topic 
> partition. It seems like the LSO is stuck at a specific offset. The 
> transactional producer for the topic partition is working without errors, so 
> the latest offset is incrementing correctly and transactional producing is working.
> 
> What could be wrong here? And how can we get this specific LSO to 
> increment again?
> 
> Thank you in advance for any advice.
> 
> Best,
> 
> Pieter
> 



Separating internal and external traffic

2016-05-25 Thread D C
I'm sure I can do this but I'm just not stumbling on the right
documentation anywhere. I have a handful of Kafka servers that I am trying
to get ready for production. I'm trying to separate the internal and
external network traffic, but I don't see how to do it.

Each host has two addresses.
10.x.y.z = default interface
192.168.x.y = private network seen only by the kafka nodes.

How can I tell kafka to make use of this?
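On Kafka 0.10.2 and later you can do this with named listeners in
`server.properties`: bind one listener per interface and route inter-broker
traffic over the private one. Older brokers are limited to one listener per
security protocol, so there you would just set `listeners` and
`advertised.listeners` to the address you want clients to use. A sketch,
keeping the placeholder addresses from the question:

```shell
# Hypothetical server.properties fragment: INTERNAL carries replication and
# other inter-broker traffic on the private network, EXTERNAL serves clients.
# Substitute each host's real 192.168.x.y / 10.x.y.z addresses.
cat > listeners.properties <<'EOF'
listeners=INTERNAL://192.168.x.y:9092,EXTERNAL://10.x.y.z:9093
advertised.listeners=INTERNAL://192.168.x.y:9092,EXTERNAL://10.x.y.z:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
EOF
```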