Re: Maximum Topic Length in Kafka

2015-11-28 Thread Debraj Manna
Let me explain my use case:-

We have a ELK setup in which logstash-forwarders pushes logs from different
services to a logstash. The logstash then pushes them to kafka. The
logstash consumer then pulls them out of Kafka and indexes them to
Elasticsearch cluster.

We are trying to ensure that no single service's logs can overwhelm the
system. So I was thinking that if each service's logs go into their own
topic in Kafka, and if we can specify a maximum length for a topic, then
the producer for that topic could block when the topic is full.
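
[Aside: there is no per-topic cap in Kafka, as the replies below note, but
the closest producer-side knob is back-pressure on the client's local
buffer. A sketch in 0.8.2 producer-properties style; note this blocks when
the client's buffer fills (e.g. when the brokers cannot keep up), not when
a topic reaches a given size:

# make send() block instead of throwing BufferExhaustedException
block.on.buffer.full=true
# size of the local buffer whose exhaustion triggers the blocking
buffer.memory=33554432
]
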
AFAIK there is no such notion as maximum length of a topic, i.e. the offset
has no limit except Long.MAX_VALUE I think, which should be enough for a
couple of lifetimes (about 9.2 * 10^18, i.e. roughly nine quintillion).

What would be the purpose of that, besides being a nice foot-gun :)

Marko Bonaći
Monitoring | Alerting | Anomaly Detection | Centralized Log Management
Solr & Elasticsearch Support
Sematext  | Contact


On Sat, Nov 28, 2015 at 2:13 PM, Debraj Manna 
wrote:

> Hi,
>
> Can someone please let me know the following:
>
>
>1. Is it possible to specify the maximum length of a particular topic (in
>terms of number of messages) in Kafka?
>2. Also, how does Kafka behave when a particular topic gets full?
>3. Can the producer be blocked if a topic gets full, rather than deleting
>old messages?
>
> I have gone through the documentation
>  but
> could not find anything of what I am looking for.
>


Re: Maximum Topic Length in Kafka

2015-11-28 Thread Guozhang Wang
The Kafka server has a data retention policy based on either time or size
(e.g. Kafka brokers will automatically delete the oldest data segment if its
data is more than xx milliseconds old, or if the total log size has exceeded
yy MBs, with both thresholds configurable).

Producer clients are not affected by any of the log retention policies on
the brokers; they will always be able to produce as long as the server is
alive. It is the server's responsibility to truncate data according to the
retention policy.
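
[Aside: a minimal sketch of the retention knobs described above, in
server.properties style; these are the 0.8.2-era broker config names, so
verify them against the docs for your version:

# delete a segment once its data is older than 7 days
log.retention.hours=168
# ...or once a partition's log grows past ~1 GB
log.retention.bytes=1073741824

The same limits can also be overridden per topic via the topic-level
retention.ms and retention.bytes configs.]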

Guozhang

On Sat, Nov 28, 2015 at 10:34 AM, Marko Bonaći 
wrote:

> AFAIK there is no such notion as maximum length of a topic, i.e. the offset
> has no limit except Long.MAX_VALUE I think, which should be enough for a
> couple of lifetimes (about 9.2 * 10^18, i.e. roughly nine quintillion).
>
> What would be the purpose of that, besides being a nice foot-gun :)
>
> Marko Bonaći
> Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> Solr & Elasticsearch Support
> Sematext  | Contact
> 
>
> On Sat, Nov 28, 2015 at 2:13 PM, Debraj Manna 
> wrote:
>
> > Hi,
> >
> > Can someone please let me know the following:
> >
> >
> >1. Is it possible to specify the maximum length of a particular topic (in
> >terms of number of messages) in Kafka?
> >2. Also, how does Kafka behave when a particular topic gets full?
> >3. Can the producer be blocked if a topic gets full, rather than deleting
> >old messages?
> >
> > I have gone through the documentation
> >  but
> > could not find anything of what I am looking for.
> >
>



-- 
-- Guozhang


Re: Maximum Topic Length in Kafka

2015-11-28 Thread Marko Bonaći
AFAIK there is no such notion as maximum length of a topic, i.e. the offset
has no limit except Long.MAX_VALUE I think, which should be enough for a
couple of lifetimes (about 9.2 * 10^18, i.e. roughly nine quintillion).
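
[For scale: Long.MAX_VALUE is 2^63 - 1, about 9.22 * 10^18. Even at a
sustained one million messages per second, a single partition would need
roughly 9.22 * 10^18 / 10^6 / (3.15 * 10^7 s/year), i.e. around 290,000
years, to exhaust its offsets.]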

What would be the purpose of that, besides being a nice foot-gun :)

Marko Bonaći
Monitoring | Alerting | Anomaly Detection | Centralized Log Management
Solr & Elasticsearch Support
Sematext  | Contact


On Sat, Nov 28, 2015 at 2:13 PM, Debraj Manna 
wrote:

> Hi,
>
> Can some one please let me know the following:-
>
>
>1. Is it possible to specify maximum length of a particular topic ( in
>terms of number of messages ) in kafka ?
>2. Also how does Kafka behave when a particular topic gets full?
>3. Can the producer be blocked if a topic get full rather than deleting
>old messages?
>
> I have gone through the documentation
>  but
> could not find anything of what I am looking for.
>


Re: SV: What is the benefit of using acks=all and minover e.g. acks=3

2015-11-28 Thread Prabhjot Bharaj
Hi,

Of all the parameters, raising num.replica.fetchers to 4 should help the
most. Please try it out and let us know if it worked.

Thanks,
Prabhjot
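
[Collecting the values Prabhjot suggested earlier in this thread into one
server.properties-style sketch of the proposed broker tuning:

num.replica.fetchers=4
replica.fetch.wait.max.ms=500
num.recovery.threads.per.data.dir=4

num.network.threads=8
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=10485760
socket.send.buffer.bytes=10485760
]
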
On Nov 28, 2015 4:59 PM, "Andreas Flinck" 
wrote:

> Hi!
>
> Here are our settings for the properties requested:
>
> num.network.threads=3
> socket.request.max.bytes=104857600
> socket.receive.buffer.bytes=1048576
> socket.send.buffer.bytes=1048576
>
> The following properties we don't set at all, so I guess they will default
> according to the documentation (defaults shown in parentheses):
>
> "num.replica.fetchers": (1)
> "replica.fetch.wait.max.ms": (500),
> "num.recovery.threads.per.data.dir": (1)
>
> The producer properties we explicitly set are the following;
>
> block.on.buffer.full=false
> client.id=MZ
> max.request.size=1048576
> acks=all
> retries=0
> timeout.ms=3
> buffer.memory=67108864
> metadata.fetch.timeout.ms=3000
>
> Do let me know what you think about it! We are currently setting up some
> tests using the broker properties that you suggested.
>
> Regards
> Andreas
>
>
>
>
>
>
> 
> From: Prabhjot Bharaj 
> Sent: 28 November 2015 11:37
> To: users@kafka.apache.org
> Subject: Re: What is the benefit of using acks=all and minover e.g. acks=3
>
> Hi,
>
> Clogging can happen if, as seems to be the case here, the requests are
> network-bound.
> Just to confirm your configurations, does your broker configuration look
> like this?? :-
>
> "num.replica.fetchers": 4,
> "replica.fetch.wait.max.ms": 500,
> "num.recovery.threads.per.data.dir": 4,
>
>
> "num.network.threads": 8,
> "socket.request.max.bytes": 104857600,
> "socket.receive.buffer.bytes": 10485760,
> "socket.send.buffer.bytes": 10485760,
>
> Similarly, please share your producer config as well. I'm thinking it may
> be related to tuning your cluster.
>
> Thanks,
> Prabhjot
>
>
> On Sat, Nov 28, 2015 at 3:54 PM, Andreas Flinck <
> andreas.fli...@digitalroute.com> wrote:
>
> > Great, thanks for the information! So it is definitely acks=all we want
> > to go for. Unfortunately we ran into a blocking issue in our
> > production-like test environment which we have not been able to find a
> > solution for. So here it is, ANY idea on how we could possibly find a
> > solution is very much appreciated!
> >
> > Environment:
> > Kafka version: kafka_2.11-0.8.2.1
> > 5 Kafka brokers and 5 ZK nodes spread out on 5 hosts
> > Using new producer (async)
> >
> > Topic:
> > partitions=10
> > replication-factor=4
> > min.insync.replicas=2
> >
> > Default property values used for broker configs and producer.
> >
> > Scenario and problem:
> > Incoming diameter data (10k TPS) is sent to 5 topics via 5 producers
> which
> > is working great until we start another 5 producers sending to another 5
> > topics with the same rate (10k). What happens then is that the producers
> > sending to 2 of the topics fill up their buffers and the throughput becomes
> > very low, with BufferExhaustedExceptions for most of the messages. When
> > checking the latency for the problematic topics it becomes really high
> > (around 150ms). Stopping the 5 producers that were started in the second
> > round, the latency goes down to about 1 ms again and the buffer will go
> > back to normal. The load is not that high, about 10MB/s, it is not even
> > near disk bound.
> > So the questions right now are: why do we get such high latency to
> > specifically two topics when starting more producers, even though CPU and
> > disk load look unproblematic? And why two topics specifically; is there
> > an order in which topics are prioritized when things get clogged for some
> > reason?
> >
> > Sorry for the quite messy description, we are all kind of new to Kafka
> > here!
> >
> > BR
> > Andreas
> >
> > > On 28 Nov 2015, at 09:26, Prabhjot Bharaj 
> wrote:
> > >
> > > Hi,
> > >
> > > This should help :)
> > >
> > > During my benchmarks, I noticed that if a 5-node Kafka cluster running 1
> > > topic is given a continuous injection of 50GB in one shot (using a
> > > modified producer performance script, which writes my custom data to
> > > Kafka), the last replica can sometimes lag, and it used to catch up at a
> > > speed of 1GB in 20-25 seconds. This lag increases if producer
> > > performance injects 200GB in one shot.
> > >
> > > I'm not sure how it will behave with multiple topics. It could have an
> > > impact on the overall throughput (because more partitions will be alive
> > > on the same broker, thereby dividing the network usage), but I have to
> > > test it in a staging environment.
> > >
> > > Regards,
> > > Prabhjot
> > >
> > > On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira 
> > wrote:
> > >
> > >> Hi,
> > >>
> > >> min.insync.replicas is alive and well in 0.9 :)
> > >>
> > >> Normally, you will have 4 out of 4 replicas in sync. However, if one of
> > >> the replicas falls behind, you will have 3 out of 4 in sync.
> > >> If you set min.insync.replicas = 3, produce requests will fail if the
> > >> number of in-sync replicas falls below 3.

Doubt on Kafka storage and Consuming messages

2015-11-28 Thread Goutam Chowdhury
Hi ,
I am new to Kafka. I went through some documents to understand Kafka, and I
have some questions. Please help me to understand:
1> Kafka storage:
What is the physical existence of a segment file?
What do you mean by "flushing a segment file to disk"?
One of the documents mentioned
-- "A message is only exposed to the consumers after it is flushed"
Does that mean the consumer has to wait?

2> Consuming the message:
From where does the consumer read the messages? From the segment file, from
the .log file, or from some cache?
3> If it is from a cache, then how and by whom is data stored into such a
cache?



Thanks And Regards
Goutam Chowdhury


Maximum Topic Length in Kafka

2015-11-28 Thread Debraj Manna
Hi,

Can someone please let me know the following:


   1. Is it possible to specify the maximum length of a particular topic (in
   terms of number of messages) in Kafka?
   2. Also, how does Kafka behave when a particular topic gets full?
   3. Can the producer be blocked if a topic gets full, rather than deleting
   old messages?

I have gone through the documentation
 but
could not find anything of what I am looking for.


Re: Increasing replication factor reliable?

2015-11-28 Thread Li Tao
Hi, I am wondering why the increased replication factor is applied to a
partition instead of the topic. Isn't it very hard to manage? Can anyone
help me clarify this?
On Nov 27, 2015 6:51 AM, "Dillian Murphey"  wrote:

> Alright, thank you all. Appreciate it.
>
> Cheers
>
> On Wed, Nov 25, 2015 at 10:50 PM, Gaurav Agarwal 
> wrote:
>
> > So you have two nodes running where you want to increase the replication
> > factor to 2 for fault tolerance. That won't be a problem.
> > On Nov 25, 2015 6:26 AM, "Dillian Murphey" 
> > wrote:
> >
> > > Is it safe to run this on an active production topic? A topic was
> > > created with a replication factor of 1 and I want to increase it to 2
> > > to have fault tolerance.
> > >
> > >
> > >
> >
> http://kafka.apache.org/documentation.html#basic_ops_increase_replication_factor
> > >
> >
>
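
[For reference, the procedure in the linked docs drives
kafka-reassign-partitions.sh with a JSON file listing the desired replicas
for each partition; that is also why the replication factor is effectively a
per-partition property, as asked above. A minimal sketch, with a
hypothetical topic name:

increase-replication-factor.json:
{"version":1,
 "partitions":[{"topic":"my-topic","partition":0,"replicas":[1,2]},
               {"topic":"my-topic","partition":1,"replicas":[2,3]}]}

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file increase-replication-factor.json --execute

Running the same command with --verify instead of --execute reports when the
reassignment has completed.]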


SV: What is the benefit of using acks=all and minover e.g. acks=3

2015-11-28 Thread Andreas Flinck
Hi!

Here are our settings for the properties requested:

num.network.threads=3
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576

The following properties we don't set at all, so I guess they will default
according to the documentation (defaults shown in parentheses):

"num.replica.fetchers": (1)
"replica.fetch.wait.max.ms": (500),
"num.recovery.threads.per.data.dir": (1)

The producer properties we explicitly set are the following;

block.on.buffer.full=false
client.id=MZ
max.request.size=1048576
acks=all
retries=0
timeout.ms=3
buffer.memory=67108864
metadata.fetch.timeout.ms=3000

Do let me know what you think about it! We are currently setting up some tests 
using the broker properties that you suggested.

Regards
Andreas







From: Prabhjot Bharaj 
Sent: 28 November 2015 11:37
To: users@kafka.apache.org
Subject: Re: What is the benefit of using acks=all and minover e.g. acks=3

Hi,

Clogging can happen if, as seems to be the case here, the requests are
network-bound.
Just to confirm your configurations, does your broker configuration look
like this?? :-

"num.replica.fetchers": 4,
"replica.fetch.wait.max.ms": 500,
"num.recovery.threads.per.data.dir": 4,


"num.network.threads": 8,
"socket.request.max.bytes": 104857600,
"socket.receive.buffer.bytes": 10485760,
"socket.send.buffer.bytes": 10485760,

Similarly, please share your producer config as well. I'm thinking it may be
related to tuning your cluster.

Thanks,
Prabhjot


On Sat, Nov 28, 2015 at 3:54 PM, Andreas Flinck <
andreas.fli...@digitalroute.com> wrote:

> Great, thanks for the information! So it is definitely acks=all we want to
> go for. Unfortunately we ran into a blocking issue in our production-like
> test environment which we have not been able to find a solution for. So
> here it is, ANY idea on how we could possibly find a solution is very much
> appreciated!
>
> Environment:
> Kafka version: kafka_2.11-0.8.2.1
> 5 Kafka brokers and 5 ZK nodes spread out on 5 hosts
> Using new producer (async)
>
> Topic:
> partitions=10
> replication-factor=4
> min.insync.replicas=2
>
> Default property values used for broker configs and producer.
>
> Scenario and problem:
> Incoming diameter data (10k TPS) is sent to 5 topics via 5 producers which
> is working great until we start another 5 producers sending to another 5
> topics with the same rate (10k). What happens then is that the producers
> sending to 2 of the topics fill up their buffers and the throughput becomes
> very low, with BufferExhaustedExceptions for most of the messages. When
> checking the latency for the problematic topics it becomes really high
> (around 150ms). Stopping the 5 producers that were started in the second
> round, the latency goes down to about 1 ms again and the buffer will go
> back to normal. The load is not that high, about 10MB/s, it is not even
> near disk bound.
> So the questions right now are: why do we get such high latency to
> specifically two topics when starting more producers, even though CPU and
> disk load look unproblematic? And why two topics specifically; is there an
> order in which topics are prioritized when things get clogged for some reason?
>
> Sorry for the quite messy description, we are all kind of new to Kafka
> here!
>
> BR
> Andreas
>
> > On 28 Nov 2015, at 09:26, Prabhjot Bharaj  wrote:
> >
> > Hi,
> >
> > This should help :)
> >
> > During my benchmarks, I noticed that if a 5-node Kafka cluster running 1
> > topic is given a continuous injection of 50GB in one shot (using a
> modified
> > producer performance script, which writes my custom data to kafka), the
> > last replica can sometimes lag and it used to catch up at a speed of 1GB
> in
> > 20-25 seconds. This lag increases if producer performance injects 200GB
> in
> > one shot.
> >
> > I'm not sure how it will behave with multiple topics. It could have an
> > impact on the overall throughput (because more partitions will be alive
> on
> > the same broker thereby dividing the network usage), but I have to test
> it
> > in staging environment
> >
> > Regards,
> > Prabhjot
> >
> > On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira 
> wrote:
> >
> >> Hi,
> >>
> >> min.insync.replicas is alive and well in 0.9 :)
> >>
> >> Normally, you will have 4 out of 4 replicas in sync. However, if one of
> >> the replicas falls behind, you will have 3 out of 4 in sync.
> >> If you set min.insync.replicas = 3, produce requests will fail if the
> >> number of in-sync replicas falls below 3.
> >>
> >> I hope this helps.
> >>
> >> Gwen
> >>
> >> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj  >
> >> wrote:
> >>
> >>> Hi Gwen,
> >>>
> >>> How about min.isr.replicas property?
> >>> Is it still valid in the new version 0.9 ?
> >>>
> >>> We could get 3 out of 4 replicas in sync if we set its value to 3.
> >>> Correct?
> >>>
> >>> Thanks,
> >>> Prabhjot
> >>> On Nov 28, 2015 10:20 AM, "Gwen Shapira"  wrote:
> >>

Re: What is the benefit of using acks=all and minover e.g. acks=3

2015-11-28 Thread Prabhjot Bharaj
Hi,

Clogging can happen if, as seems to be the case here, the requests are
network-bound.
Just to confirm your configurations, does your broker configuration look
like this?? :-

"num.replica.fetchers": 4,
"replica.fetch.wait.max.ms": 500,
"num.recovery.threads.per.data.dir": 4,


"num.network.threads": 8,
"socket.request.max.bytes": 104857600,
"socket.receive.buffer.bytes": 10485760,
"socket.send.buffer.bytes": 10485760,

Similarly, please share your producer config as well. I'm thinking it may be
related to tuning your cluster.

Thanks,
Prabhjot


On Sat, Nov 28, 2015 at 3:54 PM, Andreas Flinck <
andreas.fli...@digitalroute.com> wrote:

> Great, thanks for the information! So it is definitely acks=all we want to
> go for. Unfortunately we ran into a blocking issue in our production-like
> test environment which we have not been able to find a solution for. So
> here it is, ANY idea on how we could possibly find a solution is very much
> appreciated!
>
> Environment:
> Kafka version: kafka_2.11-0.8.2.1
> 5 Kafka brokers and 5 ZK nodes spread out on 5 hosts
> Using new producer (async)
>
> Topic:
> partitions=10
> replication-factor=4
> min.insync.replicas=2
>
> Default property values used for broker configs and producer.
>
> Scenario and problem:
> Incoming diameter data (10k TPS) is sent to 5 topics via 5 producers which
> is working great until we start another 5 producers sending to another 5
> topics with the same rate (10k). What happens then is that the producers
> sending to 2 of the topics fill up their buffers and the throughput becomes
> very low, with BufferExhaustedExceptions for most of the messages. When
> checking the latency for the problematic topics it becomes really high
> (around 150ms). Stopping the 5 producers that were started in the second
> round, the latency goes down to about 1 ms again and the buffer will go
> back to normal. The load is not that high, about 10MB/s, it is not even
> near disk bound.
> So the questions right now are: why do we get such high latency to
> specifically two topics when starting more producers, even though CPU and
> disk load look unproblematic? And why two topics specifically; is there an
> order in which topics are prioritized when things get clogged for some reason?
>
> Sorry for the quite messy description, we are all kind of new to Kafka
> here!
>
> BR
> Andreas
>
> > On 28 Nov 2015, at 09:26, Prabhjot Bharaj  wrote:
> >
> > Hi,
> >
> > This should help :)
> >
> > During my benchmarks, I noticed that if a 5-node Kafka cluster running 1
> > topic is given a continuous injection of 50GB in one shot (using a
> modified
> > producer performance script, which writes my custom data to kafka), the
> > last replica can sometimes lag and it used to catch up at a speed of 1GB
> in
> > 20-25 seconds. This lag increases if producer performance injects 200GB
> in
> > one shot.
> >
> > I'm not sure how it will behave with multiple topics. It could have an
> > impact on the overall throughput (because more partitions will be alive
> on
> > the same broker thereby dividing the network usage), but I have to test
> it
> > in staging environment
> >
> > Regards,
> > Prabhjot
> >
> > On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira 
> wrote:
> >
> >> Hi,
> >>
> >> min.insync.replicas is alive and well in 0.9 :)
> >>
> >> Normally, you will have 4 out of 4 replicas in sync. However, if one of
> >> the replicas falls behind, you will have 3 out of 4 in sync.
> >> If you set min.insync.replicas = 3, produce requests will fail if the
> >> number of in-sync replicas falls below 3.
> >>
> >> I hope this helps.
> >>
> >> Gwen
> >>
> >> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj  >
> >> wrote:
> >>
> >>> Hi Gwen,
> >>>
> >>> How about min.isr.replicas property?
> >>> Is it still valid in the new version 0.9 ?
> >>>
> >>> We could get 3 out of 4 replicas in sync if we set its value to 3.
> >>> Correct?
> >>>
> >>> Thanks,
> >>> Prabhjot
> >>> On Nov 28, 2015 10:20 AM, "Gwen Shapira"  wrote:
> >>>
>  In your scenario, you are receiving acks from 3 replicas while it is
>  possible to have 4 in the ISR. This means that one replica can be up
> to
>  4000 messages (by default) behind others. If a leader crashes, there
> is
> >>> 33%
>  chance this replica will become the new leader, thereby losing up to
> >> 4000
>  messages.
> 
>  acks = all requires all ISR to ack as long as they are in the ISR,
>  protecting you from this scenario (but leading to high latency if a
> >>> replica
>  is hanging and is just about to drop out of the ISR).
> 
>  Also, note that in future versions acks > 1 will be deprecated, to protect
>  against such subtle mistakes.
> 
>  Gwen
> 
>  On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
>  andreas.fli...@digitalroute.com> wrote:
> 
> > Hi all
> >
> > The reason why I need to know is that we have seen an issue when using
> > acks=all, forcing us to quickly find an alternative. I leave the issue
> > out of this post, but will probably come back to that!

Re: What is the benefit of using acks=all and minover e.g. acks=3

2015-11-28 Thread Andreas Flinck
Great, thanks for the information! So it is definitely acks=all we want to go
for. Unfortunately we ran into a blocking issue in our production-like test
environment which we have not been able to find a solution for. So here it is,
ANY idea on how we could possibly find a solution is very much appreciated!

Environment:
Kafka version: kafka_2.11-0.8.2.1
5 Kafka brokers and 5 ZK nodes spread out on 5 hosts
Using new producer (async)

Topic:
partitions=10
replication-factor=4
min.insync.replicas=2

Default property values used for broker configs and producer.

Scenario and problem:
Incoming diameter data (10k TPS) is sent to 5 topics via 5 producers which is 
working great until we start another 5 producers sending to another 5 topics 
with the same rate (10k). What happens then is that the producers sending to 2
of the topics fill up their buffers and the throughput becomes very low, with
BufferExhaustedExceptions for most of the messages. When checking the latency 
for the problematic topics it becomes really high (around 150ms). Stopping the 
5 producers that were started in the second round, the latency goes down to 
about 1 ms again and the buffer will go back to normal. The load is not that 
high, about 10MB/s, it is not even near disk bound. 
So the questions right now are: why do we get such high latency to specifically
two topics when starting more producers, even though CPU and disk load look
unproblematic? And why two topics specifically; is there an order in which
topics are prioritized when things get clogged for some reason?

Sorry for the quite messy description, we are all kind of new to Kafka here!

BR
Andreas
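
[One hedged note on the BufferExhaustedExceptions above: the 0.8.2 Java
producer throws those when its buffer.memory fills while
block.on.buffer.full=false, and an earlier message in this digest shows that
setting configured explicitly. Assuming that producer is in use, a one-line
sketch of the alternative behavior:

# apply back-pressure: make send() block instead of throwing
# BufferExhaustedException when the local buffer fills
block.on.buffer.full=true

This only trades the exceptions for blocking; it does not explain the
underlying 150 ms latency, it just changes how the producer reacts to it.]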

> On 28 Nov 2015, at 09:26, Prabhjot Bharaj  wrote:
> 
> Hi,
> 
> This should help :)
> 
> During my benchmarks, I noticed that if a 5-node Kafka cluster running 1
> topic is given a continuous injection of 50GB in one shot (using a modified
> producer performance script, which writes my custom data to kafka), the
> last replica can sometimes lag and it used to catch up at a speed of 1GB in
> 20-25 seconds. This lag increases if producer performance injects 200GB in
> one shot.
> 
> I'm not sure how it will behave with multiple topics. It could have an
> impact on the overall throughput (because more partitions will be alive on
> the same broker thereby dividing the network usage), but I have to test it
> in staging environment
> 
> Regards,
> Prabhjot
> 
> On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira  wrote:
> 
>> Hi,
>> 
>> min.insync.replicas is alive and well in 0.9 :)
>> 
>> Normally, you will have 4 out of 4 replicas in sync. However, if one of the
>> replicas falls behind, you will have 3 out of 4 in sync.
>> If you set min.insync.replicas = 3, produce requests will fail if the number
>> of in-sync replicas falls below 3.
>> 
>> I hope this helps.
>> 
>> Gwen
>> 
>> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj 
>> wrote:
>> 
>>> Hi Gwen,
>>> 
>>> How about min.isr.replicas property?
>>> Is it still valid in the new version 0.9 ?
>>> 
>>> We could get 3 out of 4 replicas in sync if we set its value to 3.
>>> Correct?
>>> 
>>> Thanks,
>>> Prabhjot
>>> On Nov 28, 2015 10:20 AM, "Gwen Shapira"  wrote:
>>> 
 In your scenario, you are receiving acks from 3 replicas while it is
 possible to have 4 in the ISR. This means that one replica can be up to
 4000 messages (by default) behind others. If a leader crashes, there is
>>> 33%
 chance this replica will become the new leader, thereby losing up to
>> 4000
 messages.
 
 acks = all requires all ISR to ack as long as they are in the ISR,
 protecting you from this scenario (but leading to high latency if a
>>> replica
 is hanging and is just about to drop out of the ISR).
 
 Also, note that in future versions acks > 1 will be deprecated, to protect
 against such subtle mistakes.
 
 Gwen
 
 On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
 andreas.fli...@digitalroute.com> wrote:
 
> Hi all
> 
> The reason why I need to know is that we have seen an issue when using
> acks=all, forcing us to quickly find an alternative. I leave the issue out
> of this post, but will probably come back to that!
> 
> My question is about acks=all and the min.insync.replicas property. Since
> we have found a workaround for an issue by using acks>1 instead of all
> (absolutely no clue why at this moment), I would like to know what benefit
> you get from e.g. acks=all and min.insync.replicas=3 instead of using
> acks=3 in a 5 broker cluster and replication-factor of 4. To my
> understanding you would get the exact same level of durability and
> security from using either of those settings. However, I suspect this is
> not quite the case from finding hints without proper explanation that
> acks=all is preferred.
> 
> 
> Regards
> Andreas
 
>>> 
>> 
> 
> 
> 
> -- 
> --
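
[To make the recommendation in this thread concrete, a sketch of the
acks=all plus min.insync.replicas combination under the 5-broker, RF=4 setup
discussed above; the property names are the standard topic and producer
configs:

# topic-level (or broker default) setting
min.insync.replicas=3
# producer setting: wait for every in-sync replica, not a fixed count
acks=all

Unlike acks=3, this pair tracks the actual ISR: a produce request either
reaches every in-sync replica or fails once fewer than 3 replicas are in
sync, so a lagging replica can never silently be left out of the acked set.]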

500 ms delay using new consumer and schema registry.

2015-11-28 Thread Gerard Klijs
Hi all,
I'm running a little test, with ZooKeeper, Kafka and the schema registry all
running locally. I'm using the new consumer, and the 2.0.0-snapshot version
of the registry, which has a decoder giving back instances of the schema
object.

It's all working fine, but I see a consistent delay, maxing out around 500 ms.
I'm just wondering if anyone knows what might be the cause. The delay is
measured from creating the record to receiving the object.

For whoever wants to try the same thing: I ran into some problems until I
generated the Java class from a schema using Avro, instead of using Avro to
generate the schema from a class. This could just have been caused by the
default constructor not being available in the Java class I used.
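
[A guess worth checking rather than a confirmed cause: the new consumer's
fetch.max.wait.ms defaults to 500 ms, which matches the observed ceiling;
when less than fetch.min.bytes of data is available, the broker holds the
fetch for up to that long before responding. A consumer-properties sketch:

# respond as soon as any data is available
fetch.min.bytes=1
# cap how long the broker may hold a fetch before responding
fetch.max.wait.ms=100
]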


Re: What is the benefit of using acks=all and minover e.g. acks=3

2015-11-28 Thread Prabhjot Bharaj
Hi,

This should help :)

During my benchmarks, I noticed that if a 5-node Kafka cluster running 1
topic is given a continuous injection of 50GB in one shot (using a modified
producer performance script, which writes my custom data to kafka), the
last replica can sometimes lag and it used to catch up at a speed of 1GB in
20-25 seconds. This lag increases if producer performance injects 200GB in
one shot.

I'm not sure how it will behave with multiple topics. It could have an
impact on the overall throughput (because more partitions will be alive on
the same broker thereby dividing the network usage), but I have to test it
in staging environment

Regards,
Prabhjot

On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira  wrote:

> Hi,
>
> min.insync.replicas is alive and well in 0.9 :)
>
> Normally, you will have 4 out of 4 replicas in sync. However, if one of the
> replicas falls behind, you will have 3 out of 4 in sync.
> If you set min.insync.replicas = 3, produce requests will fail if the number
> of in-sync replicas falls below 3.
>
> I hope this helps.
>
> Gwen
>
> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj 
> wrote:
>
> > Hi Gwen,
> >
> > How about min.isr.replicas property?
> > Is it still valid in the new version 0.9 ?
> >
> > We could get 3 out of 4 replicas in sync if we set its value to 3.
> > Correct?
> >
> > Thanks,
> > Prabhjot
> > On Nov 28, 2015 10:20 AM, "Gwen Shapira"  wrote:
> >
> > > In your scenario, you are receiving acks from 3 replicas while it is
> > > possible to have 4 in the ISR. This means that one replica can be up to
> > > 4000 messages (by default) behind others. If a leader crashes, there is
> > 33%
> > > chance this replica will become the new leader, thereby losing up to
> 4000
> > > messages.
> > >
> > > acks = all requires all ISR to ack as long as they are in the ISR,
> > > protecting you from this scenario (but leading to high latency if a
> > replica
> > > is hanging and is just about to drop out of the ISR).
> > >
> > > Also, note that in future versions acks > 1 will be deprecated, to protect
> > > against such subtle mistakes.
> > >
> > > Gwen
> > >
> > > On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> > > andreas.fli...@digitalroute.com> wrote:
> > >
> > > > Hi all
> > > >
> > > > The reason why I need to know is that we have seen an issue when
> using
> > > > acks=all, forcing us to quickly find an alternative. I leave the
> issue
> > > out
> > > > of this post, but will probably come back to that!
> > > >
> > > > My question is about acks=all and min.insync.replicas property. Since
> > we
> > > > have found a workaround for an issue by using acks>1 instead of all
> > > > (absolutely no clue why at this moment), I would like to know what
> > > benefit
> > > > you get from e.g. acks=all and min.insync.replicas=3 instead of using
> > > > acks=3 in a 5 broker cluster and replication-factor of 4. To my
> > > > understanding you would get the exact same level of durability and
> > > > security from
> > > > using either of those settings. However, I suspect this is not quite
> > the
> > > > case from finding hints without proper explanation that acks=all is
> > > > preferred.
> > > >
> > > >
> > > > Regards
> > > > Andreas
> > >
> >
>



-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"