latency test

2015-09-03 Thread Yuheng Du
I am running a producer latency test. With 92 producers on 92
physical nodes publishing to 4 brokers, the latency is slightly lower than
with 8 brokers. I am using 8 partitions for the topic.

I have rerun the test and it gives me the same result: the 4-broker
scenario still has lower latency than the 8-broker scenario.

It is weird because I tested 1, 2, 4, 8, 16 and 32 brokers. In every
other case the latency decreases as the number of brokers increases.

4 brokers/8 brokers is the only pair that doesn't satisfy this rule. What
could be the cause?

I am using 200-byte messages, and the test has each producer publish 500k
messages to a given topic. For every test run with a different number of
brokers, I use a new topic.

Thanks for any advice.
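The thread doesn't include the measurement harness, so as an illustration only: when comparing broker counts, tail percentiles usually tell you more than the mean. A minimal sketch of summarizing per-message latencies, using hypothetical data in place of the real 500k-message runs:

```python
import random
import statistics

def summarize_latencies(latencies_ms):
    """Summarize per-message latencies; tail percentiles often matter more
    than the mean when comparing broker counts."""
    xs = sorted(latencies_ms)
    def pct(p):
        return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]
    return {
        "mean": statistics.mean(xs),
        "p50": pct(50),
        "p99": pct(99),
        "max": xs[-1],
    }

# Hypothetical data standing in for one producer's per-message send latencies.
random.seed(0)
sample = [random.lognormvariate(1.0, 0.5) for _ in range(10_000)]
stats = summarize_latencies(sample)
print(stats["p50"] <= stats["p99"] <= stats["max"])  # True by construction
```

If the 4-vs-8-broker anomaly shows up only in the mean but not at p50/p99, it may be a handful of slow outliers rather than a systematic effect.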


Huge Kafka Message size ( 386073344 ) in response

2015-09-03 Thread Qi Xu
Hi all,
I'm using the Kafka.Net library to implement a Kafka producer.
One issue I've found is that it sometimes reads a response from the Kafka
server indicating a huge message size of 386073344. Apparently something
must be wrong.
But I'm not sure if it's a special flag that Kafka.Net fails to handle or
a bug on the Kafka server side. Has anyone seen this before?

Thanks,
Qi
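A cheap first diagnostic for an absurd size field in any length-prefixed protocol is to look at the raw bytes of the value: if the client parses the size from the wrong offset in the buffer, or the bytes belong to a different protocol entirely, the "size" is just whatever four bytes happened to be there. A sketch (the interpretation below is an observation, not a confirmed diagnosis of this report):

```python
import struct

# The absurd "message size" from the thread.
size = 386073344

# Decode the 32-bit size field back into raw bytes. If a client reads the
# size at the wrong buffer offset, the value is garbage, and the byte
# pattern can hint at what was actually there.
raw = struct.pack(">i", size)
print(raw.hex())  # 17030300
```

Incidentally, 0x17 0x03 0x03 happens to match a TLS application-data record header, which may or may not be relevant here; with the plaintext 0.8-era protocol, a mis-aligned read in the client's response parser is the more likely explanation, but dumping the bytes is how you would start to tell the two apart.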


Re: Slow ISR catch-up

2015-09-03 Thread Gwen Shapira
Yes, this should work. Expect lower throughput though.

On Thu, Sep 3, 2015 at 12:52 PM, Prabhjot Bharaj 
wrote:

> Hi,
>
> Can I use sync for acks = -1?
>
> Regards,
> Prabhjot


Re: Slow ISR catch-up

2015-09-03 Thread Prabhjot Bharaj
Hi,

Can I use sync for acks = -1?

Regards,
Prabhjot
On Sep 3, 2015 11:49 PM, "Gwen Shapira"  wrote:

> The test uses the old producer (we should fix that), and since you don't
> specify --sync, it runs async.
> The old async producer simply sends data and doesn't wait for acks, so it
> is possible that the messages were never acked...


Re: API to query cluster metadata on-demand

2015-09-03 Thread Todd Palino
What Gwen said :)

We developed a python web service internally called Menagerie that provides
this functionality for both Kafka and Zookeeper. We use it to drive a web
dashboard for stats, our (old style) lag checking, and some other CLI
tools. Unfortunately it ties into too much internal LinkedIn tooling for us
to open source.

That's one of the reasons we released Burrow (
https://github.com/linkedin/Burrow). The primary use is to do lag checking
for consumers as a service. But I'm also moving functionality from
Menagerie into it. Right now you can use it to fetch topic lists, partition
counts, and broker offsets. You can also get information for consumers (as
long as they are committing offsets to Kafka and not ZK).

If it looks useful and there's some bit of info you'd like it to provide,
you can submit a github issue and I'll take a look at it.

-Todd



Re: API to query cluster metadata on-demand

2015-09-03 Thread Andrew Otto
If you don’t mind doing it with a C CLI:

https://github.com/edenhill/kafkacat

$ kafkacat -L -b mybroker

But, uhhh, you probably want something in the Java API.

:)





Re: Slow ISR catch-up

2015-09-03 Thread Gwen Shapira
The test uses the old producer (we should fix that), and since you don't
specify --sync, it runs async.
The old async producer simply sends data and doesn't wait for acks, so it
is possible that the messages were never acked...



Re: API to query cluster metadata on-demand

2015-09-03 Thread Gwen Shapira
Ah, I wish.

We are working on it :)

On Thu, Sep 3, 2015 at 9:10 AM, Simon Cooper <
simon.coo...@featurespace.co.uk> wrote:

> Is there a basic interface in the new client APIs to get the list of
> topics on a cluster, and get information on the topics (offsets, sizes,
> etc), without having to deal with a producer or consumer? I just want a
> basic synchronous API to query the metadata as-is. Does this exist in some
> form?
>
> Thanks,
> Simon
>


RE: Competing customers

2015-09-03 Thread Joris Peeters
Great, thanks - that does help. I'll kick off some partitions, then. :)

(I think I saw your video lectures on safaribooksonline! I should probably have 
paid better attention..)

Joris Peeters
Software Developer

Research and Data Technology
T: +44 (0) 20 8576 5800


-Original Message-
From: Gwen Shapira [mailto:g...@confluent.io]
Sent: 03 September 2015 17:58
To: users@kafka.apache.org
Subject: Re: Competing customers

Yeah, scaling through adding partitions ("sharding") is a basic feature of 
Kafka.
We expect topics to have many partitions (at least as many as number of 
consumers), and each consumer to get a subset of the messages by getting a 
subset of partitions.

This design gives Kafka its two biggest advantages:
1. Order guarantee - consumers are guaranteed to get messages in order because 
they are reading from a subset of partitions in order (rather than getting a 
mix of messages from different partitions) 2. Scalability - because we just 
need to track the last message each consumer read from each partition (and we 
know it consumed everything that came before, due to #1), we can scale to huge 
number of consumers and partitions without worrying about overhead of worrying 
about who got which message. This is the biggest different between Kafka and 
JMS queues.

Hope this helps.

Gwen

On Thu, Sep 3, 2015 at 9:49 AM, Joris Peeters 
wrote:

> I imagine this has been asked before, but I have googled around quite
> a bit and can’t really find a clear answer. Apologies in advance, though ..
>
>
>
> I’m interested in Kafka setups that allow for competing customers.
> I’ll have one topic where a lot of messages get published to, and I’d
> like to be able to (dynamically, eventually) fire up services to take
> messages of the queue and process them. Obviously, I’d expect each of
> the services to see consume only a subset of the messages.
>
>
>
> Do I understand correctly that I would need multiple partitions for this?
> I’ve been messing around a bit with a one topic/one partition setup,
> but all consumers receive the same (and total amount of) messages.
>
>
>
> Do all the clients support this? I’ve currently got the option between
> C#, Java and Python, more or less. (I expect the Java one to be most
> feature-complete).
>
>
>
> Thanks!
>
>
>
> *Joris Peeters*
>
> Developer
>
>
>
> *Research and Data Technology*
>
> T:
>
> +44 (0) 20 8576 5800
>
>
>
> *Winton*
>
> Grove House
> 27 Hammersmith Grove
> London W6 0NE
>
>
>
> wintoncapital.com 
>
>
>
> 
>
>
>
>
>
>
> Winton Capital Management Limited (“Winton”) is a limited company
> registered in England and Wales with its registered offices at 16 Old
> Bailey, London, EC4M 7EG (Registered Company No. 3311531). Winton is
> authorised and regulated by the Financial Conduct Authority in the
> United Kingdom, registered as an investment adviser with the US
> Securities and Exchange Commission, registered with the US Commodity
> Futures Trading Commission and a member of the National Futures
> Association in the United States.
>
> This communication, including any attachments, is confidential and may
> be privileged. This email is for use by the intended recipient only.
> If you receive it in error, please notify the sender and delete it.
> You should not copy or disclose all or any part of this email.
>
> This email does not constitute an offer or solicitation and nothing
> contained in this email constitutes, and should not be construed as,
> investment advice. Prospective investors should request offering
> materials and consult their own advisers with respect to investment
> decisions and inform themselves as to applicable legal requirements,
> exchange control regulations and taxes in the countries of their
> citizenship, residence or domicile. Past performance is not indicative of 
> future results.
>
> Winton takes reasonable steps to ensure the accuracy and integrity of
> its communications, including emails. However Winton accepts no
> liability for any materials transmitted. Emails are not secure and
> cannot be guaranteed to be error free.
>


Winton Capital Management Limited (“Winton”) is a limited company registered in 
England and Wales with its registered offices at 16 Old Bailey, London, EC4M 
7EG (Registered Company No. 3311531). Winton is authorised and regulated by the 
Financial Conduct Authority in the United Kingdom, registered as an investment 
adviser with the US Securities and Exchange Commission, registered with the US 
Commodity Futures Trading Commission and a member of the National Futures 
Association in the United States.

This communication, including any attachments, is confidential and may be 
privileged. This email is for use by the intended recipient only. If you 
receive it in error, please notify the sender and delete it. You should not 
copy or disclose all or any part of this email.

This email does not constitute an offer or solicitation and n

Re: Competing customers

2015-09-03 Thread Gwen Shapira
Yeah, scaling through adding partitions ("sharding") is a basic feature of
Kafka.
We expect topics to have many partitions (at least as many as number of
consumers), and each consumer to get a subset of the messages by getting a
subset of partitions.

This design gives Kafka its two biggest advantages:
1. Order guarantee - consumers are guaranteed to get messages in order
because they are reading from a subset of partitions in order (rather than
getting a mix of messages from different partitions)
2. Scalability - because we just need to track the last message each
consumer read from each partition (and we know it consumed everything that
came before, due to #1), we can scale to a huge number of consumers and
partitions without the overhead of tracking who got which message. This is
the biggest difference between Kafka and JMS queues.

Hope this helps.

Gwen
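The mapping Gwen describes - each consumer in a group owns a subset of partitions, so each message is processed by exactly one group member - can be sketched with a toy model. This is not the real client's rebalance protocol (the actual consumer handles dynamic membership and rebalancing), just the core idea:

```python
# Toy model of "competing consumers" via partitions: each consumer in a
# group owns a subset of partitions, so each message is processed once.
# NOT the real Kafka rebalance protocol -- just the core routing idea.

NUM_PARTITIONS = 8
consumers = ["c0", "c1", "c2"]

# Round-robin-style assignment: spread partitions across group members.
assignment = {c: [] for c in consumers}
for p in range(NUM_PARTITIONS):
    assignment[consumers[p % len(consumers)]].append(p)

def partition_for(key: str) -> int:
    # Stand-in for the real key-hash partitioner.
    return hash(key) % NUM_PARTITIONS

msgs = [f"order-{i}" for i in range(1000)]
seen = {c: 0 for c in consumers}
for m in msgs:
    p = partition_for(m)
    owner = next(c for c, ps in assignment.items() if p in ps)
    seen[owner] += 1

# Every message is consumed exactly once across the group.
print(sum(seen.values()) == len(msgs))  # True
```

With one partition, by contrast, only one group member can own it, which is why a one-topic/one-partition setup cannot spread work across competing consumers.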



Competing customers

2015-09-03 Thread Joris Peeters
I imagine this has been asked before, but I have googled around quite a bit and 
can't really find a clear answer. Apologies in advance, though ..

I'm interested in Kafka setups that allow for competing consumers. I'll have
one topic where a lot of messages get published, and I'd like to be able to
(dynamically, eventually) fire up services that take messages off the queue and
process them. Obviously, I'd expect each of the services to consume only a
subset of the messages.

Do I understand correctly that I would need multiple partitions for this? I've
been messing around a bit with a one topic/one partition setup, but every
consumer receives the full stream of messages.

Do all the clients support this? I've currently got the option between C#, Java 
and Python, more or less. (I expect the Java one to be most feature-complete).

Thanks!

Joris Peeters
Developer

Research and Data Technology
T:

+44 (0) 20 8576 5800


Winton
Grove House
27 Hammersmith Grove
London W6 0NE

wintoncapital.com






Winton Capital Management Limited ("Winton") is a limited company registered in 
England and Wales with its registered offices at 16 Old Bailey, London, EC4M 
7EG (Registered Company No. 3311531). Winton is authorised and regulated by the 
Financial Conduct Authority in the United Kingdom, registered as an investment 
adviser with the US Securities and Exchange Commission, registered with the US 
Commodity Futures Trading Commission and a member of the National Futures 
Association in the United States.

This communication, including any attachments, is confidential and may be 
privileged. This email is for use by the intended recipient only. If you 
receive it in error, please notify the sender and delete it. You should not 
copy or disclose all or any part of this email.

This email does not constitute an offer or solicitation and nothing contained 
in this email constitutes, and should not be construed as, investment advice. 
Prospective investors should request offering materials and consult their own 
advisers with respect to investment decisions and inform themselves as to 
applicable legal requirements, exchange control regulations and taxes in the 
countries of their citizenship, residence or domicile. Past performance is not 
indicative of future results.

Winton takes reasonable steps to ensure the accuracy and integrity of its 
communications, including emails. However Winton accepts no liability for any 
materials transmitted. Emails are not secure and cannot be guaranteed to be 
error free.


[VOTE] 0.8.2.2 Candidate 1

2015-09-03 Thread Jun Rao
This is the first candidate for release of Apache Kafka 0.8.2.2. This only
fixes two critical issues (KAFKA-2189 and KAFKA-2308) related to snappy in
0.8.2.1.

Release Notes for the 0.8.2.2 release
https://people.apache.org/~junrao/kafka-0.8.2.2-candidate1/RELEASE_NOTES.html

*** Please download, test and vote by Tuesday, Sep 8, 7pm PT

Kafka's KEYS file, containing the PGP keys we use to sign the release:
http://kafka.apache.org/KEYS. Release artifacts are also provided with md5,
sha1 and sha2 (SHA256) checksums.

* Release artifacts to be voted upon (source and binary):
https://people.apache.org/~junrao/kafka-0.8.2.2-candidate1/

* Maven artifacts to be voted upon prior to release:
https://repository.apache.org/content/groups/staging/

* scala-doc
https://people.apache.org/~junrao/kafka-0.8.2.2-candidate1/scaladoc/

* java-doc
https://people.apache.org/~junrao/kafka-0.8.2.2-candidate1/javadoc/

* The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.2 tag
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=d01226cfdcb3d9daad8465234750fa515a1e7e4a


Thanks,

Jun


API to query cluster metadata on-demand

2015-09-03 Thread Simon Cooper
Is there a basic interface in the new client APIs to get the list of topics on 
a cluster, and get information on the topics (offsets, sizes, etc), without 
having to deal with a producer or consumer? I just want a basic synchronous API 
to query the metadata as-is. Does this exist in some form?

Thanks,
Simon
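At the time there was no public metadata API in the new clients (as Gwen notes elsewhere in this digest), but under the hood topic metadata comes from an ordinary request any client can send. As an illustration of the 0.8-era wire format (api_key 3, TopicMetadataRequest), here is a hand-rolled encoder; it is a protocol sketch, not a supported client API:

```python
import struct

def encode_metadata_request(client_id: str, topics: list,
                            correlation_id: int = 1) -> bytes:
    """Encode a TopicMetadataRequest (api_key=3, version 0) per the
    0.8-era Kafka wire protocol. A sketch for illustration only."""
    def kstr(s: str) -> bytes:
        # Kafka "string": int16 length followed by UTF-8 bytes.
        b = s.encode("utf-8")
        return struct.pack(">h", len(b)) + b

    body = struct.pack(">hhi", 3, 0, correlation_id)  # api_key, api_version, correlation_id
    body += kstr(client_id)
    body += struct.pack(">i", len(topics))            # topic array length
    for t in topics:
        body += kstr(t)
    return struct.pack(">i", len(body)) + body        # size-prefixed frame

frame = encode_metadata_request("metadata-probe", ["temp"])
# First 4 bytes are the frame size; the next 2 are api_key == 3 (metadata).
print(struct.unpack(">h", frame[4:6])[0])  # 3
```

The broker's response to this request carries the topic list, partition counts, leaders, replicas and ISRs, which is exactly the data a synchronous metadata API would expose.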


Re: How to monitor lag when "kafka" is used as offset.storage?

2015-09-03 Thread Todd Palino
You can use the emailer config in Burrow to send alerts directly (it will
monitor specific groups and send emails out when there is a problem). If
you need something more complex than that, I think the best practice is
always to send the output into a general alert/notification system.

-Todd

On Wednesday, September 2, 2015, shahab  wrote:

> Thanks Noah. I installed Burrow and played with it a little bit. It seems
> as you pointed out I need to implement the alerting system myself. Do you
> know any other Kafka tools that can give alerts?
>
> best,
> /Shahab
>
> On Wed, Sep 2, 2015 at 1:44 PM, noah >
> wrote:
>
> > We use Burrow . There are REST
> > endpoints you can use to get offsets and manually calculate lag, but if
> you
> > are focused on alerting, I'd use its consumer statuses as they are a bit
> > smarter than a simple lag calculation.
> >
> > On Wed, Sep 2, 2015 at 4:08 AM shahab  > wrote:
> >
> > > Hi,
> > >
> > > I wonder how we can monitor lag (difference between consumer offset and
> > log
> > > ) when "kafka" is set as offset.storage?  because the
> "kafka-run-class.sh
> > > kafka.tools.ConsumerOffsetChecker ... " does work only when zookeeper
> is
> > > used as storage manager.
> > >
> > > best,
> > > /Shahab
> > >
> >
>


Slow ISR catch-up

2015-09-03 Thread Prabhjot Bharaj
Hi Folks,

Request your expertise on my doubt here.

*My setup:-*

5 node kafka cluster (4 cores, 8GB RAM) on RAID-6 (500 GB)
Using Kafka 0.8.2.1 with a modified ProducerPerformance.scala
I've modified ProducerPerformance.scala to send custom ASCII data instead
of a byte array of zeroes

*server.properties:-*

broker.id=0

log.cleaner.enable=false

log.dirs=/tmp/kafka-logs

log.retention.check.interval.ms=30

log.retention.hours=168

log.segment.bytes=1073741824

num.io.threads=8

num.network.threads=3

num.partitions=1

num.recovery.threads.per.data.dir=1

*num.replica.fetchers=4*

port=9092

socket.receive.buffer.bytes=1048576

socket.request.max.bytes=104857600

socket.send.buffer.bytes=1048576

zookeeper.connect=localhost:2181

zookeeper.connection.timeout.ms=6000


*This is how I run the producer perf test:-*

kafka-producer-perf-test.sh --broker-list
a.a.a.a:9092,b.b.b.b:9092,c.c.c.c:9092,d.d.d.d:9092,e.e.e.e:9092 --messages
10 --message-size 500 --topics temp --show-detailed-stats  --threads 5
--request-num-acks -1 --batch-size 200 --request-timeout-ms 1
--compression-codec 0

*Problem:-*

This test completes in under 15 seconds for me

But, after this test, if I try writing to another topic which has 2
partitions and 3 replicas, it is dead slow and the same script seems never
to finish because the slow ISR catch-up is still going on.

*My inference:-*
I have noticed that for a topic with 1 partition and 3 replicas, the ISR
shows only 1 broker id.

Topic:temp PartitionCount:1 ReplicationFactor:3 Configs:

Topic: temp Partition: 0 Leader: 5 Replicas: 5,1,2 Isr: 5


I think it is because the data from the leader is not received in broker
ids 1 and 2
Also, I could confirm it from the data directory sizes for this topic.
Leader (5) has 20GB but replicas - 1 and 2 are still at 7GB

*Doubts:-*
1. I was running kafka-producer-perf-test.sh with acks=-1, which means
that all data should have been committed to all replicas. But with the
replicas still at 7GB, it doesn't seem that acks=-1 is honored by the
producer.

Am I missing something ?

Regards,
Prabhjot
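Gwen's answer elsewhere in this digest is that the old producer in async mode simply never waits for acks, so acks=-1 has no effect. The difference can be sketched with a toy model (labeled assumption: this is not Kafka code, just an illustration of why a fire-and-forget producer "finishes" long before the followers catch up):

```python
# Toy model of ack semantics: fire-and-forget vs. waiting for the full ISR.
# NOT Kafka code -- illustrates why an async/no-acks producer can report
# completion while follower replicas still lag the leader.

class Partition:
    def __init__(self, replicas):
        self.logs = {r: [] for r in replicas}  # replica id -> log
        self.leader = replicas[0]

    def produce(self, msg, acks):
        self.logs[self.leader].append(msg)
        if acks == -1:
            # acks=-1: replicate to every ISR member before acknowledging.
            for r in self.logs:
                if r != self.leader:
                    self.logs[r].append(msg)
            return "acked-by-all"
        # acks=0 (effective old-async behavior): acknowledge immediately;
        # followers catch up later, so their logs lag the leader's.
        return "acked-by-none"

p_sync = Partition(replicas=[5, 1, 2])
p_async = Partition(replicas=[5, 1, 2])
for i in range(100):
    p_sync.produce(i, acks=-1)
    p_async.produce(i, acks=0)

print(len(p_sync.logs[1]), len(p_async.logs[1]))  # 100 0
```

In the async case the leader's log (replica 5) is full while the followers are empty, which mirrors the 20GB-leader/7GB-follower imbalance and the Isr: 5 output in the thread.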


Is there any way to find out whether "kafka" is used as offset storage or "zookeeper"

2015-09-03 Thread shahab
Hi,

I have set offset.storage=kafka and dual.commit.enabled=false in the
consumer properties and restarted the brokers. I can send and receive
messages from Kafka.

I just want to make sure that "kafka" is being used as offset storage, not
"zookeeper". Is there any way to check whether "kafka" or "zookeeper" is
actually being used as offset storage?

best,
/Shahab